About ericyifeng

  1. ericyifeng

    Seek Times

    OK, Gilbo, so in the tests the hardware disk cache is on, but the OS cache is bypassed. Is that right?

    Looking at the "Percentage Time Spent vs Seek Distance" graph and the comments below it:

    > Again, just eyeballing results, the aforementioned 11% of requests that have to stride 8 gigabytes or more occupy about 25% of the disk's time. The remainder is either sequential, buffered by the drive, or a minimal distance away from the previous request.

    I still fail to understand why the seeks between 512 and 16M sectors would be "either sequential, buffered by the drive, or a minimal distance away from the previous request". Buffered by whom? The hardware disk cache? You just said the hardware disk cache is already on, so why are we seeing considerable time spent on these seeks if they are buffered? Would they be buffered by the OS disk cache? Maybe some of them, but not all.

    So why should we ignore these seeks when calculating the potential improvement, when they represent another 50% of the total time? I hope I've made my point clear; please answer my question directly.
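The objection above can be sketched as a simple Amdahl-style bound on overall throughput. The time shares and the hypothetical 2x seek improvement below are illustrative assumptions taken from the figures quoted in the thread, not measured data:

```python
# Hypothetical illustration of the post's point: if mid-range seeks
# (512 to 16M sectors) account for ~50% of total disk time and long
# seeks (16M+ sectors, i.e. 8GB+) for ~25%, the potential gain from
# faster seeking is much larger than an analysis of long seeks alone
# suggests. All shares below are assumptions drawn from the discussion.

time_share = {
    "sequential / cached (<512 sectors)": 0.25,  # assumed remainder
    "mid-range (512 - 16M sectors)":      0.50,  # figure cited in the post
    "long (16M+ sectors, i.e. 8GB+)":     0.25,  # figure from the article
}

def potential_speedup(seek_fraction, seek_improvement):
    """Amdahl-style bound: only the seek-bound fraction of time gets faster."""
    new_time = (1 - seek_fraction) + seek_fraction / seek_improvement
    return 1 / new_time

# Counting only the 8GB+ seeks as improvable (the article's analysis):
print(round(potential_speedup(0.25, 2.0), 3))  # ~1.14x ceiling
# Counting the mid-range seeks as well, as the post argues one should:
print(round(potential_speedup(0.75, 2.0), 3))  # 1.6x ceiling
```

Under these assumed shares, counting only the 8GB+ seeks caps the overall gain at roughly 1.14x, while including the mid-range seeks raises the ceiling to 1.6x, which is the crux of the objection.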
  2. ericyifeng

    Seek Times

    Quoting Gilbo's reply:

    > I don't think you've adequately gone over the methodology, ericyifeng. If a request in that chart is being served by the cache because the read-ahead algorithm caught it, it is the disk cache, not the OS cache, that acted. The disk cache acts independently of the OS. Your objections do not relate to the SR methodology. The OS cache is not involved in caching IPeak trace replays, so every cache hit Eugene notes is going to occur under this workload on any OS. IPeak isolates the disk subsystem, which is why it is such a valuable tool. You need to make sure you understand a methodology before you declare it flawed.

    I was not saying the methodology is flawed. I was just questioning the analysis, which calculates the potential improvement on 8GB+ (16M+ sector) seeks only and concludes that the improvement would be insignificant.

    OK, let's talk about the hardware disk cache. Are you saying that seeks between, say, 1K and 16M sectors can be absorbed by the disk cache? I don't know the disk cache internals, but it's hard for me to imagine the disk cache doing such large read-aheads (1K to 16M sectors, i.e. 0.5MB to 8GB), especially when the disk is actually busy.

    BTW: is the hardware disk cache on or off in the IPEAK SPT tests? If it's on, why are we seeing considerable time spent on seeks between 512 and 16M sectors when they should be cached? If it's indeed off, that's good to know, and can we also see the times with the cache on?

    > Did you take into account rotational latency? A hard disk drive isn't RAM. Just because there's latency doesn't mean there's a seek. According to the data, you did not. I'm sorry if this sounds harsh, but judging from the blatant flaws in your critiques, you honestly need to sit back and learn a little more before you criticize the work of someone who has been doing this for years. And before you redo your math to include rotational latency and tell me that I'm wrong, make sure you do it with worst-case latency, not average latency. I hope this clarifies the nature of the issues that you raised. They are nothing more than spectres that research would have dismissed.

    The graphs clearly state "Read Seek Profile", so I naturally ignored rotational latency. So what are the graphs really meant to show? Seek latency? Rotational latency? Or both?

    BTW: I'm an infrequent visitor to the SR website, so I'm not familiar with the test methodology used here. I stumbled upon this discussion from the main page link, and I'm just saying what I feel might be wrong. After all, I guess the forum is meant to stimulate discussion rather than to follow whatever the authorities say. I apologize if my words are offensive.
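For reference, the rotational latency the quoted reply insists on is straightforward to compute: one revolution takes 60000/RPM milliseconds, the average latency is half a revolution, and the worst case is a full one. This is a minimal sketch, not part of the SR methodology; the RPM values are the standard spindle speeds for the two drives named in the thread (Deskstar 7K400: 7200 RPM, Fujitsu MAU3147: 15000 RPM):

```python
# Rotational latency for the spindle speeds discussed in the thread.
# Average latency is half a revolution; worst case is a full revolution,
# which is why the quoted reply demands worst-case, not average, numbers.

def rotational_latency_ms(rpm):
    full_rev_ms = 60_000 / rpm           # one revolution, in milliseconds
    return full_rev_ms / 2, full_rev_ms  # (average, worst case)

for rpm in (7200, 15000):
    avg, worst = rotational_latency_ms(rpm)
    print(f"{rpm} RPM: avg {avg:.2f} ms, worst {worst:.2f} ms")
# 7200 RPM:  avg 4.17 ms, worst 8.33 ms
# 15000 RPM: avg 2.00 ms, worst 4.00 ms
```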
  3. ericyifeng

    Seek Times

    I strongly agree with e_dawg. There are probably wrong assumptions in Eugene's analysis:

    > 11% of requests stride more than 8 gigabytes... 80%, however, are within 8 megabytes. Given a decent read-ahead buffer algorithm, a large majority of those requests will be cached outright while the remainder will probably be on the same track (no seek) or at most a few tracks away (track-track seeking).

    For example, the Linux read-ahead algorithm only permits up to 128KB of read-ahead on a per-file basis, and only when the access pattern is sequential. If the access is detected as non-sequential, the read-ahead window is shrunk even further. So, at least on Linux, it is not the case that data as far as 8MB apart can be read ahead and thus cached at the same time. I'm not sure about the Windows algorithm, but Linux is definitely a decent OS.

    Moreover, looking at the "Read Seek Profile" graphs of the Hitachi Deskstar 7K400 and the Fujitsu MAU3147, we see that the seek penalty kicks in at distances as short as 1K sectors (0.5MB), which is very roughly the track size. The graphs also show that seeking just a few tracks away is almost as bad as seeking, say, 512K sectors (256MB) away.

    So looking only at seek distances of 16M+ sectors to calculate the potential benefit of better seek performance is flawed. We should probably take all seek distances of 1K+ sectors into consideration.
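The unit conversions behind this argument can be made explicit. Assuming the conventional 512-byte sector, the seek distances from the graphs translate to bytes as follows; the 128KB ceiling is the Linux per-file read-ahead limit cited above:

```python
# Sketch of the unit conversions in the post's argument, assuming
# 512-byte sectors. The point: even the smallest distance at which the
# seek penalty appears (1K sectors = 0.5MB) already exceeds the 128KB
# per-file read-ahead window cited for Linux, so such requests cannot
# all be served from read-ahead data.

SECTOR_BYTES = 512
LINUX_READAHEAD_MAX = 128 * 1024  # bytes, per-file, sequential access

def sectors_to_bytes(sectors):
    return sectors * SECTOR_BYTES

for label, sectors in [("1K sectors", 1024),
                       ("512K sectors", 512 * 1024),
                       ("16M sectors", 16 * 1024 * 1024)]:
    dist = sectors_to_bytes(sectors)
    covered = dist <= LINUX_READAHEAD_MAX
    print(f"{label}: {dist / 2**20:.1f} MiB, within read-ahead window: {covered}")
```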