Guest Eugene

And now... High-End DriveMark 2006 Results


BTW udaman, I would be quite sceptical about any review site which claimed the 7200.8 had the fastest seek time at 8.0ms. Any review site that actually did its job properly would know the Seagate doesn't have the fastest seek time.


Any word on server benchmarks? Your average queue lengths are so short that I'm sure the desktop storagemark is totally unrelated to my database and fileserver workloads.

The exact order the requests are recorded in is the exact order in which they are issued. Every thread's and every task's requests are recorded as they are issued, and repeated in that order -- they are intermingled exactly as they were in the original recording. CQ will rearrange these requests at the drive. CQ does not affect the order in which requests are issued, but the order in which they are satisfied.


But in 'the real world', if a response arrives earlier, a dependent request may also be sent earlier.


Olaf,

thanks for picking up on that. I tried to get this point across on an earlier thread, without success.

With CQ, threads that access data from cache should be able to request their next access much sooner (which may in turn also be satisfied from the cache). The benchmark inserts artificial delays on that "thread", reducing the overall score.

Desktop-oriented drives do read-aheads for exactly this scenario. It is even conceivable that a drive may use a different read-ahead pattern for "throttled sequential access" (e.g. DVD writing) than for "full-speed sequential access" (copying a big file to another disk).

No wonder CQ drives suffer.
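
Put roughly into numbers -- the figures below are made up purely to illustrate the point, not measured (Python sketch):

[code]
# Hypothetical figures: the gap recorded on the capture drive vs. what the drive
# under test could actually have needed if the data were already in its cache.
recorded_gap_ms = 8.0   # interarrival gap baked into the trace (capture drive had to seek)
cache_hit_ms = 0.2      # time a read-ahead cache hit might take on the drive under test

# Real use: the dependent request goes out as soon as the data arrives.
# Trace playback: the benchmark waits out the recorded gap regardless.
artificial_delay_ms = recorded_gap_ms - cache_hit_ms
print(f"artificial delay inserted per such request: {artificial_delay_ms:.1f} ms")
[/code]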

cheers, Martin

thanks for picking up on that. I tried to get this point across on an earlier thread, without success.
I believe Martin is referring to comments on this page. Martin's point is subtle, and perhaps he struggled in explaining it, but nonetheless I believe he is correct.

Reviewing the TB3 audit article, we observe in the diagram the relative capture and playback initiation points of WinTrace and RankDisk respectively. Further, as Eugene writes,

WinTrace32 is a background, resident program that intercepts all requests sent to a host adapter's driver. For each given intercepted request, a starting timestamp, starting sector #, length of the request in sectors, and an associated OP code (Read, Write, Done) is appended. The timestamps allow RankDisk to preserve interarrival times between requests and thus allow any advantages of a driver or device (tagged queuing, for example) to be felt
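
To make that concrete, here is a minimal Python sketch of the kind of per-request record being described; the field names and values are my own guesses, not WinTrace32's actual format:

[code]
from dataclasses import dataclass

@dataclass
class TraceRecord:
    timestamp_us: int     # starting timestamp; lets RankDisk preserve interarrival times
    start_sector: int     # starting sector number of the request
    length_sectors: int   # length of the request in sectors
    opcode: str           # associated OP code: "READ", "WRITE" or "DONE"

# Two hypothetical records from a capture session.
trace = [
    TraceRecord(timestamp_us=0,     start_sector=1_000_000, length_sectors=128, opcode="READ"),
    TraceRecord(timestamp_us=1_350, start_sector=1_000_128, length_sectors=128, opcode="READ"),
]
[/code]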

However, where the problem lies is this:

Playback of the TB3-generated trace file on a drive capable of reordering requests will not expose the drive's true performance potential. Specifically, the trace file itself, having been generated on a drive (a Maxtor 740DX) incapable of reordering requests, has a built-in FIFO bias. As Martin and Olaf have elucidated, an NCQ-type drive may have serviced the original requests differently, and hence a completely different trace would have been generated.

Taking this further, I note that every drive tested is also going to be burdened by a specific built-in 740DX bias -- but that's something we just have to accept, and accordingly, the focus remains on the FIFO vs NCQ aspect.

So while RankDisk plays back the TB3 trace file perfectly, and is capable of observing how the NCQ drive interacts with the TB3 trace file (including the cataloguing of out-of-order interarrival times), it is still not an accurate representation of the NCQ drive's interaction within its natural environment -- the trace file carries a hardware-introduced bias, i.e. FIFO interactivity. An analogy would probably bring this phenomenon to better light, but I can't think of anything offhand and I'm in a rush to finish up.

Anyway, the problem that has been identified is compounded even further by the fact that each manufacturer's NCQ implementation may differ.

I'm not certain how one can remove any sort of bias from the test -- standardization may be near impossible. I suppose we just have to pick one method, accept that it's not perfect, draw attention to that fact, and then do our best with imperfect information.

Ultimately, I don't think there would be anything greater than 10% performance swings (if even that), and moreover, I believe the general conclusions drawn about NCQ still hold -- just perhaps not as absolutely as previous data has implied, but the underlying themes are the same.

TTFN


I always intended to respond again in that previous thread, but life got in the way, as it so often does :)

It seems correct that to some degree the trace file generated by a CQ-capable drive and one that's not CQ capable would differ even under the same usage (such as the WinMarks benchmark used to generate the trace file here). But the big question is, how different would it be?

The 'flaw' in the trace method is that it ties together the I/O from all threads, instead of allowing a thread to act independently during playback. However, this only becomes relevant when the queue depth is greater than one.

So the first thing that needs to be examined is, how often does the queue depth go above one? Right from the start it seems this variable means the true impact of this 'flaw' may vary significantly under different scenarios.

Thinking about it some more, a single thread that is pushing the queue depth above one is not being held back during playback when no other threads were issuing I/O requests. If only a single thread is active but has a high queue depth, the playback mechanism still enables CQ to improve the playback speed, whilst the order in which commands were issued during capture would not have changed whether the capture drive was CQ capable or not. So really it seems the potential impact only applies when the queue depth was pushed above one by multiple threads simultaneously issuing I/O requests.

Now that means to understand the potential impact you need to know what portion of the queue depth above one was generated specifically by multiple active threads. This may be difficult to determine even for single user scenarios, and becomes interesting under typical multi user scenarios. For example, a database server with multiple processors will almost certainly be running multiple threads to process the user transactions, but would the I/O requests to the disk subsystem be handled by the same (multiple) threads that accept the user transactions, or would all the disk I/O be handled by one, or more, other application threads?

Once (and if) you are able to determine the level of queue depth above one generated by multiple threads, how much of an impact is that going to make to the generated trace file? And assuming it is somewhat different, how much of a difference would the different trace file make to the playback results? I have to finish up for now so I'll post more later, hopefully after having expanded on some thoughts I have on this. But for now, here's an example.

Looking at the queue depth info for these new tests, it is over one for nearly 20% of the benchmark. If all of this was generated by multiple threads, then the trace file might be different for that 20%. Let's say the file differs so much that the CQ-capable drive has been handicapped to the point that CQ provides NO benefit. How much difference does CQ make in a desktop environment? According to benchmarks I've seen that don't use a trace capture/playback method, 10% at best, even in multitasking tests. So that means the drive might be missing out on 10% during 20% of the trace playback, for a total of 2% lost, at most. Of course this is just one scenario and others might vary more significantly.
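
Or, written out as a quick calculation using the same figures as above (Python):

[code]
# Back-of-the-envelope bound using the figures quoted above.
fraction_qd_above_one = 0.20   # queue depth > 1 for ~20% of the benchmark
max_cq_benefit = 0.10          # ~10% best-case CQ gain seen in non-trace desktop tests

worst_case_loss = fraction_qd_above_one * max_cq_benefit
print(f"worst-case score lost to the playback method: {worst_case_loss:.0%}")  # -> 2%
[/code]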

Thinking about it some more, a single thread that is pushing the queue depth above one is not being held back during playback when no other threads were issuing I/O requests. If only a single thread is active but has a high queue depth, the playback mechanism still enables CQ to improve the playback speed, whilst the order in which commands were issued during capture would not have changed whether the capture drive was CQ capable or not. So really it seems the potential impact only applies when the queue depth was pushed above one by multiple threads simultaneously issuing I/O requests.


I don't think that is true, because SR will be testing different drives with the same trace. You have no control over where each LBA resides on different hard drives. Even if a single thread is running I/Os at a queue depth greater than 1, different drives may have performed the operations in different orders, because LBA X may be halfway around the disk compared to the original trace drive. Because of this, the drive under test may have performed even the single thread's I/Os in a different order.

I don't think that is true, because SR will be testing different drives with the same trace. You have no control over where each LBA resides on different hard drives. Even if a single thread is running I/Os at a queue depth greater than 1, different drives may have performed the operations in different orders, because LBA X may be halfway around the disk compared to the original trace drive. Because of this, the drive under test may have performed even the single thread's I/Os in a different order.


He means thread as in a stream of dependent requests, not thread as in processes/threads (I think).


WRT testing and choosing drives: how should one read these tests in a boot drive / storage drive setup? I currently use the 36GB Raptor as a boot drive, for speed and to keep the OS/programs away from my storage, but I use a second drive to create all my files. I'm a graphic designer, BTW. Should I be looking at just the raw read and write rates for the storage drive, since that's all it's really doing, and then concentrate on pure speed for the OS?

WRT udaman and FS:

I do agree with points both udaman and FS have made: on one side, the keenest hobbyist of them all will look for the most minuscule details to pick over, while the average user at some point just has to pick a drive and decide which one is good enough. IMHO the name brand really means squat unless you have personal experience. Too many anecdotal remarks in any thread, on both sides, make asking about a certain brand useless. That's why the Reliability Survey can be so useful. It would be nice to pull data from it to see trend lines for certain manufacturers and models.

Thinking about it some more, a single thread that is pushing the queue depth above one is not being held back during playback when no other threads were issuing I/O requests.


I didn't know this was possible. I always assumed that a thread would issue a disk request, then wait for it to be satisfied.

This is why I suggested that the benchmark should record & replay the activity on a per-thread basis. However, I don't see how the benchmark could decide when to launch the various threads in order to give meaningful results.

The benchmark is a trace of 30 mins of activity on Eugene's desktop machine.

If a new thread is launched 20 mins into the test because Eugene started some new application or activity, then the benchmark would need to launch the test thread at that "appropriate" time in the test (but compared to what?)

If a new thread is launched 17 mins into the test because some previous activity has come to an end, then that thread needs to be launched immediately after the activity it depends on completes.

There is no way to extract this information from the existing trace, nor even (I think) to record it in a new trace.

The only way I can see to do this is to run the test for 30 minutes, launching each thread at the same time that it was originally launched when the trace was captured. One could then capture data on how quickly each thread completes, compared to the time taken on the trace.

The statistic would then be based on how quickly the various threads completed (within the overall 30 minutes), rather than how quickly the whole trace can be replayed.
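
Something along these lines, perhaps -- a very rough Python sketch in which the thread launch times and request timings are invented, and the sleeps merely stand in for issuing real disk I/O:

[code]
import threading
import time

# (launch offset in seconds, per-request service times in seconds) for each captured thread
captured_threads = [
    (0.0, [0.01, 0.02, 0.01]),
    (5.0, [0.03, 0.01]),
]

def replay_thread(requests, results, idx):
    start = time.time()
    for service_time in requests:
        time.sleep(service_time)          # stand-in for issuing one I/O and waiting on it
    results[idx] = time.time() - start    # how long this thread took on the drive under test

def run_benchmark():
    results = [None] * len(captured_threads)
    workers = []
    t0 = time.time()
    for i, (offset, requests) in enumerate(captured_threads):
        time.sleep(max(0.0, offset - (time.time() - t0)))   # honour the original launch time
        w = threading.Thread(target=replay_thread, args=(requests, results, i))
        w.start()
        workers.append(w)
    for w in workers:
        w.join()
    return results   # per-thread completion times, to compare against the capture run

print(run_benchmark())
[/code]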

cheers, Martin

I don't think that is true, because SR will be testing different drives with the same trace. You have no control over where each LBA resides on different hard drives. Even if a single thread is running I/Os at a queue depth greater than 1, different drives may have performed the operations in different orders, because LBA X may be halfway around the disk compared to the original trace drive. Because of this, the drive under test may have performed even the single thread's I/Os in a different order.


He means thread as in a stream of dependent requests, not thread as in processes/threads (I think).


Yes, I was loosely using the term thread for a couple of reasons -- it was already being used, and I think you'd probably find a stream of dependent requests would typically come from the same thread anyway. Or at least dependent requests from another thread probably wouldn't begin until the requests from the original thread were finished, so for the sake of discussion it may as well have been the one thread.

Thinking about it some more, a single thread that is pushing the queue depth above one is not being held back during playback when no other threads were issuing I/O requests. If only a single thread is active but has a high queue depth, the playback mechanism still enables CQ to improve the playback speed, whilst the order in which commands were issued during capture would not have changed whether the capture drive was CQ capable or not. So really it seems the potential impact only applies when the queue depth was pushed above one by multiple threads simultaneously issuing I/O requests.


I don't think that is true, because SR will be testing different drives with the same trace. You have no control over where each LBA resides on different hard drives. Even if a single thread is running I/Os at a queue depth greater than 1, different drives may have performed the operations in different orders, because LBA X may be halfway around the disk compared to the original trace drive. Because of this, the drive under test may have performed even the single thread's I/Os in a different order.


An individual thread is always going to issue its requests in the same order, whether on CQ-capable hardware or not. As activity that requires disk I/O occurs in the system, the thread will place a request in the queue. Depending on the application, the activity will either be dependent or not.

If it's dependent, the requests must be completed before additional requests are placed in the queue (the thread most likely will be on hold). The order in which the application has I/O requirements doesn't change, so the order they are placed in the queue doesn't change.

Or, if it's not dependent, the application/thread will just keep dumping I/O requests in the queue as the requirements occur in the application. Again, the order these requirements occur in doesn't change, so the order they are placed in the queue doesn't change.

The capture/trace process captures the order in which the thread (or all threads) place(s) requests in the queue. As long as only one process has any outstanding requests in the queue at any point in time, the order they are fulfilled in (in both normal operation and playback of a trace) will not alter the order the requests go into the queue.

Therefore you can see that the same activity generates the same queue order (trace/capture) on any drive, CQ or not, as long as only one thread has outstanding I/O in the queue at any moment in time.
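
A trivial Python sketch of that claim (entirely my own construction): whatever order the drive completes things in, the issue order that a WinTrace-style capture sees from a single thread is unchanged.

[code]
def captured_issue_order(drive_completion_order):
    """One thread issues four independent requests; return what the capture would record."""
    issued = []
    for request in ["A", "B", "C", "D"]:   # order dictated by the application itself
        issued.append(request)             # this is the moment the trace records
    # drive_completion_order never feeds back into the issue order of independent requests
    return issued

print(captured_issue_order(["A", "B", "C", "D"]))  # FIFO drive
print(captured_issue_order(["C", "A", "D", "B"]))  # CQ drive reordering completions
# Both print ['A', 'B', 'C', 'D'] -- the captured trace is identical either way.
[/code]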

An individual thread is always going to issue its requests in the same order, whether on CQ-capable hardware or not. As activity that requires disk I/O occurs in the system, the thread will place a request in the queue. Depending on the application, the activity will either be dependent or not.

If it's dependent, the requests must be completed before additional requests are placed in the queue (the thread most likely will be on hold). The order in which the application has I/O requirements doesn't change, so the order they are placed in the queue doesn't change.

Or, if it's not dependent, the application/thread will just keep dumping I/O requests in the queue as the requirements occur in the application. Again, the order these requirements occur in doesn't change, so the order they are placed in the queue doesn't change.

The capture/trace process captures the order in which the thread (or all threads) place(s) requests in the queue. As long as only one process has any outstanding requests in the queue at any point in time, the order they are fulfilled in (in both normal operation and playback of a trace) will not alter the order the requests go into the queue.

Therefore you can see that the same activity generates the same queue order (trace/capture) on any drive, CQ or not, as long as only one thread has outstanding I/O in the queue at any moment in time.


Nope, sorry, I don't see that at all.

Presume that a thread is doing serial I/O to a non-fragmented file. It issues, for example, 16 non-dependent I/Os: current-block + 1 through current-block + 16.

The disk sees 16 requests which "happen" to be for contiguous blocks. How does it handle them? It simply sees 16 I/Os that can be satisfied by contiguous reads after a single head move. The 15 later I/Os will be queued into the buffer, then returned to the app immediately after the first.

The requesting thread might see several times the throughput compared to dependent I/Os to a non-CQ HD [in a very busy system].

Thus, the non-CQ reference system may have added considerable delays (in the trace) if other threads had also issued interleaved I/Os.

This is a scenario where a real-world app could perform multiple I/Os to a CQ drive in a time frame where the benchmark would instead add significant delays.

cheers, Martin


Sorry Martin, I'm having a little trouble following what you're saying. I get the idea, but without understanding exactly what you mean at each stage I can't really respond. Do you mind elaborating?

I get the feeling that perhaps we have a different understanding of some aspects to how things work, and it's skewing our interpretation of what the other is saying. I'll give you some of my thoughts in regards to your post that might help you understand where I'm coming from.

Presume that a thread is doing serial I/O to a non-fragmented file. It issues, for example, 16 non-dependent I/Os: current-block + 1 through current-block + 16.

Why would a thread issue 16 individual requests of one block each for 16 consecutive blocks, rather than one request for 16 blocks?

The disk sees 16 requests which "happen" to be for contiguous blocks. How does it handle them? It simply sees 16 I/Os that can be satisfied by contiguous reads after a single head move. The 15 later I/Os will be queued into the buffer, then returned to the app immediately after the first.

The ability of CQ drives to intelligently assess the list of requests and determine the optimal order to fulfil them would, in this scenario, just determine that they should be performed in the order 1-16.

- read-ahead caching is what places 2-16 in the buffer. This would also occur on non-CQ enabled drives

- it still does not alter the order the requests were placed in the disk queue

Thus, the non-CQ reference system may have added considerable delays (in the trace) if other threads had also issued interleaved I/Os.

I didn't dispute this. The post of mine you quoted was specifically about when this is not the case -- refer to the last line.

Presume that a thread is doing serial I/O to a non-fragmented file. It issues, for example, 16 non-dependent I/Os: current-block + 1 through current-block + 16.

Why would a thread issue 16 individual requests of one block each for 16 consecutive blocks, rather than one request for 16 blocks?


Some applications actually do things like that, but typically they call readv() (Unix) or ReadFileScatter() (Windows) for that purpose, so it's up to the OS to decide how to request the actual disk I/Os. Actually, it's that way in pretty much every modern OS, even if you do a single read() from the application: what looks to the application like a contiguous block of (virtual) memory may have been mapped by virtual memory management to widely scattered blocks of (physical) RAM addresses (common page sizes are 4 to 64 KB).

So in the end it's up to the OS and driver combination how requests arrive at the drive interface -- if the host can't do scatter/gather DMA, then a scenario like the one pondered by MartinP is quite possible.
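
For instance, a tiny sketch of that vectored style of call, using Python's os.readv as a stand-in for readv(); the file name and buffer sizes are made up:

[code]
import os

# One vectored read call for what the application sees as a single logical read;
# the OS and driver decide how this becomes requests at the drive interface.
fd = os.open("big_unfragmented_file.bin", os.O_RDONLY)   # hypothetical file
buffers = [bytearray(4096) for _ in range(16)]           # 16 scattered destination buffers
bytes_read = os.readv(fd, buffers)                       # may be split or merged below the API
os.close(fd)
print(bytes_read)
[/code]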

Yes, I was loosely using the term thread for a couple of reasons -- it was already being used, and I think you'd probably find a stream of dependent requests would typically come from the same thread anyway. Or at least dependent requests from another thread probably wouldn't begin until the requests from the original thread were finished, so for the sake of discussion it may as well have been the one thread.


But the confusion starts when an app is using async IO but only a single thread.

Why would a thread issue 16 individual requests of one block each for 16 consecutive blocks, rather than one request for 16 blocks?

For example, because it can process the blocks in any order, and waiting for the first block would introduce an unnecessary delay.

The ability for CQ drives to intelligently assess the list of requests and determine the optimal order to fulfil them would, in this scenario, just determine that they should be performed in the order of 1-16.

Not true. If they're on the same track, rotational latency determines which block is seen first. If they're not on the same track, it depends on much more.

- read-ahead caching is what places 2-16 in the buffer.  This would also occur on non-CQ enabled drives

Maybe, maybe not. That depends on the read-ahead logic.

- it still does not alter the order the requests were placed in the disk queue

True, but only because there are no dependent requests.


All very good points that hadn't occurred to me. The point about rotational latency is particularly important -- it could often mean that, if using async IO with the 16 blocks broken up into 16 individual requests, blocks 9-16 (for example) could be fulfilled first if the head happens to land on the track when the platter is at that location.

So I wonder then, how often is IO performed this way? I guess the low queue depths generated during the benchmarks suggest not very often, if hardly ever?

And where does it leave us with regard to the main purpose of this discussion? My initial impression is that my previous statement is still applicable: the same activity generates the same queue order (trace/capture) on any drive, CQ or not, as long as only one thread has outstanding I/O in the queue at any moment in time, as none of this appears to alter the order requests are placed in the queue.

It's probably important to remember that the queue captured by the trace is the OS queue, not the queue in the drive itself (which is the queue that CQ manages and sorts as appropriate). It occurred to me that this could cause confusion for anybody who hadn't made the distinction, so I thought it would be worth pointing out.
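
A small illustrative Python sketch of that distinction (the geometry is invented): the OS-level queue order the trace captures stays 1-16, while the order the drive actually services same-track requests depends on where the head happens to land.

[code]
SECTORS_PER_TRACK = 16

def drive_service_order(os_queue, head_angle):
    """Service same-track requests in the order the platter brings them under the head."""
    return sorted(os_queue,
                  key=lambda block: (block % SECTORS_PER_TRACK - head_angle) % SECTORS_PER_TRACK)

os_queue = list(range(1, 17))                       # blocks 1..16, issued (and captured) in order
print(os_queue)                                     # what the trace records: 1..16
print(drive_service_order(os_queue, head_angle=9))  # drive might service 9..16 first, then 1..8
[/code]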

So I wonder then, how often is IO performed this way? I guess the low queue depths generated during the benchmarks suggest not very often, if hardly ever?

Unfortunately, not very often in desktop apps. Sync IO is easier and more standardized.

And where does it leave us with regard to the main purpose of this discussion? My initial impression is that my previous statement is still applicable: the same activity generates the same queue order (trace/capture) on any drive, CQ or not, as long as only one thread has outstanding I/O in the queue at any moment in time, as none of this appears to alter the order requests are placed in the queue.


I think that statement is still valid.

When I made the original point, I was really thinking of multiple threads, each with one I/O outstanding.

cheers, Martin


Sure, I understand this. The main reason I noted that the statement I made seemed valid was to define when the trace method is valid and not flawed in relation to the points you have made. From there we can look further at the scenarios where the trace method in fact is, or might be, flawed, and at just how much this affects the outcome.

Which is where I left off originally and will have to do so again until I have more time.

I did have a thought, though. Eugene, as you can see there's some question as to the validity of the testing methodology. As I'm sure everyone, yourself included, would like to remove that question mark, is it possible to do some testing? Capturing another set of traces (probably just one desktop and one server) with a CQ-capable drive, running the benchmarks on a handful of CQ and non-CQ capable drives, and examining the difference in the results between the two trace sets would probably prove whether or not there is anything to this. I understand this is probably time consuming, but it would put the question to rest.


Well, besides the still unknown implications of the particular drive used to capture the trace, there are some other concerns.

In another thread Eugene posted results for some drives on both TB3 and TB4, using the same TB3 trace. Results between the two systems differ by as much as 5% for the same drive, while they differ by less than 1% for repeated runs on the same system. Thus whatever this benchmark measures apparently depends not only on the drive, but also reflects non-negligible impacts from the test platform.

Results for the same drive even differ by as much as ±25% between the TB3 and TB4 traces. Thus the programs/workload used for capturing the trace seem to have a really significant impact on the results. Picking any of them to be a "golden" DriveMark seems rather arbitrary to me.

The ratio of disk idle vs. active time during trace capture is not known. If idle time is large compared to active time, then the DriveMark results will over-emphasize the difference between drives. Thus it's hard to tell whether a given difference in DriveMark results has non-negligible effects in real use.
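
A toy calculation of that idle-time effect, with numbers made up solely for illustration (Python):

[code]
# Disk-busy time for two hypothetical drives over the same captured workload.
active_a = 100.0   # seconds drive A is busy
active_b = 110.0   # seconds drive B is busy
idle = 900.0       # seconds the disk sat idle during capture

playback_ratio = active_b / active_a                       # 1.10 -> B looks "10% slower"
wall_clock_ratio = (active_b + idle) / (active_a + idle)   # ~1.01 -> ~1% slower in real use
print(f"{playback_ratio:.2f} vs {wall_clock_ratio:.2f}")
[/code]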

AFAIK the captured traces and benchmark programs are not easily available for peer review. Thus it's pretty hard to pinpoint weak spots, and we're mostly restricted to educated guesses.

I did have a thought, though. Eugene, as you can see there's some question as to the validity of the testing methodology. As I'm sure everyone, yourself included, would like to remove that question mark, is it possible to do some testing? Capturing another set of traces (probably just one desktop and one server) with a CQ-capable drive, running the benchmarks on a handful of CQ and non-CQ capable drives, and examining the difference in the results between the two trace sets would probably prove whether or not there is anything to this. I understand this is probably time consuming, but it would put the question to rest.


The problem is that the difference is going to vary from drive to drive. And it's not only CQ algorithms that will affect things, but the drive's performance under a given trace, which relates to every other aspect of its performance, from rotational latency to capacity!

Essentially, the way CQ interacts with workloads creates a complex system, and a complex system is going to be impossible to break down into a controllable, but still relevant, test context.

What StorageReview has always tried to do is isolate the drive to get accurate results. This is good classical scientific methodology, and it is why I've always been attracted to SR. On the other hand, computers are complex systems; consequently, testing isolated units can become utterly irrelevant. This is why specific, non-generalized approaches are necessary when picking a disk or disk array for performance in an application. Test the actual database you are going to run. Test the actual application against actual system configurations. To do anything less is irrelevant in many situations. I have often laughed at how enthusiasts or gamers tune their RAID arrays to run HDTach or Sandra as well as possible, when such tests are utterly irrelevant to the kind of performance they actually want.

However, I still believe that SR's DriveMarks offer the best indication of a drive's performance for general desktop use. If you need performance for a specific purpose, though, a general test is utterly irrelevant on a modern computer system. The disk does not operate in isolation, and what you, Chew and MartinP are exploring is just the tip of a much bigger iceberg that was around long before the advent of CQ on SATA drives.

However, I still believe that SR's DriveMarks offer the best indication of a drive's performance for general desktop use. If you need performance for a specific purpose, though, a general test is utterly irrelevant on a modern computer system. The disk does not operate in isolation, and what you, Chew and MartinP are exploring is just the tip of a much bigger iceberg that was around long before the advent of CQ on SATA drives.


I agree completely. These are just general tests, and need to be viewed in context.

However, some testing along the lines I have suggested would at least help validate these results as acceptable even for general use. My feeling is that the issue raised by MartinP would alter the results insignificantly, but if MartinP's suggestion that the results could be altered more significantly under some circumstances is correct, it would be nice to know by how much. What if it's to such a degree as to invalidate the results even for general use? What would be the point of even having these benchmark results then?

We could theorise on this for a long time, or run some tests to possibly put the matter to rest. Some of us might enjoy the thinking exercise, but most would probably prefer just to know the end result :)

In another thread Eugene posted results for some drives on both TB3 and TB4, using the same TB3 trace. Results between the two systems differ by as much as 5% for the same drive

However, I still believe that SR's DriveMarks offer the best indication of a drive's performance for general desktop use.

What if it's to such a degree as to invalidate the results even for general use?

I may be entirely wrong, and premature in my conclusions given that we have only a small portion of Eugene's TB4 data set and no formal write-up yet to review; nonetheless I will express my opinion (in which I'm not addressing the NCQ issue -- see my thoughts in a post above if you want those -- but rather a much broader scope) that:

Currently, my gut feeling leads me to believe that the SR tests are NOT universally conclusive as to a drive's general performance, simply because the test performance is too bound to and dependent upon platform hardware (controllers) and software (controller driver) specifics. Moreover, the results/leaderboard may not even be a good proxy for general conclusions -- they strike me as perhaps being valid only for a specific platform: SR's test machine. The wild swings we observe between TB3 and TB4 tend to suggest this too. Even in the small two-drive cross-section, removing one of the variables (the test suite) still seemed to generate some noticeable differences (I eyeball it as up to 8-9% in one of the tests; I only took Office and High-End into consideration, given we only have full data on those two measures currently).

Given this small glance into the characteristics of the test, and the sheer myriad of controller-driver combinations in the wild (all of which have undetermined performance profiles), I wonder how we can generalise results about a drive beyond a given platform. Each day holds a high probability that someone will enter the forums and post a question about which drive is better (between x and y and z). Can we honestly say that responses such as "look at the leaderboard results" are going to give that person an accurate picture of drive performance on a hardware/software platform differing in anything beyond SR's current machine? I begin to think not.

Informal observation of member input in the forums over the years might also shed some light in this regard. On numerous occasions I've seen individuals quite vehemently contest a given drive's performance in relation to the SR results/leaderboard and their own experience with its operation. An argument raised against these individuals was that the empirical evidence supported the validity of SR's results. I believe we have now come full circle, whereby the very same empirical evidence generated by SR is now lending validity to those individuals' claims.

What would be the point of even having these benchmark results then?

To try to investigate and progress the development of accurate tests of comparative measurement! Just because the current model leads to a crossroads does not mean you turn back and head home. SR has generated a lot of very useful research over the years. Let's not give up the chase now!

And again, I will stress that I may be completely wrong in my assessment, and fully admit to basing it upon incomplete information. I'm just calling it as I currently see it.


Since the TB4 benchmarks are now, well, benchmarks, couldn't you just run the actual VeriTest software instead of playing back captured traces of it? That would at least eliminate the CQ/request-reordering/thread-timing issues. I imagine the benchmark is scriptable, so it shouldn't be too labor intensive.

Either that, or we need to see some data on the variation between actually running the benchmark and playing back a trace, on several different drives with/without CQ.

And just to throw some more fuel on the fire, I have to imagine that the buffer size of the drive on which the trace was recorded significantly affects the timing of requests as well. Consider a trace recorded on a drive with an 8MB buffer. If a request misses the cache, that particular drive will take a few milliseconds to fetch the data from the platter, and so the trace will record a delay before the next request is issued. However, if the same request is serviced instantly by a drive with a 16MB buffer because the data is in its cache, then the next request won't have to wait. The trace adds an artificial delay and skews the results toward drives with the same buffer size as the reference drive.
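
In made-up numbers, the skew would look something like this (Python):

[code]
# Hypothetical service times for one request that missed the reference drive's 8 MB cache.
reference_miss_ms = 12.0   # reference drive fetched from the platter; gap recorded in the trace
larger_cache_hit_ms = 0.1  # a 16 MB-cache drive might already have held the data

# The recorded interarrival gap is replayed verbatim, so the larger-cache drive never
# gets credit for the time it could have saved before the next request was issued.
uncredited_ms = reference_miss_ms - larger_cache_hit_ms
print(f"delay baked into the trace per such request: {uncredited_ms:.1f} ms")
[/code]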

Mmmm, life is complicated.

