Since the TB4 benchmarks are now, well, benchmarks, couldn't you just run the actual Veritest software instead of playing back captured traces of it? That would at least eliminate the CQ/request-reordering/thread-timing issues. I imagine the benchmark is scriptable so it shouldn't be too labor intensive.
Either that, or we need to see some data on the variation between actually running the benchmark and playing back a trace, on several different drives with/without CQ.
And just to throw some more fuel on the fire, I have to imagine that the buffer size of the drive on which the trace was recorded significantly affects the timing of requests as well. Consider a trace recorded on a drive with an 8MB buffer. If a request misses the cache, that particular drive will take a few milliseconds to fetch the data from the platter, and so the trace will record a delay before the next request is issued. However, if the same request is serviced instantly on a drive with a 16MB buffer because it's in the cache, then the next request shouldn't have to wait. Playing the trace back with its recorded timing adds an artificial delay anyway, and skews the results toward drives with the same buffer size as the reference drive.
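To make the buffer-size argument concrete, here's a toy model of it. Everything in it is made up for illustration: the 100 us hit and 8000 us miss service times, the hit/miss patterns, and the function names. It also assumes the playback tool issues each request no earlier than its recorded timestamp, which I don't know to be true of the actual tool. Treat it as a sketch of the effect, not a description of how the real playback works.

```python
# Toy model: hypothetical service times in microseconds (not measured values).
HIT_US = 100     # request served from the drive's buffer
MISS_US = 8000   # request has to go to the platter


def record_trace(hits_on_reference):
    """Record issue timestamps on the reference drive.

    Assumes the application issues each request as soon as the
    previous one completes, so a miss pushes the next timestamp out.
    """
    timestamps = []
    clock = 0
    for hit in hits_on_reference:
        timestamps.append(clock)
        clock += HIT_US if hit else MISS_US
    return timestamps


def replay(timestamps, hits_on_target, honor_timestamps):
    """Return total elapsed time (us) when replaying on a target drive.

    With honor_timestamps=True a request is never issued before its
    recorded timestamp; otherwise it is issued as soon as the previous
    request completes.
    """
    clock = 0
    for issue_at, hit in zip(timestamps, hits_on_target):
        if honor_timestamps:
            clock = max(clock, issue_at)
        clock += HIT_US if hit else MISS_US
    return clock


# Made-up pattern: the 8MB reference drive misses three of four requests,
# while the 16MB target drive has all four in its buffer.
reference_hits = [False, False, True, False]
target_hits = [True, True, True, True]

trace = record_trace(reference_hits)
print(replay(trace, target_hits, honor_timestamps=True))   # 16200
print(replay(trace, target_hits, honor_timestamps=False))  # 400
```

In this model the 16MB drive could finish in 400 us, but the timed playback stretches it to 16200 us because it keeps waiting for delays that only existed on the reference drive. Real drives and real traces are obviously messier, which is why actual measurements would be more convincing than this.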
Mmmm, life is complicated.