asymmetric

New disk benchmark software -- opinions?


I spent some of last night and most of today writing this thing.. I want something between IOMeter on the complicated end, and ATTO on the simplistic end.

So far it's leaning more towards ATTO as far as complexity goes, but I'll be adding more knobs later based on feedback.

Here's a screenshot of it, after I ran it on both of my partitions :

diskbench_1.jpg

Looking for any kind of feedback.. mostly about perceived accuracy right now, since it's pretty beta, but also about features.. bugs, of course, should be noted.

Operation is very straight-forward.. default scores are important to help comparisons, but fiddle with it if you feel like.

Known limitations :

Don't select a read-only drive such as a CD-ROM and try to benchmark it. I don't know what might happen, but it won't be good when it tries to write to it.. program may crash or hang.

Save, Open and Stop buttons don't currently do anything.

No progress meter of any kind; you may think it's hung, but it's probably not. If it isn't redrawing itself, check your CPU usage. If it's low, it's probably hung. If CPU usage is high, it could be hung or just starved for CPU.

If you want to give it a try, it's here http://rfnj.org/benchmarks/drivebench.zip

No installation needed. Just unzip it and run it from wherever you like. If it crashes, delete the testfile it creates in the root directory of whatever drive you run it on.


I ran it on my Cheetah 36ES and it came up with some really strange results. I know for a fact that this drive can't hit 300MB/s, especially since it is connected to a U2W controller.
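
As a rough sanity check on that claim: Ultra2 Wide SCSI tops out at 80 MB/s on the bus, so a sustained figure above that cannot be coming off the disk. Below is a minimal Python sketch of that check. The function names are made up for illustration, and it assumes 1 MB = 1,000,000 bytes, which may not be the unit the benchmark itself reports.

```python
# Nominal bus ceilings in MB/s (1 MB = 1,000,000 bytes here, an assumption).
BUS_LIMIT_MB_S = {
    "Ultra2 Wide SCSI": 80,
    "Ultra160 SCSI": 160,
}

def throughput_mb_s(bytes_transferred, elapsed_seconds):
    """Convert a byte count and a wall-clock duration into MB/s."""
    return bytes_transferred / elapsed_seconds / 1_000_000

def is_plausible(measured_mb_s, bus):
    """A sustained rate above the bus ceiling points at caching or a timing bug."""
    return measured_mb_s <= BUS_LIMIT_MB_S[bus]
```

With these helpers, `is_plausible(300, "Ultra2 Wide SCSI")` comes back false, while a result of a little over 50 MB/s passes.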

Here is the image:

cheetah36es.gif

Image turned out kinda big...Sorry bout that.


This drive (Cheetah 36ES) normally hits a little over 50MB/s when tested on ATTO. Maybe there is a math error somewhere in your code. If you want to post your source people here would probably be able to point out where the problem is.

What language is this written in? I'm just curious...

If you want to post your source people here would probably be able to point out where the problem is.

Do I get extra points for figuring out the problem, without the source code?

Pass FILE_FLAG_NO_BUFFERING to your CreateFile routine. If you are using a run time library routine, and not calling CreateFile directly, you will not be able to benchmark the drive properly.

To use FILE_FLAG_NO_BUFFERING, you need to make sure that your buffers are aligned to the block size, since the hardware will be DMAing directly to/from your user buffers.
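
A rough illustration of that alignment rule, written in Python rather than against the Win32 API itself. The 512-byte sector size is an assumption (real code would query the volume, for example with GetDiskFreeSpace), and the helper names are hypothetical.

```python
import mmap

SECTOR_SIZE = 512  # assumption: the real value should be queried per volume

def round_up(value, multiple):
    """Round value up to the next multiple, e.g. a request size to whole sectors."""
    return (value + multiple - 1) // multiple * multiple

def is_valid_unbuffered_request(offset, length, sector_size=SECTOR_SIZE):
    """Unbuffered I/O needs both the file offset and the length sector-aligned."""
    return length > 0 and offset % sector_size == 0 and length % sector_size == 0

def allocate_aligned_buffer(length, sector_size=SECTOR_SIZE):
    """Return a writable buffer whose length is a whole number of sectors.

    Anonymous mmap memory starts on a page boundary, which also satisfies
    sector alignment as long as the page size is a multiple of the sector size.
    """
    return mmap.mmap(-1, round_up(length, sector_size))
```

The buffer address, the file offset, and the transfer length all have to meet the sector requirement; getting only one or two of them right is not enough.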

It would be nice to have a generally available, single benchmark that tested STR, ios/s, and played back traces. Thank you for sharing your program asymmetric. I am always happy to see another programmer added to our mix.


The math is ok.. I forgot that if you run it on Win9x (including, of course, ME) the command to disable the cache for access to that file doesn't work / is different.

The 300MB/s is correct.. but something else is happening, probably to do with cache.

That is, unless you're going to say you're running on NT, 2000 or XP.. if that's the case, I'll have to look deeper, but on 9x/ME using FILE_FLAG_OVERLAPPED isn't allowed, so I'll have to work around it.

New version is up, should work correctly on 9x now.. oooops.

http://rfnj.org/benchmarks/drivebench.zip

Makes me wonder though.. how does ATTO handle overlapped I/O on 9x?



Close.. FILE_FLAG_OVERLAPPED doesn't work on 9x according to the SDK, at least not for files/pipes/etc.. apparently it's only for things like serial ports on 9x.

NO_BUFFERING I already handled, and the scores here on my system look ok.. but you get a few points I suppose for even understanding CreateFile. ;)

What language is this written in? I'm just curious...

Delphi 6.


I did run it under Windows 2k. Are you sure caching is disabled?


Well, I'm sure it's disabling it when I run it here.. my scores look pretty reasonable..

I'm not sure what could be skewing the score like that then.. I'll look into it.

It could be a driver issue, but I doubt it. The benchmark relies on the FILE_FLAG_NO_BUFFERING flag being honored, and on the event handler being called (on NT-based Windows only) for overlapped I/O completion. Perhaps for some reason this handler is getting called immediately?

If you feel up to it, try it again with the Outstanding I/O's set to 1.


I ran it with the Outstanding I/O's set to 1 and the results look a lot better now. Close to what I get with ATTO or other benchmarks.

Is there any way that you can make DriveBench not totally kill the system while it is running?


Cool.. thanks for the test.

I'll look into what's causing the CPU load.. I think I have a handle on it already.. ;)

but you get a few points I suppose for even understanding CreateFile. 

Yeah, next week I hope to learn CloseHandle().

Are you aware that FFS is case sensitive?

Having secured a copy of your program, it's clear what the problem is. Your overlapped logic is severely broken. You are issuing thousands of overlapped requests.

And some people say queue depth rarely exceeds 1. :wink:

Do I get extra points for figuring out the problem, without the source code?

I expect those points now.



Some people haven't debugged any code I guess. :)

FFS isn't case-sensitive in Delphi -- nothing is unless you specifically enable it in the compiler options.

As for CPU usage, that was something else entirely. I increased the timeout in the event.waitfor sleep loop from 1ms to 10ms.. I hover around 10-20% CPU for the whole app now.

Also, added some quick measure of "progress" by updating the scores as they come in..

I don't see the overlapping getting out of hand.. what are you using to monitor it? The closest I found was "Current Disk Queue Length" in perfmon, for PhysicalDisk. That counter hits an average of 15 during the run. Min 0, max 91, Scale is 1.0.

I issue only the number of ReadFile/WriteFile calls specified by the user in the GUI before I start making the calls to GetOverlappedResult. I don't issue any more calls until GetOverlappedResult indicates that NumberOfBytesTransferred equals the number I requested, then I slam it up to MaxRequests again.

No points yet.. sorry.. but I am interested in how you're coming to these conclusions.. care to share?


Oh.. can't edit a post.. forgot.

Version I mentioned is available.

New:

-- Low CPU usage (hopefully. About 12% on an AthlonXP 1600+ w/ SCSI)

-- Stop button works

-- Win9x / NT checking

-- Status updated in statusbar at bottom of screen

-- Running graph of results as they come in

-- Correct URL. Forgot to rename file.

Bugs:

-- Higher queue depths may still report strange results.. let me know

-- Running graph won't start until the 2nd test finishes; TChart oddity/bug workaround.

As always : http://rfnj.org/benchmarks/drivebench.zip


Ok.. tracked down the Outstanding IOs issue..

Turns out that it's somehow scaling based on processor speed.. I have a dual athlon rig here, so I didn't notice it with the (relatively) low numbers I've been testing.

Setting it to around 200 I get over 500MB/s myself from a single channel u160 RAID.

Not sure how I'm going to deal with this right now.. requires some thought as to why this is happening, and how to deal with it.

The scores being turned in aren't wrong per se, but they are not representative of true disk I/O either.. something is still being cached somewhere I imagine, or maybe windows is concatenating all the outstanding requests into a larger request when it gets too backed up.. thus skewing things.

Thanks to everyone who tested it.. more to come soon.

FFS isn't case-sensitive in Delphi -- nothing is unless you specifically enable it in the compiler options.

I meant that Berkeley’s Fast File System is case sensitive. I wasn’t able to download DriveBench.zip, because the link listed it as drivebench.zip.

Man, you are really making me work for these points.

I too am using perfmon. The IOs build, until the file handle is closed, which immediately cancels all outstanding IOs. You say that you are measuring 91 outstanding IOs. How many do you think you are issuing?

Naturally, the number of outstanding IOs in your latest version is lower. Why? Because you are wasting more time. Any time increasing a timeout measurably decreases CPU utilization, it means that your CPU(s) were busy. If you are blocking on IO completions properly, your CPUs should not be busy.

You are either timing out prematurely, or you are otherwise not properly waiting for IO completion. Perhaps you issue six IOs, wait for one to complete, and then issue another six? In any case, this is where your program is broken.
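
To make the timeout point concrete, here is a small Python sketch that contrasts a polling wait with a blocking one. It is only an analogy: a threading.Event stands in for an I/O completion event, the helper names are invented, and it is not a reconstruction of DriveBench's actual loop, which has not been posted.

```python
import threading

def wait_by_polling(event, poll_interval):
    """Wake every poll_interval seconds to check; return the number of idle wakeups."""
    idle_wakeups = 0
    while not event.wait(poll_interval):
        idle_wakeups += 1  # each of these is CPU time spent doing nothing useful
    return idle_wakeups

def wait_by_blocking(event):
    """Sleep until signalled, with no timeout, so there are no idle wakeups."""
    event.wait()
    return 0

def signal_after(delay_seconds):
    """Stand-in for an I/O that completes after delay_seconds."""
    event = threading.Event()
    threading.Timer(delay_seconds, event.set).start()
    return event
```

Raising poll_interval cuts the number of idle wakeups, which is why a longer timeout lowers CPU usage, but the blocking version never pays that cost in the first place.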

Since I suspect I won’t get my points until this is all working, here is what you need to do.

Create an array of OVERLAPPED structures (records to you I guess), equal to the queue depth specified by the user. Create an equal sized array of handles to Events(CreateEvent()), and assign them to the OVERLAPPED structs.

Open the file with FILE_FLAG_OVERLAPPED and FILE_FLAG_NO_BUFFERING.

Immediately issue the IOs, up to the level specified as the IO queue depth. Call WaitForMultipleObjects() passing the Event array as lpHandles. The timeout should be INFINITE. Your benchmark program should never need to timeout.

When the WFMO call drops through, call GetOverlappedResult for the IO which has just been completed. You should then issue exactly one IO.

This is a bit simplified, as things get a bit messy towards the end, when the outstanding IOs decrease, but it should give you an idea.

This method will limit your queue depth to MAXIMUM_WAIT_OBJECTS, which IIRC is 64.
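
The steps above can be sketched in Python as a loose analogue, not as Win32 code. A thread pool stands in for overlapped I/O, and `wait(..., return_when=FIRST_COMPLETED)` with no timeout plays the role of WaitForMultipleObjects with INFINITE. The function name and the `do_io` callback are hypothetical, and the 64-handle limit does not apply to this version.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_at_queue_depth(do_io, total_requests, queue_depth):
    """Call do_io(i) for each request, keeping at most queue_depth in flight.

    Returns (completed_count, peak_in_flight).
    """
    completed = 0
    peak_in_flight = 0
    next_request = 0
    in_flight = set()
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        # Prime the queue: issue requests up to the requested depth.
        while next_request < min(queue_depth, total_requests):
            in_flight.add(pool.submit(do_io, next_request))
            next_request += 1
        while in_flight:
            peak_in_flight = max(peak_in_flight, len(in_flight))
            # Block with no timeout until at least one request finishes.
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for finished in done:
                finished.result()  # re-raises any error from the request
                completed += 1
                # Issue exactly one new request per completion.
                if next_request < total_requests:
                    in_flight.add(pool.submit(do_io, next_request))
                    next_request += 1
    return completed, peak_in_flight
```

Because each completion is replaced by exactly one new request, the number in flight never climbs past queue_depth, and it tapers off on its own once the requests run out.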

I recommend you dispense with all of this, and use completion ports instead. The bottom code snippet in this post is the (simplified) inner loop of a program that does exactly what you are trying to achieve. A small modification will allow it to support writes, as well as reads.

I would also recommend you dispense with Delphi for this type of program, but Pascal is cool, and some people like to exercise their broadband with 705K executables. How do you brag that you can download a 25K file quickly?


Hello-

I'm running an 18GB X15-36LP on a 29160 with WinXP Dynamic Disk. Here are my results...(weird):

DriveBench V0.1.0.15 ready.

0.5 KB request size. 13 MB/s Read, 6 MB/s Write

1 KB request size. 10 MB/s Read, 13 MB/s Write

2 KB request size. 26 MB/s Read, 32 MB/s Write

4 KB request size. 44 MB/s Read, 64 MB/s Write

8 KB request size. 133 MB/s Read, 160 MB/s Write

16 KB request size. 97 MB/s Read, 291 MB/s Write

32 KB request size. 74 MB/s Read, 321 MB/s Write

64 KB request size. 68 MB/s Read, 318 MB/s Write

128 KB request size. 58 MB/s Read, 318 MB/s Write

256 KB request size. 160 MB/s Read, 318 MB/s Write

512 KB request size. 94 MB/s Read, 355 MB/s Write

1024 KB request size. 64 MB/s Read, 355 MB/s Write

2048 KB request size. 188 MB/s Read, 355 MB/s Write

4096 KB request size. 93 MB/s Read, 340 MB/s Write

Clocker


Anyone care to post a working link? :)


I took it offline last night.. when I get home from work today I'll work on it some more, then post a new link.

I've got everything sorted out but the CPU usage.. and honestly, I'm not that concerned about it.. ATTO and all the rest tie up the CPU really badly themselves.

I didn't want more people downloading a buggy version.. I'll post more tonight... thanks for the interest :D


Waiting for an asymmetric response...


Ok.. so I got my new case last weekend.. nice CK-1100.. I set about tearing everything apart in my living room and moving it into the new case.

I saved the PS for last.. figured it would be nice to put that last bit in and fire everything up as soon as it was ready.

Disaster.. the connector leading from the PS to the MB wasn't long enough to reach up to the top of the case, through the cable-routing gap and down to the other side.. it fell short by about two inches.

Not wanting to have just wasted a few hundred bucks on this case, nor wanting to slice n' dice it or try to find a PS with a longer cable, I decided to get down to some hardware hacking. I went to RadioShack, bought about three rolls of 15' 18AWG (stranded) and a few packs of the screw-on connectors. Get home, start cutting cables and splicing in about an extra foot one by one.

More bad news.. only had 20 of the screw-connectors, which I dumbly thought would be enough, until I realized I'd need two per line instead of one.. out came the electrical tape.

The first connector on the SCSI cable won't reach either, short by about the same amount.. everything is closed.

... the next day ...

I go on a mad-dash to find a new SCSI cable. None of the CompUSA stores in a 20 mile radius have any internal 68-pin SCSI cables, only 50 pin. Finally I find one at the local store.. it's only 5+1 connectors, but it looks a bit longer and is "round" so I figure it has a better chance to fit. Get home, no joy.. still too short.

... a while later ...

Trying to get my hot-swap cages going, I find out one of them is totally screwed up. I only had two in the previous case before, the other two sitting on a shelf. With all four in I'm trying to track down just which one went bad, and it's like playing whack-a-mole at the arcade.. they all worked fine by themselves, but random timeouts would occur when it was just one of them.

Trying to track down which one, I'm "hot plugging" the bays like I've done a million times before on a million systems. This time I must've been too tired from all the screwing around. Plugging the 4-pin Molex into the removable bay.. blue light.. snap sound.. everything shuts off. Uh oh.

... another while later ...

I fried the chassis I was plugging in at the time, and the other one that was on that same power-cable chain. I strip them out and order two more, should be here tomorrow or Monday via FedEx.

... even more time passes ...

I can't get all four drives hooked up unless I want to tear everything down again and use the internal 3.5" bays (that I removed) from the MOBO side of the case.. I'm not up to it. I resign myself to just using the three drives until I have either : Another "not quite long enough" cable and a dual-channel card -or- a long-enough cable.

... agony ...

Get the three drives in, get everything reinstalled, and remember my last backup is a week old.. a week being about the age of the benchmark, I've lost all the source.

I'll rewrite it this weekend sans bugs from last time, and with a nicer graphing package..

Keep your eyes open. :)

