t_3

Horribly bad Velociraptor performance on overlapping I/Os


Hello,

I have a terribly annoying problem with my brand new machine setup, which is as follows:

Asus Maximus II Formula board, 2x Velociraptor (SATA ports 1+2, no RAID), 1x WD 640GB (SATA port 3), 1x WD Green 1TB (eSATA port), all in AHCI mode, running an up-to-date XP. I noticed somewhat sluggish performance all the time while working, especially on my new system disks (the Velociraptors, of course; previously I had a 15k SCSI drive as system disk), and most notably while copying larger files between different partitions on the same drive. So I ran some HD Tach benchmarks, which at first didn't show anything remarkable. The Velociraptors reached around 130 MB/s max, and all the other disks showed the numbers one would expect.

Then I thought, what about running HD Tach twice at the same time ... and BUMMER!

The results:

- Velociraptors were (with a nearly flat graph) at ~10 MB/s average (!!!)

- WD 640GB had a normal-looking graph with an average of >70 MB/s

- WD Green 1TB (also with a flat graph) at ~6 MB/s average (!!)

Uh-oh!

After checking BIOS, drivers, settings, registry, and all that stuff, I found a few things that were not as they should be: first, I got a new Intel ICH driver (from September; the old one was from July), and I also found that I hadn't installed the Marvell SATA driver (which is for the eSATA port of that board).

And then re-checked:

- WD 640GB now had a choppy graph between 50 MB/s and 100 MB/s, but with an average of >90 MB/s (...)

- WD Green 1TB now had a normal-looking graph with an average of nearly 80 MB/s (!!!)

BUTTTT

- Only the Velociraptors were at nearly the same level as before, now at a ~15 MB/s average (?!?)

...as I wrote before: with 2 HD Tach (quick) benchmarks running at the same time.

So I definitely solved some driver issues: I installed the Marvell drivers, which apparently boost the "multi read" performance of the connected drive. And there IS a notable performance increase between Intel's ICH drivers from July and September, BUT (and this drives me nuts) not for the Velociraptors. They are only fast while no overlapping I/Os happen. With multiple concurrent reads/writes, something seems badly broken. And I really don't understand what it could possibly be, because the WD 640GB connected to the same chipset doesn't have any problems at all. So instead of a super-fast, low-latency disk for heavy workloads I got the exact opposite :(

PS: I recently installed Vista on the same machine and could reproduce the issue exactly. The concurrent-access performance of the 1TB WD Green was bad with the standard drivers and perfect with the Marvell SATA drivers. The only difference was that the 640GB WD drive was already fast with the drivers that came with Vista (so the newer Intel drivers made no difference). Only the Velociraptors were badly slow on overlapping I/O, no matter what I did...

Any ideas, anyone?

Thanks in advance for any opinions you have about this...


One addition: It is clearly a problem with parallel reads!

Some more tests:

Copying one 700MB file from the WD 640GB to the Velociraptor: 6 seconds.

Copying one 700MB file from the Velociraptor to the WD 640GB: 7 seconds.

Copying one 700MB file from the WD Green 1TB to the WD 640GB: 7 seconds.

...just to show that the WD 640GB isn't the bottleneck.

and now:

Parallel copying two 700MB files from the WD 640GB to the Velociraptor: 15 seconds.

Parallel copying two 700MB files from the WD Green 1TB to the WD 640GB: 20 seconds.

Parallel copying two 700MB files from the Velociraptor to the WD 640GB: 110 seconds!!!

???
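For anyone who wants to repeat the parallel copy test above without clicking two Explorer windows at once, here is a minimal Python sketch. It is only an assumption of how such a test could be scripted, not the procedure used in this thread; `timed_parallel_copy` is a hypothetical helper name, and the file paths you pass in are your own. Note that the OS file cache can make repeated runs look faster than the disk really is, so use files that have not been read recently.

```python
import shutil
import threading
import time
from pathlib import Path


def timed_parallel_copy(sources, dest_dir):
    """Copy every file in `sources` into `dest_dir` at the same time.

    Returns the wall-clock time in seconds until the last copy finishes.
    One thread per file, so the source drive sees overlapping reads.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    threads = [
        threading.Thread(
            target=shutil.copyfile,
            args=(str(src), str(dest_dir / Path(src).name)),
        )
        for src in sources
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Calling it once with a single file and once with two files from the same source drive gives the same kind of comparison as the 7 seconds vs. 110 seconds figures above.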

Same problem here with my good old 975X board....

"good" to hear that i'm not alone with a problem. sorry for you ;)

do you have some details about what you found and how you tested?

i myself have narrowed the problem down (at least a bit) today. i changed the sata cables - just in case - and also connected the drives to 2 of the marvell-chipset driven ports, and what can i say - same thing as before. everything is fast on single or multiple read or write access, only the velocis are unbelievably slow on parallel reads.

so i'd say it can't be an issue of the chipset, or the os, or drivers - it must be the drives! i'll continue checking things like drive firmware and jumpers (but i think there aren't any) and will test the drives with another board, which is, btw, an asus p5w deluxe (also a 975x board). already curious what i'll see there.

i still have another two velociraptors in my server (asus dsbv-d i5000 board) which work as expected (fast!); maybe i'll pull them out and test them on my other system over the weekend, which would be problematic because they are in a RAID and therefore can't easily be connected to another pc.

anyhow, i'll give updates if i find something, just in case anyone else comes across such problems...


I already thought something must be wrong with my board or the drivers, or that the ICH7 doesn't like the Veloci.

And now I hear that somebody else gets almost the same strange figures. I noticed the weakness during parallel demuxing of video streams and file copying over the network, both of which access the Veloci with huge reads.

Simple test: 2 parallel instances of HD Tach 3; the problem is immediately visible in the sequential read test: ~12 MB/s. With 1 sequential read, I get the usual speed, starting at 120 MB/s.

Burst, CPU, and random access are all OK, but not sequential reads.

Config = E6600@P5W64 WS Pro, 4GB RAM, OS = Server 2008, Driver = Intel 8.6, SATA ports in AHCI mode (haven't tested compatibility mode yet)

My gut feeling: a fundamental problem with the NCQ implementation. I'll test without NCQ.


It's even worse without NCQ; I get ~7-8 MB/s with 2 instances of sequential read.

I compared with my 6400AAKS; this drive has no problem at all! The same config and tests deliver between 60 and 80 MB/s with 2 instances of sequential read.

I'm frustrated......

What precise versions of Velocis do you have? Both of mine are WD3000GLFS-01F8U0.


Btw: The 6400AAKS test was done on a 780G board, not on the P5W64.

I would be interested in testing the Veloci on the 780G board, but that would mean moving heavy furniture around.

I have now tested my WD1500ADFD on the P5W64 and got the same problem, with or without AHCI/NCQ mode! I'm no longer sure that it's really the drive. It must have to do with the ICHs and drivers.


I have now tested my old WD3200JD... and... boom... same bad result on parallel sequential reads: 4 MB/s. OK, it's a super old drive without TCQ/NCQ features.

BUT: when I use 128KB blocks in parallel sequential reads, I get 2x 30-35 MB/s (with some spikes up to 45 MB/s) from my Velocis.

=> So the drive is very fast for single big-read applications on the desktop, or suitable for RAID setups on an ICH chipset.

Nevertheless, it's a somewhat disappointing result compared to a standard desktop drive like the 6400AAKS.

Maybe it has to do with my very old P5W64 BIOS? Hmmm. But everything else is running fine and rock-stable, so I'd rather not flash a new BIOS...

Btw: when it comes to scattered heavy reads, like booting the OS, it's incredibly fast. My Win 2008 boots to the desktop in 20-25 sec.
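The block-size experiment mentioned above (parallel sequential reads with 128KB blocks) can be approximated with a short Python sketch. This is an assumption of how such a test might look, not what HD Tach or any other tool in this thread does internally; `read_stream` and `parallel_sequential_read` are hypothetical names. Reads go through the OS file cache here, so for a meaningful number the file should be much larger than RAM or not recently read.

```python
import threading
import time


def read_stream(path, start, block_size, total_bytes, results, index):
    """Read `total_bytes` sequentially from offset `start` in `block_size` chunks.

    Stores (bytes_read, MB_per_second) in results[index].
    """
    done = 0
    begin = time.perf_counter()
    # buffering=0 avoids Python-level buffering so each read() is one request
    with open(path, "rb", buffering=0) as f:
        f.seek(start)
        while done < total_bytes:
            chunk = f.read(min(block_size, total_bytes - done))
            if not chunk:  # reached end of file
                break
            done += len(chunk)
    elapsed = max(time.perf_counter() - begin, 1e-9)
    results[index] = (done, done / elapsed / 1e6)


def parallel_sequential_read(path, starts, block_size=128 * 1024,
                             total_bytes=64 * 1024 * 1024):
    """Run one sequential reader per start offset, all at the same time."""
    results = [None] * len(starts)
    threads = [
        threading.Thread(
            target=read_stream,
            args=(path, s, block_size, total_bytes, results, i),
        )
        for i, s in enumerate(starts)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Running it with two start offsets that are far apart in a large file, once with a small `block_size` and once with 128KB, gives a rough comparison of how much the request size matters for two concurrent streams.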


But why should WD do this? They have no SAS or SCSI drives to protect... That doesn't sound very logical to me.

A test on my 780G platform in direct comparison to the 6400 would be interesting. Too bad that's a huge effort for me, because the drives are not in backplanes and both computers are wedged between furniture... Maybe as soon as I have more time.


I just ran two HDtach short runs on my WD740GD (2nd gen Raptor) without NCQ and I get

HDtach instance 1: 23.3 MB/s, HDtach instance 2: 15.8 MB/s

which looks quite normal for concurrent access. The sound the drive made was very loud.

My 6400AAKS (no NCQ) scores 51.2 MB/s in both instances in run 1 (where the head positions were quite similar). In run 2, however (where I started the second benchmark with more delay), the speed dropped to 3-4 MB/s while both benchmarks were running concurrently.

It seems to depend on how far apart the read requests are on the disk.
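A back-of-the-envelope model shows why the distance between the two read positions could matter so much. The sketch below assumes the drive alternates strictly between the two streams and pays a fixed repositioning cost (seek plus rotational latency) on every switch. The 64 KiB request size, the 120 MB/s sequential rate, and the 8 ms switch cost are illustrative assumptions, not values measured in this thread, and `interleaved_throughput` is a hypothetical name.

```python
def interleaved_throughput(block_bytes, seq_rate_bytes_s, switch_cost_s):
    """Combined throughput (bytes/s) when every request pays a head switch.

    Each request costs the pure transfer time plus `switch_cost_s`
    for moving the head to the other stream's position.
    """
    transfer_s = block_bytes / seq_rate_bytes_s
    return block_bytes / (transfer_s + switch_cost_s)


# Assumed example values: 64 KiB requests, 120 MB/s sequential, 8 ms per switch
combined = interleaved_throughput(64 * 1024, 120e6, 0.008)
print(f"combined: {combined / 1e6:.1f} MB/s")
```

With these assumed numbers the combined rate lands at roughly 7 to 8 MB/s, the same order of magnitude as the slow results reported in this thread. When the two head positions are close together the switch cost shrinks and the rate recovers, and drives or controllers that read ahead in larger chunks effectively raise `block_bytes`, which would also explain why some combinations behave much better than others.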


Some new "facts"...

My test procedure now was: Scripted copying of two 850MB files from a single drive to a RAM disk at the same time. I tested with 5 different SATA I/II drives and 3 different SCSI drives, together with 3 different SATA I/II chipsets and one SCSI chipset.

The SATA chipsets were Intel ICH10R (onboard), Marvell 61xx (onboard), and Promise TX4 (PCI33), and the SCSI controller was an Adaptec 19160, all with the latest drivers. I tried the ICH10R in AHCI mode and in IDE mode, and I could reproduce the findings on Windows XP 32bit and Vista 64bit, all on the same machine. The actual measurements are from XP.

Times are rounded to full seconds.

Some explanation of the extra characters in the two-file test: it was interesting to see how the progress bars moved while copying two files at the same time, because the behaviour was notably different depending on drive AND chipset. A "p" means the progress bars moved roughly in parallel (together); "s" means the progress bars moved alternately (file 1 - file 2 - file 1 - file 2 - and so on) with about 5-15 changes; and "ss" means that the progress bar moved (except for one small step at the beginning) completely for one file and then for the other, so the two files were in fact copied one after the other and not in parallel.

First some results while copying only ONE file:

                         Marvell    Promise    Intel
Samsung Spinpoint 400GB    23         23         23
WD Green 1TB 000ZJB0       25         25         25
WD Green 1TB 000D6B0       29         29         29
WD 640GB 00AAKS-75A7B      20         20         20
WD Velociraptor 000GLFS    15         16         15

                         Adaptec
Maxtor Atlas 15K2          20
Fujitsu Max3036NP          20
Quantum Atlas 10K3         32

And here the results while copying TWO files together:

                         Marvell    Promise    Intel
Samsung Spinpoint 400GB    78 p       84 p      295 p
WD Green 1TB 000ZJB0       44 p       43 p       24 s
WD Green 1TB 000D6B0       29 s      310 p       29 s
WD 640GB 00AAKS-75A7B     442 p      490 p       17 ss
WD Velociraptor 000GLFS    15 p       18 p      133 p

                         Adaptec
Maxtor Atlas 15k2          68 p
Fujitsu Max3036NP          42 p
Quantum Atlas 10k3         38 p

So there are some weird results for more than just the one combination in question, namely ICH10R vs Velociraptor.
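To make the two tables easier to compare, the numbers can be turned into a slowdown factor: the two-file time divided by twice the one-file time. A value around 1.0 means copying two files in parallel costs about the same as copying them one after the other; larger values mean the drive/controller combination handles overlapping reads badly. The sketch below uses the times from the tables above and assumes the one-file test used a file of the same size as each file in the two-file test; the names `SINGLE`, `TWO` and `slowdown` are just for illustration.

```python
# Times in seconds, copied from the tables above.
SINGLE = {
    "Samsung Spinpoint 400GB": {"Marvell": 23, "Promise": 23, "Intel": 23},
    "WD Green 1TB 000ZJB0": {"Marvell": 25, "Promise": 25, "Intel": 25},
    "WD Green 1TB 000D6B0": {"Marvell": 29, "Promise": 29, "Intel": 29},
    "WD 640GB 00AAKS-75A7B": {"Marvell": 20, "Promise": 20, "Intel": 20},
    "WD Velociraptor 000GLFS": {"Marvell": 15, "Promise": 16, "Intel": 15},
    "Maxtor Atlas 15K2": {"Adaptec": 20},
    "Fujitsu Max3036NP": {"Adaptec": 20},
    "Quantum Atlas 10K3": {"Adaptec": 32},
}
TWO = {
    "Samsung Spinpoint 400GB": {"Marvell": 78, "Promise": 84, "Intel": 295},
    "WD Green 1TB 000ZJB0": {"Marvell": 44, "Promise": 43, "Intel": 24},
    "WD Green 1TB 000D6B0": {"Marvell": 29, "Promise": 310, "Intel": 29},
    "WD 640GB 00AAKS-75A7B": {"Marvell": 442, "Promise": 490, "Intel": 17},
    "WD Velociraptor 000GLFS": {"Marvell": 15, "Promise": 18, "Intel": 133},
    "Maxtor Atlas 15K2": {"Adaptec": 68},
    "Fujitsu Max3036NP": {"Adaptec": 42},
    "Quantum Atlas 10K3": {"Adaptec": 38},
}


def slowdown(single, two):
    """Two-file time divided by twice the one-file time, per drive and controller."""
    return {
        drive: {ctl: two[drive][ctl] / (2 * single[drive][ctl])
                for ctl in single[drive]}
        for drive in single
    }


SLOWDOWN = slowdown(SINGLE, TWO)
for drive, row in SLOWDOWN.items():
    cells = "  ".join(f"{ctl}: {value:5.2f}" for ctl, value in row.items())
    print(f"{drive:26s} {cells}")
```

By this measure the Velociraptor on the Intel controller comes out at about 4.4, while the WD 640GB on the Marvell controller is about 11. Values well below 1.0 (two files finishing faster than two single copies would) probably point to caching or measurement noise rather than a real speedup.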

For me, the takeaway is simply to avoid using SATA. I don't care about shiny benchmarks if real-world tests show that getting reliable performance is a lottery. And, by the way, if I use the Marvell-driven ports on my board to avoid the broken ICH10R-Velociraptor combination, I get crashes when using some low-level disk tools like disk imaging backups and drive defragmentation. So I'll soon switch back to SCSI, which has given me hassle-free, top-notch performance on every one of my personal workstations for the last nn years.

PS: I'd like to point out that I did all that testing because I switched from SCSI to SATA on my new system, thinking the extra cost of SCSI wouldn't be worth it any longer. But from the first moment I had the impression that my everyday tasks now took longer, and of course I wondered why...

For me this turns out to the insight to simply avoid using SATA. I don't care about shiny benchmarks, if real world tests show that getting reliable performance is a lottery.

I'm afraid you are right...

I also think StorageReview should use that "2 parallel instances of HD Tach 3" method of testing, because it seems to give results that are closer to real-world use.
