This is one of those "Oh my god this is so amazing i gotta tell the whole world" posts.
The basic background info is that I, along with many other people, had resigned myself to the fact that the performance of onboard RAID solutions royally sucks, especially when it comes to writing in RAID5 mode. For the past year or so, I'd put up with pathetic throughputs of barely 10-20MB/sec in typical file-copy operations to my RAID5 array under XP, improving somewhat to 30-40MB/sec under Vista. With various benchmarking sites showing similar figures for this chipset (nForce 570) and its successor (the nForce 6 series), I pretty much assumed that was just as fast as it went.
Until recently, when I was messing around with copying partitions using a Windows port of dd, and as I tweaked the block sizes a bit for performance, the write throughput suddenly jumped from 20MB/sec up to 220MB/sec. My first reaction was "huh?", followed by "Hmm, probably a bug", then "Wait a sec..." and finally "WTF?!"
Make no mistake, I didn't believe it at first either. For the past year the array had not once exceeded 50MB/sec under the best of conditions, and yet it was actually capable of over 200? An onboard, host-based, consumer-grade RAID5 solution hitting the full write speed of four latest-generation hard drives? You're kidding, right?
I suppose I just had to find out... So I went about ruling out all possible anomalies - caching, bugs, freak errors, the chipset dumping the data to nowhere, etc. None of them panned out. As one test, I copied an entire 300GB partition off a separate RAID0 array using dd; the entire process completed in less than 30 minutes, and the CRC32 sums of both data sets matched perfectly afterwards. So it was true - somehow, this array that had been giving me 20MB/sec in XP for the past year really was capable of over 200MB/sec writing in RAID5 mode.
OK, so if the array was really capable of over 220MB/sec, why the hell was I only getting 20MB/sec in XP? Sure, I'd tried tweaking all the usual suspects - block size, stripe size, cluster size - all to no avail. So what was dd doing that neither XP nor Vista could? It turns out that only a finely tuned combination of stripe size, I/O block size, partition alignment, and cluster size yields optimum performance from the RAID array under Windows.
Why? The controller is just too dumb. You'll probably know that writing a fractional part of a stripe under RAID5 requires a read => modify => calculate parity => write cycle spanning the entire stripe (the stripe size multiplied by the number of data drives - four, in my case). Issuing the controller contiguous sequential writes with an I/O block size larger than the stripe width negates the need for this - as shown by Intel's ICH7 and ICH8 both achieving over 120MB/sec write speeds - but somehow the nForce platform is just too dumb to realise it, and unless each and every write request is perfectly sized and aligned precisely with the start of a stripe block, it'll issue a full read => modify => parity => write cycle for each and every block, repeatedly. That kills your performance. Even if you issue a single 8MB I/O request, if it's not aligned, the nForce ain't having it.
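The cost of that behaviour can be sketched with a little arithmetic. The toy model below is my own illustration of what the article describes - not nVidia's actual firmware logic - and the function name `rmw_cycles` is purely hypothetical. It assumes the worst-case nForce behaviour: if a request isn't both stripe-aligned and stripe-sized, every stripe block it touches pays a full read-modify-write cycle.

```python
# Toy model of nForce RAID5 write behaviour (illustrative assumption, not firmware fact).

def rmw_cycles(offset, size, stripe_size, drives):
    """Count read-modify-write cycles for a `size`-byte write at byte `offset`,
    assuming the controller only skips RMW when the whole request is aligned
    to, and a multiple of, the stripe width."""
    stripe_width = stripe_size * (drives - 1)  # data bytes per full stripe
    if offset % stripe_width == 0 and size % stripe_width == 0:
        return 0  # perfectly aligned and sized: parity from new data alone
    # Otherwise: one full RMW cycle per stripe block the request touches.
    first = offset // stripe_width
    last = (offset + size - 1) // stripe_width
    return last - first + 1

KB, MB = 1024, 1024 * 1024
# 3-drive array, 32KB stripes -> 64KB stripe width.
print(rmw_cycles(0, 8 * MB, 32 * KB, 3))        # aligned 8MB write -> 0
print(rmw_cycles(63 * 512, 8 * MB, 32 * KB, 3)) # sector-63 start -> 129 RMW cycles
```

Under this model even a single large sequential write collapses into per-block RMW cycles the moment the starting offset is off by half a stripe, which matches the 10x throughput gap described above.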
So, for anyone fed up with pathetic write performance on their nVidia nForce-based RAID5 arrays, meeting the following requirements should let you harness the full speed of your drives under Windows, instead of just a puny fraction of it. In essence, the combination of the following settings forces Windows to lay each file system write request perfectly aligned to the start of a stripe block, and to size it to match said block. Note: this only works under NTFS. I have not tested FAT32, though in theory it might work as well, if not better.
1) Partition offset. The partitions on your array must start at an offset that is a multiple of the stripe width (stripe size times the number of drives minus one). The easiest way to do this is to delete the partitions off the array (one by one works if you cannot do them all at the same time) and repartition using Vista.
XP, and virtually every O/S and partitioning tool of XP's day, by default places the first partition on a disk at sector 63. Being an odd number, 31.5KB into the drive, it's never going to align with any stripe size. This is an unfortunate industry standard.
Vista, on the other hand, aligns the first partition at sector 2048 by default, a by-product of its revisions to support large-sector hard drives. Since a RAID5 array in write mode mimics the performance characteristics of a large-sector hard drive, this comes as a great, if inadvertent, benefit. 2048 is evenly divisible by 2 and 4 (optimal for 3- and 5-drive arrays) and by virtually every stripe size in common use. If you're using a 4-drive RAID5, however, you're SOOL.
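The divisibility claim is easy to verify. The snippet below (my own check, assuming the classic 512-byte sectors; `offset_aligned` is a made-up helper, not anything Vista runs) tests whether a partition's starting byte offset lands on a stripe-width boundary:

```python
SECTOR = 512  # bytes - the classic sector size assumed throughout

def offset_aligned(start_sector, stripe_kb, data_drives):
    """True if the partition's byte offset is a multiple of the stripe width
    (stripe size x data drives, i.e. total drives minus the parity one)."""
    stripe_width = stripe_kb * 1024 * data_drives
    return (start_sector * SECTOR) % stripe_width == 0

print(offset_aligned(63, 32, 2))    # XP's sector 63, 3-drive/32KB -> False
print(offset_aligned(2048, 32, 2))  # Vista's sector 2048, 3-drive/32KB -> True
print(offset_aligned(2048, 16, 4))  # 5-drive/16KB -> True
print(offset_aligned(2048, 32, 3))  # 4-drive/32KB (96KB width) -> False
```

Sector 2048 is 1MB into the disk, which is why it divides cleanly by 64KB stripe widths but not by the 96KB width of a 4-drive array.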
2) Stripe size. Your stripe width (stripe size x number of drives minus 1, ignoring the parity block) must equal your I/O unit size or divide evenly into it. XP's default I/O block for file read/write operations is 64KB; Vista commonly uses 1-8MB. To accommodate this, use a 32KB stripe size for a 3-drive RAID5, or a 16KB stripe size for a 5-drive one. Again, with 4 drives you're SOOL.
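Working the numbers through (again my own sketch - `stripe_width_kb` is just an illustrative helper) shows why those two configurations line up with XP's 64KB I/O block while a 4-drive array can't:

```python
def stripe_width_kb(stripe_kb, drives):
    """Data bytes per full stripe, in KB - the parity block is excluded."""
    return stripe_kb * (drives - 1)

IO_KB = 64  # XP's default file read/write block size

for drives, stripe in [(3, 32), (5, 16), (4, 32), (4, 16)]:
    width = stripe_width_kb(stripe, drives)
    print(f"{drives} drives, {stripe}KB stripe -> {width}KB width, "
          f"fits 64KB I/O: {IO_KB % width == 0}")
```

A 4-drive array has three data drives, so every power-of-two stripe size yields a width divisible by 3 in KB (96KB, 48KB, ...), and 64KB can never be a clean multiple of it.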
3) I/O block size. Normally you can't alter this, but if you're using low-level async I/O in self-written applications, or programs that let you tune the block size (e.g. many SQL servers), set them to write in block sizes equal to, or a multiple of, your stripe width. Again, XP's default is 64KB.
4) Cluster size. This is how we force Windows to align each write with the start of a stripe block on the array: you will typically need to format your NTFS partition with a cluster size of 64KB. This divides the filesystem into discrete 64KB blocks which, in combination with a properly aligned partition, position each allocation unit exactly in line with a stripe block.
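Putting points 1) and 4) together: with Vista's 1MB partition offset and 64KB clusters, every cluster boundary falls on a stripe boundary exactly when both the offset and the cluster size are multiples of the stripe width. A quick check of my own (the helper name is hypothetical):

```python
SECTOR = 512
CLUSTER = 64 * 1024          # NTFS formatted with 64KB clusters
PART_START = 2048 * SECTOR   # Vista's default partition offset (1MB)

def clusters_aligned(stripe_width):
    """True if every cluster start lands on a stripe-block boundary.
    Cluster n starts at PART_START + n * CLUSTER, so it suffices that
    both terms are multiples of the stripe width."""
    return PART_START % stripe_width == 0 and CLUSTER % stripe_width == 0

print(clusters_aligned(64 * 1024))  # 3-drive/32KB or 5-drive/16KB -> True
print(clusters_aligned(96 * 1024))  # 4-drive/32KB -> False
```

This is why the recipe works end to end for 3- and 5-drive arrays: partition offset, cluster size, and stripe width all collapse onto the same 64KB grid.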
The above recommendations, if all met properly, should make your RAID5 performance under Windows XP or Vista skyrocket. However, as mentioned, this only works with 3- and 5-drive arrays; with 4 or 6 drives these steps will help a bit, but probably not a lot (and you will have to position your partitions manually at a different starting sector, e.g. 3072, using a hex editor). You won't get decent performance out of a 4- or 6-drive array unless you somehow manage to configure your application to write blocks aligned with, and sized appropriately for, the relevant stripe width - which you essentially cannot achieve using Windows itself.
Anyway. Article over. Hope this helps some people out there. Any thoughts welcome.
Note: The above information applies to nVidia's nForce 5 and 6 series chipsets only. It may also help with Intel's, but since those already pretty much max out their throughput without these tweaks, it probably won't do much. That, and the fact that I don't have an Intel chipset to test with, means I can't say for sure.