Posted by qasdfdsaq

How to achieve superb write speeds with nForce onboard RAID5


This is one of those "Oh my god, this is so amazing I gotta tell the whole world" posts.

The basic background is that I, along with many other people, had resigned myself to the fact that the performance of onboard RAID solutions royally sucks, especially when it comes to writing in RAID5 mode. For the past year or so, I'd put up with pathetic throughputs of barely 10-20MB/sec in typical file-copy operations to my RAID5 array under XP, improving somewhat to 30-40MB/sec under Vista. With various benchmarking sites showing similar figures for this chipset (nForce 570) and its successor (nForce 6 series), I pretty much assumed that that was just as fast as it went.

Until recently, when I was messing around with copying partitions using a Windows port of dd. As I was tweaking the block sizes a bit for performance, the write throughput suddenly jumped from 20MB/sec up to 220MB/sec. My first reaction was "huh?", followed by "Hmm, probably a bug", then "Wait a sec..." and subsequently "WTF?!"

Make no mistake, I didn't believe it at first either. For the past year the array had not once exceeded 50MB/sec under the best of conditions, and yet it was actually capable of over 200? An onboard, host-based, consumer-grade RAID5 solution hitting the full write speed of four latest generation hard drives? You're kidding, right?

I suppose I just had to find out... So I went about ruling out all possible anomalies - caching, bugs, freak errors, the chipset writing the data to nowhere, etc. None of them panned out. As one test, I copied an entire 300GB partition off a separate RAID0 array using dd; the entire process completed in less than 30 minutes, and the CRC32 sums of both data sets matched perfectly afterwards. So it was true - somehow, this array that had been giving me 20MB/sec in XP for the past year was really capable of over 200MB/sec writing in RAID5 mode.

OK, so if the array was really capable of over 220MB/sec, why the hell was I only getting 20MB/sec in XP? Sure, I'd tried tweaking all the usual suspects - block size, stripe size, cluster size - all to no avail. So what was dd doing that neither XP nor Vista could do? Turns out that only a finely tuned combination of stripe size, I/O block size, partition alignment, and cluster size will yield optimum performance from the RAID array under Windows.

Why? The controller is just too dumb. You probably know that writing a fractional part of a stripe under RAID5 requires a read => modify => calculate parity => write cycle spanning the full stripe width. Issuing the controller contiguous sequential writes with an I/O block size larger than the stripe width negates the need for this - as shown by Intel's ICH7 and ICH8 both achieving over 120MB/sec write speeds - but somehow the nForce platform is just too dumb to realise it, and, unless each and every write request is perfectly sized and aligned precisely with the start of a stripe block, it'll issue a full read => modify => parity => write cycle for each and every block, repeatedly. Thus killing your performance. Even if you issue a single 8MB I/O request, if it's not aligned, the nForce ain't having it.
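To illustrate the penalty, here's a toy cost model of that behaviour (a sketch, not nForce-specific code; it assumes, per the description above, that any stripe block not fully covered by a write incurs a read-modify-write cycle, and the 32KB stripe size is purely an example):

```python
# Toy model: count how many stripe blocks of a write get the full
# read => modify => parity => write treatment versus a direct write.
# Assumption (from the post): a block escapes the RMW penalty only if
# the request covers it exactly from start to end.

STRIPE_BLOCK = 32 * 1024  # example per-drive stripe size, in bytes

def rmw_cycles(offset, length, stripe_block=STRIPE_BLOCK):
    """Return (rmw_blocks, direct_blocks) for a write at offset/length."""
    rmw = direct = 0
    first = offset // stripe_block
    last = (offset + length - 1) // stripe_block
    for blk in range(first, last + 1):
        start, end = blk * stripe_block, (blk + 1) * stripe_block
        if offset <= start and offset + length >= end:
            direct += 1   # block fully covered: parity computed in one pass
        else:
            rmw += 1      # partial coverage: read, modify, re-write

    return rmw, direct

# A 64KB write aligned to the stripe: no RMW cycles at all.
print(rmw_cycles(0, 64 * 1024))         # (0, 2)
# The same write starting at sector 63 (the XP default, 32256 bytes in):
# both edge blocks are partial, so most of the work becomes RMW.
print(rmw_cycles(63 * 512, 64 * 1024))  # (2, 1)
```

Scale that up to a controller that does this for every single request, and the 20MB/sec vs 220MB/sec gap stops looking mysterious.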

So, for anyone fed up with pathetic write performance on their nVidia nForce based RAID5 arrays, meeting the following requirements should allow you to harness the full speed of your drives under Windows, instead of just a puny fraction of it. In essence, the combination of the following settings forces Windows to lay each file system write request perfectly aligned to the start of a stripe block, and to size it to match said block. Note: this only works under NTFS. I have not tested it with FAT32, though in theory it might work as well, if not better.

1) Partition offset. Your partitions on the array must be offset to a common multiple of both the number of drives minus one and the stripe width. The easiest way to do this is to delete your partitions off the drive (one by one works if you cannot do them all at the same time) and repartition using Vista.

XP, and virtually every O/S and partitioning software of XP's day, by default places the first partition on a disk at sector 63. Being an odd number, and 31.5KB into the drive, it isn't ever going to align with any stripe size. This is an unfortunate industry standard.

Vista, on the other hand, aligns the first partition on sector 2048 by default, as a by-product of its revisions to support large-sector hard drives. As RAID5 arrays in write mode mimic the performance characteristics of large-sector hard drives, this comes as a great, if inadvertent, benefit. 2048 is evenly divisible by 2 and 4 (allowing for 3- and 5-drive arrays optimally) and by virtually every stripe size in common use. If, however, you are using a 4-drive RAID5, you're SOOL.

2) Stripe size. Your stripe width (stripe size x (number of drives - 1), ignoring the parity block) must equal your I/O unit size, or divide evenly into it. XP's default I/O block for file read/write operations is 64KB; Vista commonly uses 1-8MB. To accommodate this, you should use a 32KB stripe size for a 3-drive RAID5, or a 16KB stripe size for a 5-drive. Again, with 4 drives, you're SOOL.

3) I/O block size. Normally you can't alter this, but if you're using low-level async I/O, self-written applications, or programs that let you tune the block size (e.g. many SQL servers), you should set them to write in block sizes equal to, or a multiple of, your stripe width. Again, XP's default is 64KB.

4) Cluster size. This is how we force Windows to align each write with the start of a stripe block on the array - you will typically need to format your NTFS partition with a cluster size of 64KB. This divides the filesystem into discrete 64KB blocks which, in combination with a properly aligned partition, position each allocation unit exactly in line with a stripe block.
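Putting the four requirements together, here's a rough sanity-checker (my own sketch of the rules above, not an official tool; rule 4 is encoded as "cluster size equals stripe width", which matches both worked examples):

```python
# Sanity-check the four alignment rules above. All sizes in bytes.
SECTOR = 512

def check_raid5_layout(drives, start_sector, stripe_size, cluster_size,
                       io_block=64 * 1024):
    stripe_width = stripe_size * (drives - 1)  # data drives only, no parity
    offset = start_sector * SECTOR
    return {
        # 1) partition must begin on a stripe-width boundary
        "partition_aligned": offset % stripe_width == 0,
        # 2/3) stripe width must divide the OS I/O block size evenly
        "stripe_fits_io":    io_block % stripe_width == 0,
        # 4) clusters must tile the stripe width exactly
        "cluster_matches":   cluster_size == stripe_width,
    }

# 3-drive array, Vista's default sector-2048 offset, 32KB stripe,
# 64KB NTFS clusters: every rule passes.
print(check_raid5_layout(3, 2048, 32 * 1024, 64 * 1024))

# 4-drive array with the same settings: the 96KB stripe width fits
# neither the 1MiB offset nor XP's 64KB I/O block.
print(check_raid5_layout(4, 2048, 32 * 1024, 64 * 1024))
```

The 3-drive case passes everything; the 4-drive case fails every check, which is the "SOOL" situation expressed in numbers.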

The above recommendations, if all met properly, should allow your RAID5 performance under Windows XP or Vista to skyrocket. However, as mentioned, this only fully works with 3- and 5-drive arrays; with 4 or 6 drives the steps will help a bit, but probably not a lot (you will also have to position your partitions manually on a different starting sector, e.g. 3072, using a hex editor). You won't get decent performance out of a 4- or 6-drive array unless you somehow manage to configure your application to write blocks aligned with, and sized appropriately for, the relevant stripe size - which you essentially cannot achieve using Windows itself.

Anyway. Article over. Hope this helps some people out there. Any thoughts welcome.

Note: The above information applies to nVidia's nForce 5 and 6 series chipsets only. It may also help with Intel's, but since those already pretty much max out their throughput without these tweaks, it probably won't do much. That, and the fact that I don't have an Intel chipset to test with, means I can't say.


Thanks for the info. Just wondering how this would apply to, or has been applied with, hardware-based RAID controllers, and more importantly non-MS OSes (in particular Linux, Solaris and the *BSDs)?

E.g. setting larger block sizes with ext2fs or UFS2 or FFS?


Just been rereading the man page for newfs(8) (for FreeBSD), and it says the default block size is 16K...

So as a general rule:

stripe width = cluster size (or block size) AND

stripe width = stripe size x (drives - 1)

For a FreeBSD system with a default block size of 16K and a 3-drive RAID 5...

16K = stripe size x (3 - 1)

16K / 2 = stripe size

8K = stripe size...

Using the default 16K block size, I would use an 8K stripe. And if I were to increase the UFS2 block size to 64K (and the fragment size to 8K), then a 32K stripe should be used with a 3-drive RAID 5.

PS. AFAIK FreeBSD uses auto-tuned block I/O sizing, with minimum I/O size being the block size for the partition. And the partitioning tools let you define the start-end sectors, so aligning them to the stripe width should be an easy task...
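That rule of thumb is trivial to sketch in code (illustrative only, not FreeBSD-specific):

```python
# stripe size = filesystem block size / (drives - 1), per the rule above
def stripe_for(fs_block, drives):
    data_drives = drives - 1
    assert fs_block % data_drives == 0, "no even fit for this drive count"
    return fs_block // data_drives

print(stripe_for(16 * 1024, 3) // 1024)  # 8  -> 8K stripe for newfs default
print(stripe_for(64 * 1024, 3) // 1024)  # 32 -> 32K stripe for 64K blocks
```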


1MiB chunk size proved to be the fastest for me using raptors


With hardware RAID controllers it shouldn't be much of an issue, as any respectable controller should be able to identify a sequential write, and has sufficient cache memory to handle it properly. That said, aligning all write requests on a stripe boundary still gives performance benefits, though for large transfers it'd only be a marginal improvement.

On your note of a FreeBSD system with a default block size of 16K and a 3-drive RAID 5: I wouldn't generally recommend a stripe size below 32KB, as it tends to hurt performance; larger blocks usually equal better sequential throughput. However, again, due to the peculiarities of the nForce controller I was testing on, everything has to be in sync or the entire system's performance comes crashing down - this tends to hold true for many host-based and "fake" RAID controllers, though decent ones usually don't exhibit this behaviour.


Great information, thanks for posting this. I've got an older Nvidia 6150 northbridge/430 southbridge mainboard at home I was planning to use in a media server with a 3 x 500 gig SATA disk RAID 5 array. I'm hoping to try the method posted above to see if it provides the same benefits in this setup as with the Nvidia 5/6 series chipsets since I believe the slow performance complaints reach back to this chipset as well.

Note: The above information applies to nVidia's nForce 5 and 6 series chipsets only. It may also help with Intel's, but since they already pretty much max out their throughputs without them, it probably won't do much. That and the fact that I don't have an Intel chipset to test with means I can't say.

Some nForce5/6 versions have essentially nForce 430 RAID, so the same applies to nForce 430. I can directly confirm that nForce 430 has this behaviour. nForce 430 and Intel RAID behaviour in this regard is discussed, and briefly benchmarked, in my posts in this thread:

http://episteme.arstechnica.com/eve/forums...45003306831/p/2

I didn't muck around with partition locations. For me it was enough to create the array in the BIOS.

As can be seen there, Intel (ICH8DO) RAID 5 is much less sensitive to drive arrangement and access sizes than nVIDIA. However, I suspect that in some cases, it also benefits from perfect alignment of number of drives, stripe size and access size (3 drives and 5 drive RAID 5 arrays) -- perhaps especially when the data is written across the network, because the OS might handle such accesses differently. I'd like to test this out in the future, and with ICH9R (probably ICH9DO) as well, which is needed to have > 4-drive RAID 5 arrays under Intel (without going to some server chipsets).

XP, and virtually every O/S and partitioning software of XP's day, by default places the first partition on a disk at sector 63. Being an odd number, and 31.5KB into the drive, it isn't ever going to align with any stripe size. This is an unfortunate industry standard.

Vista on the other hand, aligns the first partition on sector 2048 by default as a by-product of its revisions to support large-sector sized hard drives.

Do you know of any Windows XP or 2003 compatible disk management tools that can partition a drive starting at sector 2048? Whipping up a quick Vista build on a spare drive to create the partition on my RAID5 array is certainly an option but I'm always interested in saving some time when I can.


Nothing that I know of unfortunately. All programs I'm aware of align the partition on sector 63, as it's become somewhat of an assumed standard over the past few decades.

If you really wanna try it out and can tell me your exact drive size and desired partition(s), I could manually hex-edit up a partition table for you.

Also, I believe Norton Partition Magic has a utility called ptedit which, IIRC, allows you to manually enter data values into your partition tables; you might wanna try that. Can't be sure though - it's been years since I managed to get PM to work on my machine.


This is a fascinating topic.

Microsoft has most of the details of how to edit the partition table in the following link:

Disk performance may be slower than expected when you use multiple disks in Windows Server 2003, in Windows XP, and in Windows 2000

I am wondering if the partition offset adjustment would have any impact on a RAID 1 system (2 Hard Drives).

SB



It probably wouldn't, as RAID1 has no such thing as a stripe size.

However, thanks for that article, it presents a decent solution that can be used under Windows XP without the need for Vista.



Just wanted to report in that I finally got a chance to set up my RAID5 array on the nVidia 430 southbridge RAID controller (6150 northbridge), and my results were consistent with the "before & after" picture presented here. With an all-defaults array - partitioned and formatted in Windows XP - I was able to recreate the same 15-20MB/s write speeds. I then followed the instructions in this thread and was able to recreate the miraculous improvement in speed, averaging on the order of 88MB/s writes (around the max speed of the slowest drive on its own) using 3 x mismatched drives.

I used the "Vista" method to set up the disk, though I verified with diskpart that the offset value was correct and will use this method within XP going forward if I decide to reconfigure my disk partitions in Windows. Thanks qasdfdsaq and the various other contributors to this thread for the great info!

Now if I could just figure out my god-awful network transfer speeds with the nVidia Gig-E controller, which wipe out any performance gains from my local RAID5 array when I'm streaming video around my house... :)


That's great - I'm glad it worked for you :)

Given the time, and the ability to find a port multiplier, I'm going to test this out on a couple of other crappy RAID controllers that only give me about 10-20MB/sec too, especially a few Silicon Image ones. I'll report back.

How bad is the performance on your GigE, by the way? I easily achieve ~300Mbps on mine, and that's because my laptop at the other end won't go any higher.


I have been trying for a while to get my 3-drive nVidia RAID5 setup to write decently. Sequential write performance is at about 15MB/s, compared to a standalone drive at 50MB/s.

I believe I fulfilled all the requisites you posted - I set my 'striping block' to 32K (in the nVidia util), I created a partition with an offset of 1MB (my version of diskpart seems to only operate in MB, and wouldn't let me align the partition...), and I formatted the partition with a 64K cluster size.

Can you suggest how I can check all the parameters? The only one I can get a 'live' reading on is cluster size.

Have I missed something?
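For reference, those numbers can be cross-checked on paper (assuming diskpart's "1MB" offset means exactly 1MiB, i.e. 2048 sectors of 512 bytes):

```python
# Cross-check the setup described above against the thread's rules.
MIB = 1024 * 1024
offset = 1 * MIB           # partition start, as created by diskpart
stripe = 32 * 1024         # nVidia "striping block"
width = stripe * (3 - 1)   # 3-drive RAID5 -> 64KB stripe width
cluster = 64 * 1024        # NTFS allocation unit size

print(offset % width == 0)      # True: partition is stripe-aligned
print(64 * 1024 % width == 0)   # True: XP's 64KB I/O matches the width
print(cluster == width)         # True: clusters tile the stripe blocks
```

All three come out True, so on paper the settings are right; if performance is still poor, the actual on-disk partition offset (in sectors, not the rounded MB figure) is the value worth double-checking.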


Hi.

I read this a while ago and put it on my to-do list ;) I've finally come around to it now, but my case is a little different and I was hoping for some advice. Firstly, the array is already created and already has data on it. Also, the partition was made with 4 x 500GB drives in RAID5, but I've since expanded the array to 5 x 500GB; I haven't touched the partition yet on purpose (hoping I could move it into the unused space of the array). I don't have the sort of space or money lying around to buy new drives and back up about 1.5TB, so I'm hoping I can do this partition alignment business without losing data... can I?

I created my array a year+ ago, before I knew this physical sector offset trick, so it's at sector 63 now. I have Partition Magic, which allows me to move the partition, and from what I understand I'd need to move it by one MB - but it doesn't allow such fine control; the first jump is already to 7.8MB, and if I enter 1.0MB it just automatically drops it to 0. So I think I'm out of luck on that one, unless you have some bright ideas. However, it does come with the PTEDIT32.exe you mentioned. Here's a screenshot:

[Screenshot: PTEDIT32 showing the partition table - ptedit.JPG]

Can I change Sectors before to 1024 and be done with it? Or do I need to change the starting Cyl Head Sector too? But most importantly, will I lose data?

Thanks so much.


Perhaps this is a stupid question, but is the partition that's created on the stripe-aligned boundary made before or after the RAID array is created? Most RAID controllers will wipe out any partitions created beforehand when you create an array, so the partition would have to be created afterwards. But when you set the sector for the partition alignment, you're setting it on the "virtual drive" created by the array. Now, it may be safe to assume that if you align your partition on a given sector of the single "virtual drive", each drive would have its data aligned on the same sector, but I'm not so sure of that.

Can anyone put to rest my concern? thanks.



This was a very interesting thread to read

Thanks for all the work you have put into it qasdfdsaq!

By pure luck I might have "saved" myself from this trouble then, because I decided to format my RAID5 array in Vista's install program (forcing me to boot from the DVD) and then install it from within Windows XP x64 (thus aborting the install from DVD) to make it preserve the drive letter layout in both operating systems...

This was on an Intel Matrix Storage controller built into my mainboard, an Asus Maximus Formula...


onlinespending, I have also wondered how the logical unit (i.e. the array) corresponds to the physical disk(s). However, when the results are as massive an improvement as here, it's fairly safe to assume that the alignment is happening on the disks too. The best thing is to experiment. This procedure can yield massive improvements for sure though - on a Linux box with a 7-drive RAID0, I got reads up from about 220MB/s all the way to 450MB/s using a combination of alignment (in the end I went without partitions at all - I did not know you could do this in Linux!) plus telling the filesystem how your RAID is set up, which also helps massively - something I don't think you can do under Windows at all. Bottom line is, no matter the OS, there is actually a LOT to be gained from tweaking.

Vista automatically creates partitions at an offset of 1MB by the way, so with Vista this is a non-issue.
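As one concrete (hypothetical) illustration of telling the filesystem about the RAID layout on Linux: mke2fs accepts a stride hint, measured in filesystem blocks per RAID chunk, and deriving it is a one-liner. The 4KB block size and 32KB chunk below are assumed example values, not taken from the 7-drive array above:

```python
# Derive mke2fs's stride hint: filesystem blocks per RAID chunk.
fs_block = 4 * 1024           # assumed ext2/ext3 block size
chunk = 32 * 1024             # assumed per-drive RAID chunk ("stripe size")
stride = chunk // fs_block    # blocks per chunk
print(stride)                 # 8, as in: mke2fs -j -b 4096 -E stride=8 /dev/md0
```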

Thanks. Can someone post the disk information for the Vista partition? I'm curious to know all of the correct values if I were to do this by hand using a tool such as PTEDIT32.

Or actually, is there a nice convenient tool to create partitions at arbitrary sector offsets?

The DISKPART tool would appear to do so with the 'align' option, but it's not allowing me to do so on my WinXP box. I believe I'd need the version of DISKPART that comes with 2k3 Server.


Yes, I too failed to create one at an offset using diskpart on xp. If you manage to get a diskpart that works, do let us know please :)


Well I downloaded 2k3 SP2 and extracted the files and expanded diskpart.ex_

Problem is, it doesn't like to run on XP.

"This version of diskpart is not supported on this platform."

Wonder if there's a way to hexedit the file to have it run on XP.


There's also a program called "diskpar.exe" (par, not part) that can do the same thing, but was included in a Windows 2000 Server Resource Kit, and I have not been able to find it to download.

