241comp

RAID 6 vs RAID 10


I am currently building a VMWare server which will run a number (6-12) of VMWare virtual machines such as load distributors, mail servers, DB servers, web servers and GIS mapping servers. I am trying to determine the optimal drive configuration.

The hardware is:

2 x 2.2GHz Dual-Core Opteron

4 x 2GB Dual Channel DDR2

3Ware 9650SE-8LP

8 x WD5000YS RE2 HD

Options are:

1.

RAID-1 (2x500GB) for Host/VM

RAID-6 (6x500GB) for Storage

0xHot Spare

2.

RAID-1 (2x500GB) for Host

RAID-10 (4x500GB) for Storage

2xHot Spare

3.

RAID-6 (8x500GB) for Host & Storage

0xHot Spare

4.

RAID-10 (8x500GB) for Host & Storage

0xHot Spare

I'm hesitant to do 1 or 4 because I don't have good drive failure protection (if the wrong 2 drives fail, I'm dead) and there's no hot spare. So I'm currently between 2 and 3. Both would allow for 2 drive failures (well, provided that the hot spare rebuilds in time in scenario 2). What do you think? What would give me the best performance for a VMWare host machine? Thanks for any tips, and if there is documentation on this I should be reading, please feel free to point me to it.
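
For a rough sense of the trade-offs, here's a quick sketch (assuming 500GB per drive and the standard capacity formulas; hot spares are left out of the capacity totals):

```python
# Rough comparison of the four layouts. Assumes 500 GB drives and the
# standard RAID capacity formulas; "worst case" means the unluckiest
# combination of failures (e.g. both members of one mirror pair).

DRIVE_GB = 500

def raid1(n):   return (DRIVE_GB * n // 2, 2)   # (usable GB, failures to lose data, worst case)
def raid10(n):  return (DRIVE_GB * n // 2, 2)   # dead if both halves of one mirror fail
def raid6(n):   return (DRIVE_GB * (n - 2), 3)  # survives any two failures

options = {
    "1: RAID-1 (2) + RAID-6 (6), no spare":  [raid1(2), raid6(6)],
    "2: RAID-1 (2) + RAID-10 (4), 2 spares": [raid1(2), raid10(4)],
    "3: RAID-6 (8), no spare":               [raid6(8)],
    "4: RAID-10 (8), no spare":              [raid10(8)],
}

for name, arrays in options.items():
    usable = sum(u for u, _ in arrays)
    worst = min(f for _, f in arrays)
    print(f"{name:40s} usable={usable:5d} GB  worst-case failures tolerated={worst - 1}")
```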


4 questions:

1, Why not SCSI? (either traditional 68/80pin or SAS). The higher I/O throughput in Server roles will be noticeable with a SCSI setup, which in turn will mean a better performing server. I can understand cost, but you're already spending some decent $$$.

2, Backup? Tape or nearline HDDs in another enclosure?

3, Which OS? and why VMWare? (If using FreeBSD/Solaris, why not use Jails/Zones for separation rather than having the full overhead of a VM session? If using Windows, then disregard this question/comment.)

4, How many NICs and what type? What's your LAN Backbone going to be?

And RAID10 will offer higher performance over RAID6 particularly with write operations, which are important with mail and DB servers...


First, let me say thanks to you both for responding. Next, I'll try and address a couple of the issues you have brought up...

4 questions:

1, Why not SCSI? (either traditional 68/80pin or SAS). The higher I/O throughput in Server roles will be noticeable with a SCSI setup, which in turn will mean a better performing server. I can understand cost, but you're already spending some decent $$$.

2, Backup? Tape or nearline HDDs in another enclosure?

3, Which OS? and why VMWare? (If using FreeBSD/Solaris, why not use Jails/Zones for separation rather than having the full overhead of a VM session? If using Windows, then disregard this question/comment.)

4, How many NICs and what type? What's your LAN Backbone going to be?

And RAID10 will offer higher performance over RAID6 particularly with write operations, which are important with mail and DB servers...

1. Not SCSI because my budget for the entire thing (including a W2K3 lic.) is $5,500 per physical server and I need ~1TB of storage space minimum (2TB preferred).

2. Backup is 3-fold. This server will be in a HA-LB configuration with another identical server. Local backup will be nightly incremental to another local server with 500GB of storage space (backing up only locally generated content - most content doesn't need to be backed up as it is not locally generated). Nightly remote backup for the files most sensitive to loss.

3. CentOS will be our host platform and guests will be CentOS, Windows 2003 Server and Debian.

4. Each physical server has 2 x 1Gb NICs. One routes to our gigabit "external" switch and one to the gigabit "private" switch. Each virtual machine has 2 interfaces, one bridged to each physical interface. This gives each VM up to 1Gb/s of access to both the public and private networks.

I know that RAID-10 is faster than RAID-6 given the same number of drives (though the 9650SE is supposed to be pretty good at RAID-6 - close to RAID-10 speed). However, given my limitation of 6 drives on the storage array with RAID-10 (so that I can have a hot spare) vs 8 drives in a single array with RAID-6, which one will yield better performance? Perhaps the way to go would be a 2x500GB RAID-1 for the host drive and for frequently written mounts on each guest (such as /var/log and /tmp), plus a 5x500GB RAID-6 for primarily-read mounts, with 1 drive left as a hot spare for both arrays? This gives a total of 2TB of disk space, ensures a double drive failure on a single array is survivable (provided the hot spare is brought online between failures) and provides decent write speeds for the majority of writes.
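
As a back-of-the-envelope comparison of those two layouts, assuming the textbook write penalties (2 for RAID-10, 6 for RAID-6) and roughly 80 random IOPS per 7200rpm SATA spindle - controller cache will shift the absolute numbers, but the ratio is indicative:

```python
# Small random I/O estimate. Assumes ~80 IOPS per 7200 rpm SATA spindle and
# the textbook write penalties (RAID-10: 2, RAID-6: 6). Controller cache and
# stripe-aligned full writes will do better, but the ratio gives a feel for it.

SPINDLE_IOPS = 80

def effective_iops(drives, write_penalty, write_fraction):
    raw = drives * SPINDLE_IOPS
    return raw / (write_fraction * write_penalty + (1 - write_fraction))

for write_frac in (0.4, 0.6):
    r10 = effective_iops(6, 2, write_frac)   # 6-drive RAID-10 (leaves 2 spares)
    r6  = effective_iops(8, 6, write_frac)   # 8-drive RAID-6 (no spares)
    print(f"write fraction {write_frac:.0%}: RAID-10(6) ~{r10:.0f} IOPS, RAID-6(8) ~{r6:.0f} IOPS")
```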

I would also recommend ESX, as GSX (the now-free vmWare Server) performs horribly and has pdflush/cache issues with 2.6 kernels.

Yes, I've run into that in my proof of concept server (2x1800 AthlonMP, 4x250GB RAID-10, 4GB DDR), however, it appears that 2 licenses for ESX would cost ~$5000 (I can't run starter as I need VirtualSMP) so it is currently cost-prohibitive. Not to mention that ESX has additional hardware requirements I'm not sure our server platform would meet...

Again, thanks for the input and all suggestions are welcome.

I would also recommend ESX, as GSX (the now-free vmWare Server) performs horribly and has pdflush/cache issues with 2.6 kernels.

Agreed. I can't emphasise enough how terrible performance is under GSX.

Have you considered RAID5 + Hotspare? You'll probably get better performance.

In fact, why are you so worried about double HDD failure at all with a clustered *and* backed up *and* hot spared machine?


I would also recommend ESX, as GSX (the now-free vmWare Server) performs horribly and has pdflush/cache issues with 2.6 kernels.

Yes, I've run into that in my proof of concept server (2x1800 AthlonMP, 4x250GB RAID-10, 4GB DDR), however, it appears that 2 licenses for ESX would cost ~$5000 (I can't run starter as I need VirtualSMP) so it is currently cost-prohibitive. Not to mention that ESX has additional hardware requirements I'm not sure our server platform would meet...

Again, thanks for the input and all suggestions are welcome.

As long as you're "OK" with randomly rebooting and failing servers, recovering VMDKs, losing data, failed time synchronization, and the entire crap-sandwich that is GSX. Have you investigated other alternatives ?

Frank


241comp, I assume that this is some sort of test / staging server, or it's a production server with very low actual load? It's just that available disk I/O will be quite low when you're using SATA drives with virtualization on top.

Why not 7 disks in RAID 5 with one hot spare available (and perhaps multiple partitions on the array)? Having a hot spare is IMHO important.

Agreed. I can't emphasise enough how terrible performance is under GSX.

Have you considered RAID5 + Hotspare? You'll probably get better performance.

In fact, why are you so worried about double HDD failure at all with a clustered *and* backed up *and* hot spared machine?

I guess I'm just worried because I'm paranoid - bad past experiences with RAID arrays. Mostly, I'm concerned that we will have 1 drive failure and then another drive will experience data loss during rebuild. Maybe I should just get some counseling and go with RAID-10.
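
That "second failure during rebuild" worry can be roughed out from the drives' unrecoverable-read-error spec. The sketch below assumes the commonly quoted 1-per-10^14-bits figure for enterprise SATA and that every surviving drive must be read in full to rebuild; real-world rates vary, so treat it as order-of-magnitude only:

```python
# Rough odds of hitting an unrecoverable read error (URE) during a rebuild.
# Assumes the often-quoted 1-per-1e14-bits spec and a full read of every
# surviving drive. Order-of-magnitude sketch only.

URE_PER_BIT = 1e-14
DRIVE_BYTES = 500e9

def p_ure_during_rebuild(surviving_drives_read):
    bits = surviving_drives_read * DRIVE_BYTES * 8
    return 1 - (1 - URE_PER_BIT) ** bits

print(f"RAID-5, 7 drives (read 6 to rebuild): {p_ure_during_rebuild(6):.1%}")
print(f"RAID-10 (read 1 mirror to rebuild):   {p_ure_during_rebuild(1):.1%}")
```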

As long as you're "OK" with randomly rebooting and failing servers, recovering VMDKs, losing data, failed time synchronization, and the entire crap-sandwich that is GSX. Have you investigated other alternatives ?

Interesting - we've been running in-house "production" servers on the proof of concept server for about 8 months and haven't run into any of those problems... do they only show up under heavy load? We have more than 1 virtual machine that has been up for 90+ days currently with no issues. Unfortunately, ESX requires SCSI which we are not planning to run so even if someone donated the ESX licenses, I think we are out of luck. Point taken, though, and I admit that you may find me back on here in 6 months asking what configuration to use with my new SCSI RAID Array for ESX.

241comp, I assume that this is some sort of test / staging server, or it's a production server with very low actual load? It's just that available disk I/O will be quite low when you're using SATA drives with virtualization on top.

Why not 7 disks in RAID 5 with one hot spare available (and perhaps multiple partitions on the array)? Having a hot spare is IMHO important.

It is a production server with relatively light load but large storage requirements. In fact, we are oversizing the processing capabilities intentionally to ensure it has a light load. Primarily we are serving a few specialized GIS websites which have large data requirements but low traffic.

3Ware claims the 9650SE is capable of sustaining 700+MB/s reads and 600+MB/s writes to RAID-6 arrays (12 drives). Their internal benchmarks show the card performing almost identically between RAID-5 and RAID-6 (less than 10% difference on write - identical read). Given that all of our GIS data is indexed, the reads will be short and "random" instead of sustained and linear, though. All the advice is encouraging me to lean toward 6x500GB RAID-10 with 2xHot Spare or 7x500GB RAID-5 with 1xHot Spare.
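
Worth noting that sequential MB/s figures say little about an indexed, seek-bound workload. A rough per-spindle estimate, assuming ~8.9ms average seek and 7200rpm rotational latency:

```python
# Why sequential MB/s claims don't carry over to short "random" GIS reads:
# a 7200 rpm drive is limited by seek + rotational latency on random I/O.
# Assumed figures: ~8.9 ms average seek, half a revolution of rotational delay.

avg_seek_ms = 8.9
avg_rot_ms = 0.5 * (60_000 / 7200)        # half a revolution at 7200 rpm
iops_per_drive = 1000 / (avg_seek_ms + avg_rot_ms)

for block_kb in (16, 64, 256):
    mb_s = iops_per_drive * block_kb / 1024
    print(f"{block_kb:3d} KB random reads: ~{iops_per_drive:.0f} IOPS/drive = ~{mb_s:.1f} MB/s/drive")
```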


To crash a GSX host, create 5 guests and simultaneously tar/gzip a 1GB file on each of them. I can crash one of our UAT environments in under a minute on RHEL4. RHEL3 hosts don't have this issue, but finding modern server hardware support in the 2.4 kernel is extremely tricky. VMware is aware of the bug, and the kernel maintainers are aware of the bug. Each points the finger at the other. It hasn't been fixed in the two years that we've been reporting it. We currently have invested half a mil in GSX licensing and are less than happy.

Thank you for your time,

Frank Russo
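
For anyone wanting to approximate the per-guest load Frank describes, a hypothetical sketch (file names and sizes are illustrative, not his exact test) - run one copy inside each guest at the same time:

```python
# Rough per-guest stress load approximating "tar/gzip a 1 GB file":
# write ~1 GiB of random data, then gzip it. Paths and sizes are illustrative.

import gzip
import os
import shutil

SRC = "/tmp/stress_1g.bin"
DST = "/tmp/stress_1g.bin.gz"
CHUNK = 1024 * 1024            # 1 MiB
TOTAL_CHUNKS = 1024            # ~1 GiB total

block = os.urandom(CHUNK)      # random data so gzip can't trivially compress it
with open(SRC, "wb") as f:
    for _ in range(TOTAL_CHUNKS):
        f.write(block)

with open(SRC, "rb") as fin, gzip.open(DST, "wb") as fout:
    shutil.copyfileobj(fin, fout, CHUNK)

print("done:", os.path.getsize(SRC), "->", os.path.getsize(DST), "bytes")
```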

To crash a GSX host, create 5 guests and simultaneously tar/gzip a 1GB file on each of them. I can crash one of our UAT environments in under a minute on RHEL4. RHEL3 hosts don't have this issue, but finding modern server hardware support in the 2.4 kernel is extremely tricky. VMware is aware of the bug, and the kernel maintainers are aware of the bug. Each points the finger at the other. It hasn't been fixed in the two years that we've been reporting it. We currently have invested half a mil in GSX licensing and are less than happy.

Thank you for your time,

Frank Russo

I'll have to give that a try. I hope they fix that bug soon for you and the rest of us stuck with GSX.


I thought I should post back what I decided to do. I've gone with 2xRAID-1 and 4xRAID-10 with a hot spare. This was because I checked the read/write percentages on our existing GIS server and found that writes are >50% and based on the xbitlabs review of the 9500S (closest to the 9650SE that they've reviewed), at 40/60 split, 4xRAID-10 outperforms even 6xRAID-5 by 30% or more. Maybe the 9650SE has improved on that somewhat but probably not enough to justify going with RAID-5.
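
In case it helps anyone making the same call, the read/write split can be sampled from /proc/diskstats on a Linux host; a rough sketch (the device name is an assumption - use whatever backs your GIS data):

```python
# Sample the read/write sector split for a block device from /proc/diskstats.
# The "sectors read" and "sectors written" columns are counters since boot,
# so we take two samples and diff them. Adjust DEVICE for your system.

import time

DEVICE = "sda"   # illustrative; pick the device backing the data you care about

def sectors(dev):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                return int(fields[5]), int(fields[9])   # sectors read, sectors written
    raise ValueError(f"device {dev!r} not found")

r0, w0 = sectors(DEVICE)
time.sleep(60)
r1, w1 = sectors(DEVICE)
reads, writes = r1 - r0, w1 - w0
total = (reads + writes) or 1
print(f"reads {100 * reads / total:.0f}% / writes {100 * writes / total:.0f}% over 60 s")
```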


Just a little note -- RAID 6, at least logically, has a different advantage over RAID 10, RAID 5, etc. -- a better ability to keep the data consistent. I don't know the cause; perhaps cheap hardware or something, but I've seen data inconsistencies with RAID 5. So a RAID 5 system detects an inconsistency and rebuilds it. How exactly does it know which drive has wrong data? It'll just guess that the parity is wrong and recalculate it, but if the parity can be wrong, isn't it more likely that one of the several other drives is wrong?

I've seen a bunch of files change at the data level when the array decided to do a rebuild because of a detected inconsistency. I've also seen changes when a drive failed and was replaced. RAID 6 has a greater chance of addressing such issues.
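
To make the "which drive is wrong?" point concrete, here's a toy sketch: single (XOR) parity can detect an inconsistent stripe but cannot localise the bad member, which is what a second, independent syndrome (as in RAID 6) adds. This is a simplification, not the controller's actual P/Q arithmetic:

```python
# Toy illustration: single XOR parity detects that *something* in the stripe
# is inconsistent, but any single member (data or parity) could be the bad one.
# Real RAID 6 adds a second, independent syndrome, which is what makes the
# bad member identifiable.

from functools import reduce

data = [0b1011, 0b0110, 0b1100]             # three data "drives"
parity = reduce(lambda a, b: a ^ b, data)   # XOR parity drive

# Silently corrupt one data member.
corrupted = data.copy()
corrupted[1] ^= 0b0100

consistent = reduce(lambda a, b: a ^ b, corrupted) == parity
print("stripe consistent?", consistent)     # False: the corruption is detected

# But with XOR alone, every one of these "repairs" makes the stripe consistent
# again, so the controller cannot tell which member was actually wrong.
for i in range(len(corrupted)):
    others = [v for j, v in enumerate(corrupted) if j != i]
    rebuilt = parity ^ reduce(lambda a, b: a ^ b, others)
    print(f"assume drive {i} is bad -> rebuild value {rebuilt:04b} (stripe now consistent)")
print(f"assume parity is bad -> new parity {reduce(lambda a, b: a ^ b, corrupted):04b}")
```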

How exactly does it know which drive has wrong data?

Easy, it's the drive that fails the CRC check. If the controller actually wrote bad data, that's another story. The solution to controllers that write bad data is the trash bin.

It is 'almost' impossible for a drive to read data differently than what was written to it and not notice.

Frank
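
As a toy illustration of that kind of per-sector check (CRC-32 standing in for the drive's much stronger on-disk ECC):

```python
# Drives store a checksum/ECC alongside each sector, so a misread surfaces as
# a read error rather than silently wrong data. CRC-32 is just a stand-in here.

import zlib

sector = bytes(range(256)) * 2            # pretend 512-byte sector
stored_crc = zlib.crc32(sector)

corrupted = bytearray(sector)
corrupted[100] ^= 0x01                    # a single flipped bit

print("clean read passes?    ", zlib.crc32(sector) == stored_crc)            # True
print("corrupted read passes?", zlib.crc32(bytes(corrupted)) == stored_crc)  # False
```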


Sorry, came to this one a bit late. However, I can confirm that you've made the right choice. I run a number of VMWare hosts and RAID5/6 is a crippling bottleneck. You need plenty of write I/O, disk redundancy and space - i.e. RAID10.

Sorry, came to this one a bit late. However, I can confirm that you've made the right choice. I run a number of VMWare hosts and RAID5/6 is a crippling bottleneck. You need plenty of write I/O, disk redundancy and space - i.e. RAID10.

Thanks, I appreciate your input. Anyway, do you mind if I ask a couple of questions about performance tuning VMWare hosts? Currently, I have the host OS booting from a separate RAID-1 array and I have placed the swap/tmp virtual drives on the RAID-1 array, with the main virtual drives on the RAID-10 array. Do you think that is a wise setup, or should I have all virtual drives on the same array? The idea was that /tmp and swap are "frequently" written (though hopefully swap isn't) and I wanted to prevent contention between writing to /tmp for VM1 and reading files for VM2, VM3, VM4, etc. What do you think? One last item - do you have recommended drive tuning settings (such as readahead) for VMWare, or should I just go with whatever benchmarks (hdparm) say give me the best "performance"?
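
Not an answer, but for anyone experimenting with readahead before re-running benchmarks, a small sketch (device paths are illustrative; blockdev needs root, and the real test should be your VM workload rather than hdparm alone):

```python
# Inspect (and optionally change) readahead per block device before
# re-running benchmarks. Device paths are illustrative; blockdev requires root.

import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]        # e.g. the RAID-1 and RAID-10 arrays

def get_readahead(dev):
    out = subprocess.run(["blockdev", "--getra", dev],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())        # value is in 512-byte sectors

def set_readahead(dev, sectors):
    subprocess.run(["blockdev", "--setra", str(sectors), dev], check=True)

for dev in DEVICES:
    print(dev, "readahead =", get_readahead(dev), "sectors")
# Example: set_readahead("/dev/sdb", 1024), then re-run the actual VM workload.
```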

To crash a GSX host, create 5 guests and simultaneously tar/gzip a 1GB file on each of them. I can crash one of our UAT environments in under a minute on RHEL4. RHEL3 hosts don't have this issue, but finding modern server hardware support in the 2.4 kernel is extremely tricky. VMware is aware of the bug, and the kernel maintainers are aware of the bug. Each points the finger at the other. It hasn't been fixed in the two years that we've been reporting it. We currently have invested half a mil in GSX licensing and are less than happy.

Thank you for your time,

Frank Russo

Well, they must have fixed this or my configuration isn't vulnerable because I've been running this test on RHEL4 and I can't get it to so much as hiccup.


Check all of the latest VMware ESX optimization whitepapers on the VMware site, as well as the many VMware books. All recommend 15K drives that are either SCSI or SAS, and RAID10 configs for high I/O.

A good book to pick up is "VMware ESX Server: Advanced Technical Design Guide" by Ron Oglesby and Scott Herold - quite possibly one of the better-written books out there today on ESX Server.

I thought I should post back what I decided to do. I've gone with 2xRAID-1 and 4xRAID-10 with a hot spare. This was because I checked the read/write percentages on our existing GIS server and found that writes are >50% and based on the xbitlabs review of the 9500S (closest to the 9650SE that they've reviewed), at 40/60 split, 4xRAID-10 outperforms even 6xRAID-5 by 30% or more. Maybe the 9650SE has improved on that somewhat but probably not enough to justify going with RAID-5.

I think new-generation controllers with the IOP341 will do their job better than the IOP333 and the 9650's processor. In that case, RAID 5 or, better, RAID 6 will give you more storage and good performance. My previous build was an Areca 1220 (6 x WD RE2 500GB in RAID 6), but I decided to get something faster and keep RAID 6 instead of migrating to RAID 10 for better performance on my file server. The 12-port ARC-1231ML is about $900, but it will keep your server up for years ;)


I think you guys are failing to see his budget of $5,500. ESX is more than $5,500 for the software alone. We use VMware Server on Server 2003 and love it. We have yet to have a failure of any sort and we have been running it for 6+ months. For best performance, put one to two machines per array. Don't make one big array, because the seek times will kill all performance. We have 9 servers running on an almost identical setup to yours. We have 6 arrays, two 400GB SATA drives per array. It runs everything just fine.
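
A rough way to see why dedicated small arrays can feel faster despite fewer spindles - assuming ~80 random IOPS per SATA spindle and ignoring cache and locality, so treat it as an illustration of contention rather than a benchmark:

```python
# Why a small dedicated array per VM can feel faster than one big pool:
# each VM's random I/O no longer competes with every other VM's seeks.
# Assumes ~80 random IOPS per SATA spindle; ignores cache and locality.

SPINDLE_IOPS = 80

def iops_per_vm(spindles, vms_sharing):
    return spindles * SPINDLE_IOPS / vms_sharing

print("one 12-drive pool shared by 9 busy VMs:", iops_per_vm(12, 9), "IOPS per VM")
print("dedicated 2-drive mirror per VM:       ", iops_per_vm(2, 1), "IOPS per VM")
```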

...based on the xbitlabs review of the 9500S (closest to the 9650SE that they've reviewed).. Maybe the 9650SE has improved on that somewhat but probably not enough to justify going with RAID-5.
"Assumption is the mother of all happy fellow ups."

Especially in anything associated with computers.

Looks like 9500S wasn't really meant for RAID5:

http://www.3ware.com/products/what-diff.asp

And the 9650SE isn't just a PCIe refresh of the 95xx series but a new model with RAID6 included.

http://www.3ware.com/products/pdf_9650/965..._transition.pdf

So I think there should be a clear difference.


If this guy is truly on a budget and can't foot the bill for the real deal...

Create a smallish 30-80GB OS host drive RAID 1

Create two RAID 10 volumes using 10K Raptors. The more spindles, the better.

Using high-capacity drives is not the recommended configuration, both according to VMware themselves and to enterprise best practices. You want smaller drives and more spindles to spread the I/O load.

I think you guys are failing to see his budget of $5,500. ESX is more than $5,500 for the software alone. We use VMware Server on Server 2003 and love it. We have yet to have a failure of any sort and we have been running it for 6+ months. For best performance, put one to two machines per array. Don't make one big array, because the seek times will kill all performance. We have 9 servers running on an almost identical setup to yours. We have 6 arrays, two 400GB SATA drives per array. It runs everything just fine.

I use Vista Ultimate x64 with VMWare Server and also love it. ;) Even though it is not in the list of recommended host OSes, VMware Server runs much better than it used to under XP Pro.

I'd also recommend creating as many arrays as possible and dedicating one VM (two max) per array to get the best response/performance for each VM.


I think you guys are failing to see his budget of $5,500. ESX is more than $5,500 for the software alone. We use VMware Server on Server 2003 and love it. We have yet to have a failure of any sort and we have been running it for 6+ months. For best performance, put one to two machines per array. Don't make one big array, because the seek times will kill all performance. We have 9 servers running on an almost identical setup to yours. We have 6 arrays, two 400GB SATA drives per array. It runs everything just fine.

I use Vista Ultimate x64 with VMWare Server and also love it. ;) Even though it is not in the list of recommended host OSes, VMware Server runs much better than it used to under XP Pro.

I'd also recommend creating as many arrays as possible and dedicating one VM (two max) per array to get the best response/performance for each VM.

If only you could use RAID1E config (3 drives per array), you could get away with RAID1 for host OS and 2 RAID1E arrays for your VMs.
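
For context, RAID 1E keeps two copies of every block striped across an odd number of drives, so usable capacity is n x size / 2. A quick sketch of the usual layout (whether a given controller actually offers it is another matter):

```python
# RAID 1E: two copies of every block, striped round-robin across an odd number
# of drives, so usable capacity is n * size / 2 (like RAID 10, but it works
# with 3 drives). This is the common definition, shown as a toy layout.

DRIVE_GB = 500

def raid1e_layout(drives, blocks):
    """Return per-drive block layout; Dn and Dn' are the two copies of block n."""
    layout = [[] for _ in range(drives)]
    pos = 0
    for i in range(blocks):
        for copy in ("", "'"):
            layout[pos % drives].append(f"D{i}{copy}")
            pos += 1
    return layout

for d, stripes in enumerate(raid1e_layout(3, 6)):
    print(f"drive {d}: {' '.join(stripes)}")

print("usable per 3-drive RAID 1E array:", 3 * DRIVE_GB // 2, "GB")
print("RAID 1 (2 drives) + 2 x RAID 1E (3 drives each):",
      DRIVE_GB + 2 * (3 * DRIVE_GB // 2), "GB usable across 8 bays")
```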
