wasserkool

Is it worth it to get an expensive RAID 5 controller, or opt for a fast CPU?


I have a dilemma here. An 8-port Areca card costs around $500. For less money, I can get a good mobo + AMD X2 3800 for my file server.

Now if I get a fast CPU but a non-hardware RAID 5 card, will I still get good performance, or will the I/O processor on the RAID card outperform even the fastest CPU?
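A rough way to gauge whether the host CPU would even be the bottleneck is to time the XOR parity work itself; a minimal, machine-dependent sketch (assuming numpy is available):

```python
# Rough sanity check: how fast can a general-purpose CPU compute the XOR parity
# that software RAID 5 needs? If this rate is far above the drives' combined
# write speed, the CPU is unlikely to be the bottleneck for simple file serving.
import time
import numpy as np

DATA_DISKS = 7                      # e.g. an 8-drive RAID 5 -> 7 data chunks per stripe
CHUNK_MB = 16                       # size of each data chunk XORed together

chunks = [np.random.randint(0, 256, CHUNK_MB * 2**20, dtype=np.uint8)
          for _ in range(DATA_DISKS)]

start = time.perf_counter()
parity = chunks[0].copy()
for c in chunks[1:]:
    np.bitwise_xor(parity, c, out=parity)       # parity = c0 ^ c1 ^ ... ^ c6
elapsed = time.perf_counter() - start

total_mb = DATA_DISKS * CHUNK_MB
print(f"XORed {total_mb} MB of data in {elapsed*1000:.1f} ms "
      f"(~{total_mb/elapsed:.0f} MB/s)")
```

If the printed rate is well above what the drives can write in aggregate, software or host-assisted RAID 5 is unlikely to be CPU-bound for plain file serving; the dedicated I/O processor mostly buys you write-back cache and offload under heavy load.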


Depends on things such as the OS that you're using. Areca's cards are very nice. If you want to do away with RAID 5 and you have 3 disks, you could get a lower-specced card, add a hard drive (so you have a total of 4 hard drives) and then do RAID 10 on them (which doesn't require a lot of computation).


Here's a second vote for RAID 1+0 over RAID 5.

RAID 5 is fine for maximizing your storage space, but for small volumes (6 or 8 drives), the extra drives needed to run RAID 1+0 usually cost less than a decent RAID 5 card, and RAID 1+0 normally has higher performance.


Yep - at our DC most people love RAID 10, but we had to use RAID 5 because we're limited to 4 disks per machine and we needed as much space as possible.

Daniel


If you're using RAID 1+0, you really really don't want to have more than 4 drives unless your backups are instantly up to date. You increase your probability of a drive failure with every drive you add, not to mention the extra power requirements and heat generation.

Also, while 1+0 will pretty much always beat a cheap RAID 5, an Areca with 256MB onboard cache and dedicated processor is going to be hard to beat with some cheapo 1+0 solution. And with that card running RAID 6, your reliability is much higher, plus you have the option of a hot spare.

Cheap <--> Reliable <--> Fast

Pick two.


Allow me to jump in here. My PowerEdge 2600 is loaded with 6 drives. I first used RAID 5, then switched to RAID 10, BUT there was no real gain in speed here.

After all, even the PERC4 works sequentially.

But I managed to get 2 drives killed. As you know, with 6 drives you can afford to lose 3, as long as no two are in the same RAID 1 pair.

I ordered 5 drives in Germany, because I needed 18 GB models and did not want to buy 36 GB ones, losing 18 GB in the RAID 10.

Only one of the 5 second-hand drives worked. Guess what: 30 minutes after installing that one good drive, the drive above the newly installed one failed. I almost had a heart attack.

That was it for me. I had 2 old 10K Quantums (the six were 15Ks) and installed those in order to get my RAID 10 back in shape.

Although those old Quantums are only 10Ks, the RAID is restored.

So I would say: go for the RAID 10, but only for protection, not speed.

I will never, ever go back to RAID 5 in my server environment.

The only problem that can occur for anyone... you lose half of your gigs, and SCSI drives, unfortunately, don't come in the large sizes that IDE, ATA, SATA, whatever, drives do.

Jeff


RAID 10 is faster than RAID 5, especially for writes. For reads, it might not be a whole lot faster. You'll really notice the difference if you're performing writes that are larger than the cache on the controller.
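For a feel of the gap on small random writes (ignoring controller cache), the usual rule-of-thumb write penalties can be plugged into a quick sketch; the per-drive IOPS figure below is a placeholder:

```python
# Rule-of-thumb write penalties: back-end I/Os generated per host write.
# Not a benchmark; DISK_IOPS is a made-up placeholder for illustration.
DISK_IOPS = 150          # hypothetical small-random-write IOPS per drive
DRIVES = 6

WRITE_PENALTY = {
    "RAID 10": 2,        # write the data to both mirrors
    "RAID 5": 4,         # read data, read parity, write data, write parity
    "RAID 6": 6,         # as RAID 5, plus the second parity read/write
}

for level, penalty in WRITE_PENALTY.items():
    print(f"{level}: ~{DRIVES * DISK_IOPS // penalty} random-write IOPS "
          f"from {DRIVES} drives")
```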


RAID 0+1 is faster than anything else, but its reliability tanks as the number of drives increases, compared to RAID 6.

Some calculations about reliability are available here. (There was a recent study about this that I can't seem to track down.)

If you don't need super high speed writes, then go RAID 6. If you need fairly high speed writes, get a RAID 6 card with a decent cache. If speed is your primary concern, you aren't using more than 4 drives, and your backups are always current, go RAID 0+1.


In my opinion, RAID 0+1 is not the same as RAID 1+0. Correct me if I'm wrong.

Anyway, my RAID 10 is not noticeably faster than my previous RAID 5.

Six 15K Ultra320 drives and a PERC4 card.

When you go through the disk(s) that come with the PE2600 (you know, the ones with the explanations and setup guides), it clearly says that reads/writes happen sequentially. So do not think that the (in my case) 3 drives in the RAID 0 get hit all at the same time; they get their actions one by one. So?

I am not so interested in tests that give you wonderful figures; it is the daily use I am interested in. So again, others may experience this differently, but I am only using RAID 10 for protection.

Jeff

RAID 0+1 is faster than anything else, but its reliability tanks as the number of drives increases, compared to RAID 6.

Some calculations about reliability are available here. (There was a recent study about this that I can't seem to track down.)

If you don't need super high speed writes, then go RAID 6. If you need fairly high speed writes, get a RAID 6 card with a decent cache. If speed is your primary concern, you aren't using more than 4 drives, and your backups are always current, go RAID 0+1.

Why are you suggesting that using RAID 1+0 requires not using more than 4 drives? That's not correct at all. The only requirements are that RAID 1+0 needs a minimum of 4 drives and that drives must be added in pairs.

It seems to me that you are misconstruing the research that was done regarding reliability. The direct comparison was between RAID 5 and RAID 6 (which both perform XOR operations). RAID 1+0 does not perform XOR operations (they are not needed for 1+0). This alone means that there are far fewer I/O operations required for "normal" operation.

Also, RAID 0+1 is NOT the same as RAID 1+0. For "nested" RAID levels, the standard nomenclature is that the RAID levels are listed in the order they are applied. RAID 1+0 is a "stripe of mirrors", whereas 0+1 is a "mirror of a stripe". The difference is highly significant.
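To make that difference concrete, here's a small, purely illustrative sketch (hypothetical drive names d0..d5, not from the original post) that counts which failure combinations each nested layout survives:

```python
# Six drives arranged two ways: RAID 1+0 (stripe of mirrors) vs RAID 0+1
# (mirror of stripes). Count how many failure combinations each one survives.
from itertools import combinations

PAIRS_10 = [("d0", "d1"), ("d2", "d3"), ("d4", "d5")]   # 1+0: three mirrored pairs
HALVES_01 = [{"d0", "d1", "d2"}, {"d3", "d4", "d5"}]     # 0+1: two striped halves

def raid10_survives(failed):
    # The stripe of mirrors survives as long as no pair has lost both members.
    return all(not (a in failed and b in failed) for a, b in PAIRS_10)

def raid01_survives(failed):
    # The mirror of stripes survives only while at least one half is fully intact.
    return any(not (half & failed) for half in HALVES_01)

drives = ["d0", "d1", "d2", "d3", "d4", "d5"]
for k in (1, 2, 3):
    combos = [set(c) for c in combinations(drives, k)]
    print(f"{k} failed drive(s): RAID 1+0 survives "
          f"{sum(map(raid10_survives, combos))}/{len(combos)}, "
          f"RAID 0+1 survives {sum(map(raid01_survives, combos))}/{len(combos)}")
```

With two failures, 1+0 survives 12 of the 15 combinations while 0+1 survives only 6, which is why the order of nesting matters.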

In addition, RAID 1+0 has higher reliability than either RAID 5 or 6 (I will be happy to illustrate this if you'd like). This comes at a cost, since you must purchase more drives to achieve the same capacity as RAID 5 or 6 (assuming that both arrays are built using the same size hard drives). The upside is much higher reliability, and much higher throughput for writes. Read speed is approximately the same, until you actually have to rebuild your array (when a drive failure occurs). At that point, RAID 5 or 6 lose quite a bit of steam because of the extra XOR calculations that are required to reconstruct the data "on the fly" (or write the data to the replacement drive(s)). RAID 1+0 suffers significantly less during a rebuild, because it's easy to "mirror" one hard drive to another (it's just a data copy).

In my opinion, RAID 0+1 is not the same as RAID 1+0. Correct me if I'm wrong.

Anyway, my RAID 10 is not noticeably faster than my previous RAID 5.

Six 15K Ultra320 drives and a PERC4 card.

When you go through the disk(s) that come with the PE2600 (you know, the ones with the explanations and setup guides), it clearly says that reads/writes happen sequentially. So do not think that the (in my case) 3 drives in the RAID 0 get hit all at the same time; they get their actions one by one. So?

I am not so interested in tests that give you wonderful figures; it is the daily use I am interested in. So again, others may experience this differently, but I am only using RAID 10 for protection.

Jeff

Jeff, if you check around, I think you'll find that Perc cards have a "reputation" for not being especially great performers. At least, I've noticed quite a few posts denigrating their performance. I, personally, have never used one so I can't comment from direct experience.


True, Trinary, but why use RAID 0+1 when you can use RAID 1+0 and be safer?

Although it might be true that PERC4 RAID controllers are not fast (they are actually LSI RAID controllers), the fact remains that I will never, ever go back to RAID 5, unless I have only 3 drives.

I will look the RAID 6 thing up; that interests me a lot.

Jeff


So I looked at RAID 6. I must say, "what is new"? You have better protection than with RAID 5, but RAID 10 still gives you the best protection.

Trinary, I never said that one should only use 4 drives.

If you ever want your RAID 5 to be fast, you need at least 8-12 drives.

What we could all use is more RAID cache, of course. A few gigs would do.

Jeff


Actually, a RAID 10 array is not always more reliable than a RAID 6 array, since a RAID 6 array must have 3 concurrent failures before data is lost, while a RAID 10 array only needs a mirrored pair to fail.

So I looked at RAID 6. I must say, "what is new"? You have better protection than with RAID 5, but RAID 10 still gives you the best protection.

Trinary, I never said that one should only use 4 drives.

If you ever want your RAID 5 to be fast, you need at least 8-12 drives.

What we could all use is more RAID cache, of course. A few gigs would do.

Jeff

Actually, a RAID 10 array is not always more reliable than a RAID 6 array, since a RAID 6 array must have 3 concurrent failures before data is lost, while a RAID 10 array only needs a mirrored pair to fail.

So I looked at RAID 6. I must say, "what is new"? You have better protection than with RAID 5, but RAID 10 still gives you the best protection.

Trinary, I never said that one should only use 4 drives.

If you ever want your RAID 5 to be fast, you need at least 8-12 drives.

What we could all use is more RAID cache, of course. A few gigs would do.

Jeff

kdkirmse, you are referring to the minimum number of failures needed to render the array nonviable. If both drives in a mirror pair in RAID 1+0 fail, then yes, the array is no longer accessible. With RAID 6, it requires 3 concurrent failures to render the array nonviable.

However, you are either not considering or downplaying that with RAID 6, any three concurrent drive failures will have this effect, whereas with RAID 1+0, you must have both drives in a mirror pair fail.

Looking at it strictly from a numbers viewpoint, the odds of the 2nd drive failure happening to the mirror of a failed drive before the failed drive is rebuilt decrease as more drives are added to the array.

For example, consider a 6 drive RAID 6 array and a 6 drive RAID 1+0 array. Any 3 concurrent failures in the RAID 6 array will destroy the array. The RAID 1+0 array could theoretically survive up to 3 failures and still be operational.

Starting from the point of 1 drive failure, the odds of a second concurrent failure taking down the array are 1 in 5 (20%), since there are 5 operating drives in the array after the first failure, and only 1 of those (the mirror of the failed drive) would actually take down the array if it failed. For the 3rd concurrent failure, the odds of this increase to 1 in 4 (25%) since there are still 4 operating drives in the array.

Of course, the flip side is also true. It is 80% likely that a 6 drive RAID 1+0 array will be operating after the 2nd drive failure, and 75% likely that it will still be working after the 3rd drive failure. It is 100% certain that a 6 drive RAID 6 array will be operating after the first two concurrent failures, but it is also 100% certain that the RAID 6 array will not be working when the third concurrent failure occurs.

Naturally, these numbers can be easily improved, because a 6 drive RAID 1+0 array actually doesn't have that many drives in it. If you increase the number of drives to 10, 12, or more, then as you add drives the RAID 1+0 array gains reliability.

With 10 drives in RAID 1+0, the odds of surviving the 2nd failure are 8 in 9 (~88.8%), and the odds of surviving a 3rd failure are 7 in 8 (87.5%), and it is possible to survive as many as 5 concurrent failures.

Of course, this is not considering the possible causes of multiple drive failure, which might make concurrent failures more or less likely (a bad batch of drives, for example), but those apply equally regardless of the RAID level used, so I think it's safe not to consider them when just talking about comparing reliability.

Another thing to consider is the "window of vulnerability" (time to rebuild a failed drive onto an available hot spare before another drive fails). RAID 6 rebuild times are significantly longer than RAID 1+0, because of all the XOR calculations and multiple disk reads that must occur. RAID 1+0 just has to copy data directly from one drive to another with practically no overhead, so the "window of vulnerability" is considerably shorter. This further reduces the odds of "concurrent" failure in RAID 1+0.

With all these things considered on an even playing field, RAID 1+0 is generally acknowledged to have superior reliability, better performance during rebuilds, and normal read performance either on par or exceeding any other RAID level.

The only downside is that you pay more because of the extra drives involved when compared to a given capacity in RAID 5 or RAID 6.

However, good RAID 5 or RAID 6 controllers (and the cache to make them perform well!) cost quite a bit, so, up to around 10 or 12 drives, the cost of a quality controller and the cache for it might be more than the cost of a quality RAID 1+0 controller and the extra drives for a RAID 1+0 array to have the same capacity as a RAID 5 or RAID 6 array, with all the extra benefits of RAID 1+0 compared to RAID 5 or RAID 6.
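As a purely illustrative sketch of that cost trade-off (every price and capacity below is a made-up placeholder, not a quote; plug in real numbers):

```python
# Hypothetical cost comparison for reaching a target usable capacity with
# RAID 1+0 on a simpler controller versus RAID 5/6 on a hardware card with
# cache. All figures are placeholders for illustration only.
import math

TARGET_TB = 2.0
DRIVE_TB = 0.5
DRIVE_COST = 120          # placeholder price per drive
SIMPLE_CONTROLLER = 150   # placeholder RAID 1+0 capable card
HW_RAID_CONTROLLER = 500  # placeholder RAID 5/6 card with onboard cache

data_drives = math.ceil(TARGET_TB / DRIVE_TB)
layouts = {
    "RAID 1+0": (2 * data_drives, SIMPLE_CONTROLLER),   # every data drive is mirrored
    "RAID 5":   (data_drives + 1, HW_RAID_CONTROLLER),  # one drive's worth of parity
    "RAID 6":   (data_drives + 2, HW_RAID_CONTROLLER),  # two drives' worth of parity
}

for level, (drives, controller) in layouts.items():
    print(f"{level}: {drives} drives, total ${drives * DRIVE_COST + controller}")
```

With these particular placeholders the totals come out close, which is the point: below roughly a dozen drives the answer swings on the actual controller and drive quotes rather than on the RAID level itself.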

So, if performance is your number one consideration when considering RAID levels, then I definitely think that RAID 1+0 is the way to go.

Edited by Trinary

Starting from the point of 1 drive failure, the odds of a second concurrent failure taking down the array are 1 in 5 (20%), since there are 5 operating drives in the array after the first failure, and only 1 of those (the mirror of the failed drive) would actually take down the array if it failed. For the 3rd concurrent failure, the odds of this increase to 1 in 4 (25%) since there are still 4 operating drives in the array.

Just a little fix here... :)

If a 6-drive RAID 1+0 array has 2 dead drives and is still functional, I figure 2 of the 4 functional drives would have to be paired with a dead drive (and the last 2 with each other). So the 3rd failure would have a 2/4 = 50% chance of taking down the array.


You have a few errors in your post. First, a mathematical error, and second a logical error.

Starting from the point of 1 drive failure, the odds of a second concurrent failure taking down the array are 1 in 5 (20%), since there are 5 operating drives in the array after the first failure, and only 1 of those (the mirror of the failed drive) would actually take down the array if it failed. For the 3rd concurrent failure, the odds of this increase to 1 in 4 (25%) since there are still 4 operating drives in the array.

That is not correct. A 6-drive RAID 10 has 3 pairs for a total of 6 drives. After one drive failure you have 2 pairs and a singleton. A second failure has, as you said, a 20% chance of hitting that singleton. Assuming it does not, a third drive failure has a 50% chance of killing the array (two singletons and one pair of drives remain; either singleton's failure will take out the array). So the probability of arbitrary drive failures resulting in array failure for a 6-drive RAID 10 is as follows:

0% 1 failure

20% 2 failures

60% 3 failures

100% 4+ failures

The same probabilities for a 6-drive RAID 6 array are:

0% 1 failure

0% 2 failures

100% 3+ failures
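A quick way to double-check those tables is to run the same counting argument; a minimal sketch:

```python
# Probability that k arbitrary concurrent drive failures kill a 6-drive array:
# 3 mirrored pairs (RAID 10) versus 6-drive RAID 6.
from math import comb

N_DRIVES, N_PAIRS = 6, 3

def raid10_dead(k):
    # The array survives only if every failed drive comes from a different pair.
    if k > N_PAIRS:
        return 1.0
    return 1 - comb(N_PAIRS, k) * 2**k / comb(N_DRIVES, k)

def raid6_dead(k):
    return 1.0 if k >= 3 else 0.0

for k in range(1, 5):
    print(f"{k} failures: RAID 10 {raid10_dead(k):.0%} dead, "
          f"RAID 6 {raid6_dead(k):.0%} dead")
```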

Another thing to consider is the "window of vulnerability" (time to rebuild a failed drive onto an available hot spare before another drive fails). RAID 6 rebuild times are significantly longer than RAID 1+0, because of all the XOR calculations and multiple disk reads that must occur. RAID 1+0 just has to copy data directly from one drive to another with practically no overhead, so the "window of vulnerability" is considerably shorter. This further reduces the odds of "concurrent" failure in RAID 1+0.

The rebuild time for a RAID 5 array is not necessarily longer than that of a RAID 10; in the best case both have the same rebuild time, limited by the spare disk's sustained write speed. You compare the rebuild times for both RAID types under the condition that an additional drive failure may cause array failure, yet this is not a fair comparison. After a single drive failure, a RAID 10 array is vulnerable to loss if a second disk dies before the degraded mirror is rebuilt. In contrast, a RAID 6 array which has lost a drive is not vulnerable and retains redundancy while the rebuild is in progress. Comparing the rebuild times of both RAID levels without acknowledging or taking into account their different failure modes is at best in error and at worst dishonest.


wasserkool, is this a home file server? One used by fewer than a few dozen people?

If so, why not just get a bunch of cheap SATA controllers and use software RAID 5? Software RAID works fine on Linux and FreeBSD, and if you have the server version of Windows (which I think is quite expensive and pointless for a file server), it supports software RAID 5 as well.
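If you do go the Linux software-RAID route, keeping an eye on array health is easy to script; a minimal sketch (assumes the Linux md driver, i.e. /proc/mdstat is present; FreeBSD and driver-based RAID need different tools):

```python
# Minimal health check for Linux md software RAID: read /proc/mdstat and flag
# any array whose member-status field ([UU...]) shows a missing drive ("_").
import re

with open("/proc/mdstat") as f:
    mdstat = f.read()

# Each array block looks roughly like:
#   md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
#         976767872 blocks level 5, 64k chunk [3/3] [UUU]
for name, members in re.findall(r"^(md\d+) : .*?\[([U_]+)\]", mdstat, re.M | re.S):
    state = "DEGRADED" if "_" in members else "OK"
    print(f"{name}: [{members}] -> {state}")
```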

If you are using a non-server Windows, you still have other RAID choices, or you can use the driver-based RAID now present in a lot of cards and onboard controllers.

If you already have a CPU and motherboard for the fileserver, why spend the money on faster stuff? A home file server does not require much in the way of CPU time, even using software RAID. If you really want to spend money, buy more drive space.

I have a home file server and I went through the "spend money because I want cool server hardware" stage, but soon the novelty wears off and it comes down to "what do I actually need?" "What do I actually use this thing for?" "Do I really want to spend $500 on a damn server more than on something with social or family value?"

Of course, this is coming from someone who just bought an 8-port SAS RAID card and $100 in cables for my own fileserver. Call me a hypocrite.

Speaking of which, I have no RAID on said server at all. Important data is backed up to hard drive with rsync and onto tape. Mass-media isn't backed up... Maybe not the best way, but everything not backed up is replaceable.


Actually, a RAID 10 array is not always more reliable than a RAID 6 array, since a RAID 6 array must have 3 concurrent failures before data is lost, while a RAID 10 array only needs a mirrored pair to fail.

So I looked at RAID 6. I must say, "what is new"? You have better protection than with RAID 5, but RAID 10 still gives you the best protection.

Trinary, I never said that one should only use 4 drives.

If you ever want your RAID 5 to be fast, you need at least 8-12 drives.

What we could all use is more RAID cache, of course. A few gigs would do.

Jeff

kdkirmse, you are referring to the minimum number of failures needed to render the array nonviable. If both drives in a mirror pair in RAID 1+0 fail, then yes, the array is no longer accessible. With RAID 6, it requires 3 concurrent failures to render the array nonviable.

However, you are either not considering or downplaying that with RAID 6, any three concurrent drive failures will have this effect, whereas with RAID 1+0, you must have both drives in a mirror pair fail.

Looking at it strictly from a numbers viewpoint, the odds of the 2nd drive failure happening to the mirror of a failed drive before the failed drive is rebuilt decrease as more drives are added to the array.

Once you have a failure in a RAID10 array the reliability of the array drops to the reliability of the 2nd drive in the degraded pair. In the case of a RAID6 array with a failed drive the reliability of the array drops to that of a RAID5 array.

A first-order calculation for the MTBF of a RAID 10 array:

MTBF^2 / (2 * N * MTTR10)

where MTBF is the mean time between failures of a single drive, N is the number of mirrored pairs in the array, and MTTR10 is the mean time to repair the RAID 10.

A first-order calculation for the MTBF of a RAID 6 array:

MTBF^3 / (D * (D - 1) * (D - 2) * MTTR6^2)

where D is the number of drives in the array and MTTR6 is the mean time to repair the RAID 6.

Normalizing the terms so both arrays have the same usable capacity and comparable rebuild behaviour:

MTTR6 = K * MTTR10

N = D - 2

Calculate the crossover point where RAID 10 becomes more reliable than RAID 6:

MTBF^2 / (2 * (D - 2) * MTTR10) <> MTBF^3 / (D * (D - 1) * (D - 2) * K^2 * MTTR10^2)

1 / 2 <> MTBF / (D * (D - 1) * K^2 * MTTR10)

Plugging in MTBF = 10^6 hours, MTTR10 = 1 hour, and K = 24 (RAID 10 rebuild time 1 hour, RAID 6 rebuild time 24 hours):

A 60-drive RAID 6 array would be just barely less reliable than a 116-drive RAID 10 array of the same capacity.

There are enough 2nd-order effects to make this calculation overly optimistic.

For a modest-size array, it is going to be hard to argue against RAID 6 from a reliability standpoint.
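As a quick numeric check of that crossover figure, a minimal sketch plugging the same first-order formulas and the same assumptions into code:

```python
# First-order MTTDL comparison: D-drive RAID 6 vs the equal-capacity RAID 10
# (D-2 mirrored pairs). Same assumptions as above: per-drive MTBF 10^6 hours,
# RAID 10 rebuild 1 hour, RAID 6 rebuild 24 hours.
MTBF = 1e6        # per-drive mean time between failures, hours
MTTR10 = 1.0      # RAID 10 rebuild time, hours
K = 24            # RAID 6 rebuild takes K times longer
MTTR6 = K * MTTR10

def mttdl_raid10(pairs):
    return MTBF**2 / (2 * pairs * MTTR10)

def mttdl_raid6(drives):
    return MTBF**3 / (drives * (drives - 1) * (drives - 2) * MTTR6**2)

for d in range(4, 80):
    if mttdl_raid10(d - 2) > mttdl_raid6(d):
        print(f"Crossover at D = {d}: a {d}-drive RAID 6 vs a "
              f"{2 * (d - 2)}-drive RAID 10 of equal capacity")
        break
```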

You have a few errors in your post. First, a mathematical error, and second a logical error.

Starting from the point of 1 drive failure, the odds of a second concurrent failure taking down the array are 1 in 5 (20%), since there are 5 operating drives in the array after the first failure, and only 1 of those (the mirror of the failed drive) would actually take down the array if it failed. For the 3rd concurrent failure, the odds of this increase to 1 in 4 (25%) since there are still 4 operating drives in the array.

That is not correct. A 6-drive RAID 10 has 3 pairs for a total of 6 drives. After one drive failure you have 2 pairs and a singleton. A second failure has, as you said, a 20% chance of hitting that singleton. Assuming it does not, a third drive failure has a 50% chance of killing the array (two singletons and one pair of drives remain; either singleton's failure will take out the array). So the probability of arbitrary drive failures resulting in array failure for a 6-drive RAID 10 is as follows:

0% 1 failure

20% 2 failures

60% 3 failures

100% 4+ failures

The same probabilities for a 6-drive RAID 6 array are:

0% 1 failure

0% 2 failures

100% 3+ failures

Another thing to consider is the "window of vulnerability" (time to rebuild a failed drive onto an available hot spare before another drive fails). RAID 6 rebuild times are significantly longer than RAID 1+0, because of all the XOR calculations and multiple disk reads that must occur. RAID 1+0 just has to copy data directly from one drive to another with practically no overhead, so the "window of vulnerability" is considerably shorter. This further reduces the odds of "concurrent" failure in RAID 1+0.

The rebuild time for a RAID 5 array is not necessarily longer than that of a RAID 10; in the best case both have the same rebuild time, limited by the spare disk's sustained write speed. You compare the rebuild times for both RAID types under the condition that an additional drive failure may cause array failure, yet this is not a fair comparison. After a single drive failure, a RAID 10 array is vulnerable to loss if a second disk dies before the degraded mirror is rebuilt. In contrast, a RAID 6 array which has lost a drive is not vulnerable and retains redundancy while the rebuild is in progress. Comparing the rebuild times of both RAID levels without acknowledging or taking into account their different failure modes is at best in error and at worst dishonest.

You are absolutely correct on the point regarding the percentage chance of successive failures taking down the RAID 1+0 array. I neglected to take into account the fact that if there were 2 failures and the array was still operational, then obviously a failure of either of the two drives whose partners had already failed would cause the array to fail completely. For the 3rd failure, there is actually a 50% chance of the array failing, not 25% as I previously posted. However, it was simply a foolish mistake on my part, which I readily admit, rather than any attempt at deception. My sole purpose in posting here is to attempt to disseminate quality information to help people solve problems.

However, I would question the rebuild time for a RAID 5 or RAID 6 array being equal to a RAID 1+0 array's rebuild time (even in the best case). Since the controller must read data from all other drives to perform the XOR calculation to generate the parity to be stored, it would seem that there is going to be more calculation and I/O activity in reconstructing a RAID 5 or RAID 6 array, whereas to "rebuild" a failed mirror simply involves copying data from one disk directly to another without any parity calculations or other disk I/O involved. Can you elaborate on your statement or provide other supporting data?

Also, when you state that a RAID 1+0 array is vulnerable to loss if a second disk dies before the degraded mirror is rebuilt, you neglect to mention explicitly that this is only true if the second disk that happens to die is the other drive in the mirror pair with the drive that failed. As long as either drive in a mirror pair is okay, a RAID 1+0 array is perfectly fine. I believe that was actually what you meant, and I'm not trying to nitpick here, but this seemed worth clarifying for the sake of those reading this who may not have been aware of it.


For just moving files around, I guess you could get great performance with both types of setup. There are some things to consider, though...

To get really great RAID 5/6 performance, in my experience you need a real hardware RAID controller. You also need to spend some time figuring out the setup of that controller and the drives, and sometimes even the file system.

If you want your file server to do anything else (like performing operations on the files, even trivial things like calculating checksums), you will usually want both great I/O performance AND plenty of CPU resources to spare. The faster the I/O system is, the more CPU resources you will need to keep up with it.

Since there has been a lot of RAID 1+0 versus RAID 5/6 discussion, I'll give my comments on that too...

First of all, you must know the workload. If it is not suitable for RAID 5/6, forget it; but if it is, the equation turns out pretty well for long-term investments...

For a one-off system with capacity X, you will usually get better performance AND a lower price with RAID 1+0, unless X is really big (like 10+ drives of effective storage). However, it is not unreasonable to expect that capacity requirements will change (and you know what that means...), and in that case you can save a large amount of money by buying fewer drives (when you do the upgrade, you already have the controller). Suppose that you need 2TB now, and expect to need 4TB in two years and 8TB in four years. If your workload is suitable for RAID 5, I'd spend the money on that controller and get that money back by saving on the drives needed in two years.

You also need to figure out the effect of the RAID type on the other components, like the case, the power supply, and even energy consumption. For large arrays, fewer drives can really simplify things.

You also have to remember that a fast computer can do anything; a RAID controller is just a RAID controller...

Personally, I really hit the performance and capacity barrier a year ago. I bought a fast RAID controller and it has already saved me a lot of headaches and money.

However, I would question the rebuild time for a RAID 5 or RAID 6 array being equal to a RAID 1+0 array's rebuild time (even in the best case). Since the controller must read data from all other drives to perform the XOR calculation to generate the parity to be stored, it would seem that there is going to be more calculation and I/O activity in reconstructing a RAID 5 or RAID 6 array, whereas to "rebuild" a failed mirror simply involves copying data from one disk directly to another without any parity calculations or other disk I/O involved. Can you elaborate on your statement or provide other supporting data?

RAID 5/6 could be just as fast if the array is idle, since all disks can read at full STR and the XOR operations themselves aren't expensive/hard. However, this does require a controller with enough IO bandwidth and a good 'firmware' implementation. But that applies to more parts of RAID 5/6.

However, if the array isn't idle, read requests involve all disks instead of one, and I think that's the real cause of the long rebuild times.
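A back-of-the-envelope sketch of that point (the capacity and transfer-rate numbers below are placeholders): on an idle array both rebuilds are bounded by one drive's sustained rate, but the parity rebuild touches far more total I/O, which is what hurts once the array is busy.

```python
# Idle-array rebuild estimate: reads from the surviving drives run in parallel
# and the spare is written once, so wall-clock time is bounded by a single
# drive's sustained rate either way; only the total I/O touched differs.
DRIVE_GB = 300            # capacity per drive (placeholder)
STR_MB_S = 70             # sustained transfer rate per drive, MB/s (placeholder)

def rebuild(source_drives):
    wall_clock_h = DRIVE_GB * 1024 / STR_MB_S / 3600
    total_io_gb = (source_drives + 1) * DRIVE_GB   # all reads plus the one write
    return wall_clock_h, total_io_gb

for label, sources in [("RAID 1+0 rebuild (copy the surviving mirror)", 1),
                       ("6-drive RAID 5 rebuild (read the other 5, XOR)", 5)]:
    hours, io_gb = rebuild(sources)
    print(f"{label}: ~{hours:.1f} h idle, {io_gb} GB of disk I/O in total")
```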

BTW, I think your claim that a RAID 10 array gets more reliable as you add more disks is flawed, since it doesn't take into account that with more disks, failures are more likely.

Edited by Olaf van der Spek

Wouldn't running the drives normally (non-RAIDed) be easier, if combined with good SMART monitoring software?

Yes, it'd be easier. But far less reliable.

And while discussing reliability, you shouldn't forget that RAID is about uptime and backups are to save your data.

