
Is RAID 5/6 dead due to large drive capacities?



#1 Scrotos

Scrotos

    Member

  • Member
  • 5 posts

Posted 30 May 2013 - 05:32 PM

I apologize if this has been addressed ad nauseam, but I couldn't find anything here using the forum search.

I'm trying to design a storage system for my business, and in researching I ran across several articles claiming that RAID 5/6 were effectively dead due to large drive capacities. The theory is that if you have an array of several 2 TB drives and one fails, there is effectively a 100% chance of hitting a read error during the rebuild, and the entire array will die.

I started reading all these articles from 2007 onward about how rebuilding/resilvering a RAID 5/6 array will assuredly fail once you have 12 TB or more of data. The reasoning is that a drive has an "unrecoverable read error" (URE) rate of 1 error per 10^14 bits read. When you do the math, that works out to around 12 TB worth of data. Not a big deal if you're just reading 12 TB from one drive, since the drive flags the bad sector, remaps it, and you lose at most that one sector, but if it happens while you're rebuilding an array, the claim is that the entire array can no longer be rebuilt and you lose all the data in it.
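To sanity-check that arithmetic, here's a quick back-of-the-envelope sketch in Python. It just uses the rated 1-per-10^14-bits figure; the 12 x 2 TB array is a hypothetical example, not my actual config:

import math

URE_RATE = 1e-14                 # rated UREs per bit read (spec-sheet number, not measured)
BITS_PER_TB = 8e12               # 10^12 bytes * 8 bits

# Data you can read before the expected number of UREs reaches one:
print(1 / URE_RATE / BITS_PER_TB, "TB")   # 12.5 TB

# Probability of hitting at least one URE while reading every surviving drive
# during a RAID 5 rebuild. Hypothetical example: 12 x 2 TB array, one drive
# failed, so 11 x 2 TB = 22 TB must be read back without error.
bits_to_read = 11 * 2 * BITS_PER_TB
p_at_least_one_ure = -math.expm1(bits_to_read * math.log1p(-URE_RATE))
print(round(p_at_least_one_ure, 2))       # ~0.83 -- very high, but not literally 100%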

Here are the articles/discussions I'm referencing:

http://queue.acm.org....cfm?id=1670144
http://www.zdnet.com...ing-in-2009/162
http://www.zdnet.com...ing-in-2019/805
http://www.smbitjour...-more-reliable/
http://www.raid-fail...d5-failure.aspx

I primarily use HP products which means HP SmartArray RAID controllers. The drives have traditionally been 15K 146 GB or 300 GB 2.5" SAS drives in arrays no larger than 8 drives. I've used RAID 5, 6, and 1+0 and also have some hot spares ready to go in some of the arrays. So the largest array I've made has been less than 2 TB (1.2 or 1.6 maybe?) in total. I run Win2K8 R2 on our servers.

I'm now looking to create something like a 24 TB or larger array. In doing that I wonder what my options are as far as array portability (in case of server hardware failure) and RAID structure. If I have 12 x 2 TB drives or 8 x 3 TB drives, I'm now starting to get nervous that during a rebuild I'll lose everything. The SmartArray controllers are pretty good about letting you move arrays to and from similar controllers using the same drives. Maybe Windows software RAID (dynamic disks or whatever) would make sense for portability to a different server.

I know that RAID is not a backup. That point comes up from time to time, so I just want to make clear up front that I'm not planning on using this as a backup. I'm not discussing anything related to backups in this post.

The arguments that I linked to say that because of the URE rate on drives, you're certain to get a failure during a rebuild, primarily because each drive now has a larger capacity. By that token, it also seems that reading a 2 TB drive 6 times should give you a URE, but it wouldn't matter in that instance because you're not trying to rebuild an array from data and parity; you just lose a sector. In a RAID rebuild, the claim is that it would cause the entire thing to die.

I get the feeling that the math doesn't match up to reality, but I don't have access to something like that Google hard drive survey to give real-world experience. So all I got is the math to feed my fear.


I figure if anyone has real-world experience with large storage arrays and DIY arrays, it's y'all. I'm not looking into some prepackaged EMC or something where the storage is a giant mystery box that "just works" and I'm not going to be trying to figure out a *nix. I don't have a need for massive performance as I'll be constrained by gigabit Ethernet or some external SAS links (assuming a drive enclosure). My main concerns are:

1) Have people run into these rebuild issues with RAID 5/6 using 1+ TB drives? If so, would some type of RAID 6+0 or sets of mirrors or something mitigate these issues? Do I just need to choose a RAID type that avoids parity as that's where the disaster can happen during rebuild?

2) Rebuilding an array has the same issues that growing an array would have, correct? So if I get 6 TB now and add a few drives at a time, am I courting disaster? I'm not planning on doing this but from an academic standpoint it seems the conventional wisdom is that growing arrays is bad. I've done it before with nary a problem with my dinky little 146 GB drives so I didn't know if it's a size thing, perhaps?

3) Any suggestions for a 24 TB setup? Or examples of what y'all run? Any reason to go with 2.5" versus 3.5" drives if I decided to do commodity Seagate or WD or Hitachi enterprise drives rather than HP-branded drives? It's just a drive density consideration if I want to save on space, right? Any suggestions on software like FlexRAID? I know it's probably not a first choice for an enterprise-type of software RAID solution but I don't know any other brands/companies off the top of my head for that kind of product except "install Linux and use [random lowercase utility using as few letters as possible]".

I thank you for your time reading this and for any information or suggestions you may have!

Edited by Scrotos, 05 July 2013 - 12:49 PM.

#2 FastMHz

FastMHz

    Member

  • Member
  • 405 posts

Posted 31 May 2013 - 09:04 AM

With how inexpensive huge drives are these days, I just have a mirror drive for every working drive (as in my 16TB NAS). The parity calc overhead and potential issues that you bring up just aren't worth it to me to go RAID5/6. On top of that, I have a second portable MiniNAS that acts as a 3rd mirror of the entire volume AND that can be easily taken off-site.

EDIT: If you're interested in exactly how I have my storage set up, let me know. My "home-brew" setup has been in play for my small biz for a good 12 years without a single lost file.

Edited by FastMHz, 31 May 2013 - 09:08 AM.

Production: Vishera 8350/32gb RAM/Dual SSD/VelociRaptor/Radeon 7750
Gaming: Vishera 6350/16gb RAM/SSD/VelociRaptor/2x Radeon 7950 Crossfired
Retro: K6-2 550/256mb RAM/160gb HDD/CompactFlash/3DFX/ATI AIW Pro/SB16/DB50XG
http://www.fastmhz.com

#3 Scrotos

Scrotos

    Member

  • Member
  • 5 posts

Posted 31 May 2013 - 11:54 AM

With how inexpensive huge drives are these days, I just have a mirror drive for every working drive (as in my 16TB NAS). The parity calc overhead and potential issues that you bring up just aren't worth it to me to go RAID5/6. On top of that, I have a second portable MiniNAS that acts as a 3rd mirror of the entire volume AND that can be easily taken off-site.

EDIT: If you're interested in exactly how I have my storage set up, let me know. My "home-brew" setup has been in play for my small biz for a good 12 years without a single lost file.


Aye, I'm curious. Cost may be a factor in doing a bunch of mirroring, though, as I'm hoping to stick with SAS and that stuff can get expensive.

#4 cbrworm

cbrworm

    Member

  • Member
  • 131 posts

Posted 31 May 2013 - 04:48 PM

I have read the reports as well. I would say they are not dead yet. I have been migrating people to RAID 6 where in the past I would have used RAID 5. I may just be lucky, but I have not had a second drive fail during a rebuild - much less a third. Most of the arrays I have deployed are between 6 and 8 drives - maybe that is why my luck has been good.

I have always stuck with Seagate or Hitachi enterprise-class drives (at least since SAS has been around), and Adaptec controllers.

I am curious about these new lower cost, lower MTBF drives - these may put us in that situation.


The real key is backups.

I am getting ready to do a test on an in-house server that I probably should not admit to: I am going to use mismatched, good-quality SATA drives that support time-limited error recovery (but are not enterprise class) and see what happens over time. It may be a very short experiment.

#5 FastMHz

FastMHz

    Member

  • Member
  • 405 posts

Posted 01 June 2013 - 12:25 PM

@Scrotos: Here's a brief overview of how my data storage is configured:

Primary NAS: Win7x64 Lite on SSD, 8GB RAM, Athlon II x2 255 CPU @ 3.1GHz, Samsung HD103SI and HD204UI Hard drives

5x Working drives in main tower on mobo SATA3 ports
5x Mirror drives in eSATA towers
Room for 3 more pairs of drives

Primary drives are pooled using Drive Bender into one 8TB volume.

Every individual disk is mirrored via software (Allway Sync); the sync runs every 24 hours or is triggered manually when I change a bunch of data. The same software is also used to sync the primary NAS to the portable NAS.

No spin down, stays on 24x7. Maximum capacity would be 64TB with 4TB drives.


Portable NAS: Win7x64 Lite on SSD, 8GB RAM, AMD A4-5300 CPU @ 3.4GHz,
4x Seagate ST2000 HDDs

Single Windows striped volume, no redundancy, no spin down.

This unit functions as a way to transport all of my data easily, and also as a 2nd (offsite) mirror.


This setup works very well for me. No RAID headaches, can stream a constant 100MB/s thru gigabit to my home LAN. I've never had data loss. If I accidentally delete a file, I can pull it from a mirror, and this has saved me more than once! Also, I don't trust any HDD over 2TB *yet*.

Edited by FastMHz, 01 June 2013 - 12:32 PM.

Production: Vishera 8350/32gb RAM/Dual SSD/VelociRaptor/Radeon 7750
Gaming: Vishera 6350/16gb RAM/SSD/VelociRaptor/2x Radeon 7950 Crossfired
Retro: K6-2 550/256mb RAM/160gb HDD/CompactFlash/3DFX/ATI AIW Pro/SB16/DB50XG
http://www.fastmhz.com

#6 Elena

Elena

    Member

  • Member
  • 6 posts

Posted 02 June 2013 - 01:05 AM

For Google's data, search for

Failure Trends in a Large Disk Drive Population

e.g. here - http://research.goog...s/pub32774.html

That was the data that went into our RAID failure calculator, as I recall.
Elena of www.ReclaiMe.com

#7 continuum

continuum

    Mod

  • Mod
  • 3,581 posts

Posted 03 June 2013 - 08:33 PM

Ugh, this topic comes up on quite a few major forums every few days/weeks/months/years.

Long story short, no way, RAID is very much alive. Our storage needs are constantly growing. Customers where we ran 16x200GB drives 10 years ago are now running 16x4TB or more in the exact same space...


For home users, if your data isn't big or isn't growing nearly as fast (or both), then, well, RAID rarely made sense 10 years ago and it still doesn't make much sense today. But for many of us, we need all the space we can get.

#8 jtsn

jtsn

    Member

  • Member
  • 94 posts

Posted 04 June 2013 - 10:07 AM

The bigger the capacity gets, the longer the rebuild takes. So the risk of multiple drives failing before the rebuild is complete increases.

#9 Scrotos

Scrotos

    Member

  • Member
  • 5 posts

Posted 05 July 2013 - 12:48 PM

Ugh, this topic comes up on quite a few major forums every few days/weeks/months/years.

Long story short, no way, RAID is very much alive. Our storage needs are constantly growing. Customers where we ran 16x200GB drives 10 years ago are now running 16x4TB or more in the exact same space...


Yeah, but I've not really seen anyone who's run into problems or successes in the real world. Your last sentence relates to the following reply...

The bigger the capacity gets, the longer the rebuild takes. So the risk of multiple drives failing before the rebuild is complete increases.


So seriously, I know people still use RAID. But are they doing it out of "best practices" from 1995? I did do a search here first but didn't find anything, though the search function didn't like terms as short as "RAID" and "5" and "DEAD", so I may have missed the topic that comes up every few days/weeks/months/years here.

FastMHz, thanks for the info. I see DIY people building 20+ TB storage systems so I don't discount anyone's experience! I'm still waiting on management to approve of buying a storage server. I'm thinking something with 3.5" 2 TB SAS drives in a RAID 1+0 configuration because while the cost is higher, in my mind there's lower risk during a rebuild. We have backups but in case of a drive failure I'd rather have the storage remain online.

continuum, is there another forum you can suggest where I can read some of the debate on this subject? I can find far more naysayers (like BAARF: http://www.miracleas...ARF/BAARF2.html ) on parity-based RAID than I can find proponents. The only proponents are kind of dismissive and don't really back up anything; RAID is not a backup, RAID is not dead, there's nothing wrong with RAID 6 using 4 TB drives, etc. Meanwhile the naysayers use math and anecdotal accounts of disaster. I'd at least like to get some anecdotes from people saying parity is great and high-capacity rebuilds work perfectly for them all the time, know what I mean? :D

#10 continuum

continuum

    Mod

  • Mod
  • 3,581 posts

Posted 11 July 2013 - 01:25 AM

Those of us using large RAIDs are probably mostly doing so in enterprise settings. Go talk to a real enterprise vendor rather than relying on anecdotes from the internet or from the local mom-n-pop computer shop down the street. Sorry, most of the vendors with real numbers put this under NDA... the closest you can probably get to this info posted in public is the storage provider Backblaze.


And RAID5 has always been a bad idea for those who need serious high availability and data integrity, since when one drive fails, your parity protection is gone, and with modern drives being so large, your chances of an uncorrectable read error during the rebuild (during which you have no protection) in a multi-TB array are virtually 100% (at least by rated specs and the math, which I am too lazy to do again at the moment).

If you must run a parity RAID, run a RAID6 (or the ZFS equivalent, RAID-Z2).

I don't think anyone sane recommends RAID without specific use cases in mind. :D

And a big thing end-users/typical consumers tend to forget: every time you add a component to a system, you are increasing the system's risk of failure. One harddisk connected to your motherboard? That's two things that might go bad (1x HD, 1x MB). Compared to say six harddisks connected to your motherboard? That's now seven things that can fail. Storagereview's reference guides have a nice page on how that affects failure rates and MTBF calculations.
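To put rough numbers on that: a tiny Python sketch, with made-up annual failure rates purely for illustration (not measured figures):

AFR_MOTHERBOARD = 0.03   # assumed annual failure rate, purely illustrative
AFR_HDD = 0.04           # assumed annual failure rate per disk, purely illustrative

def p_any_failure(afrs):
    """Probability that at least one of the listed components fails within a year."""
    p_all_survive = 1.0
    for afr in afrs:
        p_all_survive *= (1.0 - afr)
    return 1.0 - p_all_survive

print(round(p_any_failure([AFR_MOTHERBOARD, AFR_HDD]), 3))          # ~0.069 with 1 disk
print(round(p_any_failure([AFR_MOTHERBOARD] + [AFR_HDD] * 6), 3))   # ~0.241 with 6 disks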



And I've never heard of BAARF, got any more reputable links?

#11 FastMHz

FastMHz

    Member

  • Member
  • 405 posts

Posted 11 July 2013 - 03:15 PM

I personally avoid RAID and mirror everything....multiple times. Expensive, yes, but downtime and potential for loss are exceedingly low.

And this:

Attached File: The-RAID-Catastrophe.pdf (1.43 MB)

Production: Vishera 8350/32gb RAM/Dual SSD/VelociRaptor/Radeon 7750
Gaming: Vishera 6350/16gb RAM/SSD/VelociRaptor/2x Radeon 7950 Crossfired
Retro: K6-2 550/256mb RAM/160gb HDD/CompactFlash/3DFX/ATI AIW Pro/SB16/DB50XG
http://www.fastmhz.com

#12 Scrotos

Scrotos

    Member

  • Member
  • 5 posts

Posted 17 July 2013 - 01:48 PM

And RAID5 has always been a bad idea for those who need serious high availability and data integrity, since when one drive fails, your parity protection is gone, and with modern drives being so large, your chances of an uncorrectable read error during the rebuild (during which you have no protection) in a multi-TB array are virtually 100% (at least by rated specs and the math, which I am too lazy to do again at the moment).

...

And I've never heard of BAARF, got any more reputable links?


Besides the 5 links in the first post? Which also had a RAID 5 rebuild error calculator based on drive UREs so you wouldn't have to do math yourself? That the blokes behind the RAID 5 calculator (Elena, right?) even posted about in this thread?

Honestly, did you just skim the thread title and fire an off-the-cuff response without bothering to read it? It kinda seems like that.


Those of us using large RAIDs are probably mostly doing so in enterprise settings. Go talk to a real enterprise vendor rather than relying on anecdotes from the internet or from the local mom-n-pop computer shop down the street. Sorry, most of the vendors with real numbers put this under NDA... the closest you can probably get to this info posted in public is the storage provider Backblaze.


Aye, but like you said, any real stats are behind NDAs. So... what are my choices? Ask for anecdotes or just blindly throw money at some solution and hope for the best? In what world is it a good idea to ask the person selling you stuff if you are spending more money than you need to or should, especially if they are on commission?

I'm reading FastMHz's info now, it's pretty informative. I'm also reading the link you gave but it'll take some time to get relevant info out of it.

I'm probably going to end up with something like an HP MSA60 with a 12 x 3 TB HP SATA drive array doing RAID 1+0, hooked up to an HP DL360 G6 or DL380 G6. We also have a DL320s, though I am unsure if that supports drives above 2 TB.

We run a variety of ProLiant 3xx G5 to G7 servers but typically use RAID 5/6 with small drives, like 146 GB 15K or 10K SAS, and not a ton of spindles; 8 max so far in one array. So no experience with large drives in a parity setup, and I really am not worried about those 72/146/300 GB drives in RAID 5/6.

I had looked into something with dual-domain and dual-path capability like the D2600, or even outfitting the MSA60 with SAS instead of SATA and getting the appropriate controllers and options, but for this storage pool we have plenty of backups and we don't need high availability with the associated cost. We typically go for HP-branded equipment on our servers, with the exception of an 8-drive RAID 1+0 for a db using 256 GB Samsung 840 Pros; no issues with them so far on an HP DL380 G5 using a SmartArray P400i, though yes, I realize we are leaving a ton of performance on the table using them in such an old machine.

I was hoping more "of us using large RAIDs are probably mostly doing so in enterprise settings" would have chimed in with some real-world experience of failures in arrays using 8 or more drives that are 2 TB or larger each. I mean, if they ain't hangin' out on Storage Review forums, where would they be? I don't even care about enterprise-specific versus enthusiast-specific as long as there's someone who's running 2+ TB drives in a RAID 5 or 6 and had to rebuild at some point. :)

#13 dyeow

dyeow

    Member

  • Member
  • 2 posts

Posted 17 July 2013 - 07:31 PM

There's something I don't get about these RAID 5 failure calculations. The math in some of these links is a little off (raid-failure.com appears to get it right), but that doesn't change the conclusion much (i.e. there's a significant chance of data loss). What does change it, however, is the assumption that a single read error means the entire rebuild is toast - why?

Obviously the sector with the URE is history, but parity calculations are bitwise - that is, each sector is independent (strictly each bit, but you're reading a sector at a time): parity sector N = drive 1 sector N xor drive 2 sector N xor drive 3 sector N, etc. Drive 2 sector N having a URE means you have no idea what that sector was and can't recover it, but why does that have any impact on any other sector (or even sector N on the other drives)? Why can't you rebuild the rest of the array?

If that's correct (I'm no storage expert, am I missing something there?) then applying the same math to raid 1/10 after a drive failure is pretty scary as well (less chance of data loss than raid 5 sure, but still very high). The basic problem here is that 3TB is a *lot* at an error rate of 1e-14/bit. The chance of being able to read everything from a single disk without error is about 79%, which means you've got a decent chance at some data loss after a single drive failure unless you're triple mirroring or using raid 6.
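A minimal sketch of that calculation in Python, assuming the 1e-14/bit figure really is an independent per-bit error rate and not just a spec-sheet number:

import math

def p_clean_read(capacity_tb, ure_per_bit=1e-14):
    """Probability of reading an entire drive with no URE, per the rated spec."""
    bits = capacity_tb * 8e12
    return math.exp(bits * math.log1p(-ure_per_bit))

print(round(p_clean_read(3), 3))   # ~0.787 -- the ~79% figure for a 3 TB drive
print(round(p_clean_read(6), 3))   # ~0.619 -- about 62% for 6 TB of data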

#14 continuum

continuum

    Mod

  • Mod
  • 3,581 posts

Posted 18 July 2013 - 09:55 PM

The only real link of interest you posted was the ACM one; the ZDNet ones don't tell us anything we don't know, and then there's the RAID failure calculator (thanks for that, but they don't do a RAID6 calculator, which would be useful -- although the ZDNet links do a decent job on that). If I wanted anecdotes written by media professionals... ;)

Most of those links all smell purely theoretical anyway... which is about as detailed as many of us are allowed to get. ;)

real-world experience of failures in arrays using 8 or more drives that are 2 TB or larger each.

We aren't thick on that sort of experience, at least not where I am. We strongly discourage our customers from implementing arrays with that many disks, as the rebuild failure probability gets really scary. RAID6 means you are almost certainly going to get an acceptable percentage of rebuilds that succeed (many of our customers here demand extremely high reliability, although I wouldn't guarantee anywhere close to five 9's...).

Most nearline drives have a rated UBER of 1 in 10^15 instead of 1 in 10^14, which helps, and some have said the real-world UBER of the typical harddisk is an order of magnitude better than rated, although I have no source for this and cannot confirm it. Some enterprise disks have a 1 in 10^16 UBER, but those are usually 10K and 15K SAS disks, which are pricey per GB and usually not available in huge capacities (which is why you're using RAID in the first place!).
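Rough spec-sheet math for how much a 1-in-10^15 rating helps (Python; the 4 TB drive is just an illustrative example):

import math

def p_clean_read(capacity_tb, ure_per_bit):
    """Probability of reading an entire drive with no URE at the rated UBER."""
    bits = capacity_tb * 8e12
    return math.exp(bits * math.log1p(-ure_per_bit))

print(round(p_clean_read(4, 1e-14), 2))   # ~0.73 at 1 in 10^14 (typical desktop rating)
print(round(p_clean_read(4, 1e-15), 2))   # ~0.97 at 1 in 10^15 (typical nearline rating)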

If you're a large enough customer, go talk to the engineers at whatever place you're buying from and see if they will bump you beyond level 1/2 support into level 3; if they have any sales engineers with any brains (and if you have the $$$$ account size), they can probably get you talking to actual engineers. I haven't met with anyone from WD or Seagate in a while unfortunately, been working on some other customer needs/product development... :-/

#15 Scrotos

Scrotos

    Member

  • Member
  • 5 posts

Posted 01 August 2013 - 12:16 PM

If you're a large enough customer, go talk to the engineers at whatever place you're buying from and see if they will bump you beyond level 1/2 support into level 3; if they have any sales engineers with any brains (and if you have the $$$$ account size), they can probably get you talking to actual engineers. I haven't met with anyone from WD or Seagate in a while unfortunately, been working on some other customer needs/product development... :-/


Oh man, wouldn't I love to be able to throw money at a problem! Yeah, not a large enough customer for anyone to give us the time of day, I'm afraid.

dyeow, my understanding of all the blah blah blah is that if you hit a bad sector, the rebuild process can't use that stripe to rebuild the array, because it's now missing two pieces of information instead of just one, and one is all RAID 5 can protect against. And I guess increasing drive sizes mean a greater chance of the equivalent scenario in RAID 6 (two drives getting UREs on the same stripe during the rebuild) hosing things up.

A mirror would just have one file or piece of a file affected and continue the rebuild but a parity-based array would hard-stop in the middle of a rebuild and you'd lose everything. At least, that's the worst-case fearmongering interpretation of things.

I don't know enough about what a rebuild would do if it encountered that type of scenario to know if the fears are founded or if it's just a ploy for page hits on these kinda articles. It's probably controller and even firmware specific.

#16 dyeow

dyeow

    Member

  • Member
  • 2 posts

Posted 01 August 2013 - 10:26 PM

Oh man, wouldn't I love to be able to throw money at a problem! Yeah, not a large enough customer for anyone to give us the time of day, I'm afraid.

dyeow, my understanding of all the blah blah blah is that if you hit a bad sector, the rebuild process can't use that stripe to rebuild the array, because it's now missing two pieces of information instead of just one, and one is all RAID 5 can protect against. And I guess increasing drive sizes mean a greater chance of the equivalent scenario in RAID 6 (two drives getting UREs on the same stripe during the rebuild) hosing things up.


Yes, you'll probably lose some data if said drives really have a 1e-14 URE rate in practice - at least with a single drive loss in RAID 5 (or 1/10, for that matter). In that situation you have zero redundancy, and as I said, the probability of reading a complete 3TB disk with a 1e-14/bit error rate is only about 79%. 6TB is a 62% success rate, and so on. Obviously you'd expect something to go wrong here.

RAID 6, however, is a completely different story, at least with the loss of a single drive. The chance of two drives encountering UREs on any particular bit (sector) is so remote that you can't actually perform this calculation naively with double-precision fp math. Lose a second drive and all bets are off again, of course.

This is all predicated on URE rates actually being a real error rate rather than just a number on a datasheet, of course, and that's certainly not established.
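For the curious, a rough sketch of the single-drive-failure RAID 6 case in Python, under a very simplified model (independent UREs per 512-byte sector, the rated 1e-14/bit spec, and a hypothetical 8-drive array of 3 TB disks). Working with the per-sector probabilities directly sidesteps the floating-point issue:

from math import comb

URE_PER_BIT = 1e-14          # rated spec, assumed to be a real per-bit error rate
SECTOR_BITS = 512 * 8
DRIVE_TB = 3
N_REMAINING = 7              # hypothetical 8-drive RAID 6 with one drive already failed

# Per-sector URE probability (so small that 1-(1-p)^n style formulas lose it
# to rounding, hence working with the raw probabilities instead).
p_sector = URE_PER_BIT * SECTOR_BITS                  # ~4.1e-11

# A stripe is unrecoverable only if two or more of the remaining drives hit a
# URE on that same stripe; the one remaining parity covers a single URE.
p_stripe_lost = comb(N_REMAINING, 2) * p_sector ** 2  # ~3.5e-20

sectors_per_drive = DRIVE_TB * 1e12 / 512
print(p_stripe_lost * sectors_per_drive)              # ~2e-10: negligible over a full rebuild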

A mirror would just have one file or piece of a file affected and continue the rebuild but a parity-based array would hard-stop in the middle of a rebuild and you'd lose everything. At least, that's the worst-case fearmongering interpretation of things.

I don't know enough about what a rebuild would do if it encountered that type of scenario to know if the fears are founded or if it's just a ploy for page hits on these kinda articles. It's probably controller and even firmware specific.


Agreed, this is probably controller-specific - there's no technical reason why you couldn't do exactly what the mirrored array does: lose that sector and move on. There's simply no reason I can see to just stop and lose all data (other than a dodgy controller implementation). There are *other* reasons not to prefer RAID 5, of course, but I've never been able to make sense of this one.


