cdh

RAID5, 16-Drive Issue


I hope so much that someone here can help me (very worried). I was having trouble with my RAID5 recently: drives were coming up as failed or missing. I was able to get them all back except for two. One said failed, and the other said missing. I tried many times to remove and re-insert the missing drive, but it would not come back up.

After reading online, it seemed an option was to delete the RAID set and then re-create it. I did so, and my D and E drives were both visible as they should have been. The D drive seemed to work as well. However, the E drive did not, and it holds the vast majority of the content (the D drive is about 750GB, and the E drive is the rest of the 15TB). I remembered that I hadn't selected the "Greater than 2TB" option when I re-created that RAID set (I thought the option only mattered if your operating system did not support more than 2TB, and mine does: Windows XP x64). Since the D drive worked and it was not greater than 2TB, I figured that must be the problem, so I went into the volume set options and changed that setting.

It didn't say it was going to erase the disk contents, unless I missed it (I have 3% of my vision, so it is possible, although I was looking very closely since this is extremely important to me). Yet it began initializing, which to me means bad news. It was at about 15% by the time I got into the web admin to see what was happening. I shut down the computer, turned off the drive boxes, and then rebooted so that I could send this message.

Firstly, is it indeed erasing all my data? If so, can it be aborted/reverted at this point (a long shot, I guess, but I am hopeful!)? I very much hope someone here will have good news for me. Thanks much in advance.


There are software tools that exist to recover damaged RAID sets, but a 16-disk RAID5 that's already been 15% initialized?? Eeeeeek. You have probably just screwed yourself...

(And as you experienced, rebuild times on arrays this large get really scary; you have a HUGE window of vulnerability to a second disk failure, which hoses you pretty good. In the future, I would run it as a pair of 8-disk RAID6s at least...)


RAID 5 on large SATA drives is NOT reliable. Full stop.

RAID 6 on more than 10 large SATA drives is NOT reliable either.

Use RAID 10 and Enterprise-class SATA drives (with a 1-per-10^15 BER) to avoid those kinds of problems!


Just to understand how BAD a 1TB SATA drive with a 1-per-10^14 BER is in a RAID array:

A BER/UBE of 1 per 10^14 means an 8.8% probability of ONE unreadable sector per 1TB read (a full drive is roughly 8.8x10^12 bits, and 8.8x10^12 bits x 10^-14 errors per bit ≈ 0.088 expected errors)!

==> You have 16 of those unreliable 1TB HDDs!

Enterprise-class SATA drives (and large SAS HDDs) are one order of magnitude more reliable: 1 per 10^15.

Most small SAS HDDs are rated at 1 per 10^16.
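
To make that arithmetic concrete, here is a minimal sketch in Python (my own illustration, not anything from this thread) that computes the chance of at least one unrecoverable read error for a given BER, treating each bit as an independent failure:

```python
import math

def p_unreadable(bits_read: float, ber: float) -> float:
    """Probability of at least one unrecoverable read error,
    treating each bit as an independent failure at rate `ber`."""
    return 1 - math.exp(-bits_read * ber)

TIB_BITS = 2**40 * 8  # one tebibyte in bits, about 8.8e12

print(f"{p_unreadable(TIB_BITS, 1e-14):.1%}")  # ~8.4% per full-drive read
print(f"{TIB_BITS * 1e-14:.1%}")               # ~8.8%, the linear approximation
```

The 8.8% quoted above is the linear approximation of the same quantity; the exponential model gives about 8.4%.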


Many non-enterprise class SATA drives have a rating of 1 sector in 10^15.

Also, rated values != reality.

Rated values == marketing.

1.5TB x5 = 7.5TB

Zero unreadable sectors in over 100TB read.

Explain that.
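
For what it's worth, a simple Poisson model (an assumption on my part, not anything stated in the thread) shows why this observation sits so badly with the rated 1-per-10^14 figure: 100TB read at that rate should yield about eight unreadable sectors, and the chance of seeing none is well under 1%:

```python
import math

read_bits = 100e12 * 8  # 100TB read, as reported above

for ber in (1e-14, 1e-15):
    mu = read_bits * ber    # expected unreadable sectors (Poisson mean)
    p_zero = math.exp(-mu)  # chance of observing none at all
    print(f"BER {ber:.0e}: expect {mu:.1f} errors, P(none) = {p_zero:.2%}")
```

At 1e-14 the chance of a clean 100TB is about 0.03%; at 1e-15 it is about 45%, so the observation is consistent with drives behaving at least an order of magnitude better than the desktop rating.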

> Many non-enterprise class SATA drives have a rating of 1 sector in 10^15.

You are welcome to keep buying Desktop class drives.

> Zero unreadable sectors in over 100TB read.

To me, the UBE/BER point is most critical at REBUILD time, when the HBA reads ALL the drives at once.
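
To put a rough number on that rebuild-time exposure: rebuilding the 16-drive RAID5 from this thread means reading all 15 surviving 1TB drives end to end, and under the same independent-error assumption as above, the chance of hitting at least one URE somewhere in that read is large. A hedged sketch:

```python
import math

def rebuild_failure_probability(surviving_drives: int,
                                drive_tb: float, ber: float) -> float:
    """P(at least one URE while reading every surviving drive in full)."""
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    return 1 - math.exp(-bits_read * ber)

# 16x 1TB RAID5: one drive lost, the other 15 must be read end to end.
print(f"{rebuild_failure_probability(15, 1.0, 1e-14):.0%}")  # ~70%
```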

zpool scrub is a read-verify of the entire drive at once. I do it daily.

You've still offered no explanation.

I can confirm qasdfdsaq's observations with OpenSolaris / zpool scrub. No problems at all.
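
For readers who want to reproduce this, `zpool scrub` and `zpool status` are the only commands involved. A minimal sketch of a scrub-and-check in Python (the pool name `tank` is a placeholder; most people would simply put the `zpool scrub` line in cron):

```python
import subprocess

POOL = "tank"  # hypothetical pool name; substitute your own

def scrub_and_check(pool: str) -> str:
    """Start a scrub, then report pool health.

    `zpool scrub` returns immediately and the scrub runs in the
    background, so run the status check again once it has finished.
    """
    subprocess.run(["zpool", "scrub", pool], check=True)
    result = subprocess.run(["zpool", "status", "-x", pool],
                            capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(scrub_and_check(POOL))
```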

> I can confirm qasdfdsaq's observations with OpenSolaris / zpool scrub. No problems at all.

> Well, can you tell us how many sectors the scrubber fixed?

The scrubber fixed zero sectors. There were no read errors or checksum errors.

Zero reallocated or unreadable sectors in the SMART logs. Want me to paste all five? Not that this is relevant, because your claim is about "unreadable" sectors, and an unreadable sector would show up as a read error.

Stop trying to scare people, there is nothing of merit to your claims.

Edited by qasdfdsaq

> There are software tools that exist to recover damaged RAID sets, but a 16-disk RAID5 that's already been 15% initialized?? Eeeeeek. You have probably just screwed yourself...

Depends how much and what type of data is on it, and also on your partitioning format and structure. I suspect that imaging the array to a second array is out of the question, but if you have the space you could use a program that can reconstruct RAID arrays on-the-fly to recover your data.

R-Studio, for example, can reconstruct RAIDs virtually and non-destructively in memory, but you still need something to recover to. Of course, the 15% is likely to be completely lost unless you have a RAID card that initializes in a very specific way and you fancy doing some programming...
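
The reason such tools can work at all is that RAID5 parity is plain XOR across each stripe, so any single missing member can be recomputed from the survivors. A toy illustration of that principle (this is not how R-Studio is implemented, just the underlying idea):

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# A 3-data + 1-parity RAID5 stripe: parity = d0 ^ d1 ^ d2.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# Lose any one member; XOR of the survivors recovers it.
assert xor_blocks(d0, d2, parity) == d1
```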

> Stop trying to scare people, there is nothing of merit to your claims.

I just don't want readers here to use unreliable setups like the 16x large Desktop-class SATA drives seen in this topic.


My main point was that you were saying "desktop class" SATA drives are unreliable, period, and that one must use enterprise drives. You backed this up purely by quoting one specification: the bit error rate.

I showed that this rating is not exclusive to enterprise drives, and that it bears no resemblance to reality. I proved this using my own drives, which are in no way special or "enterprise". It's a marketing gimmick to make people think enterprise drives are more reliable. Whether they are or not, it has very little to do with the BER.

16x large desktop class SATA drives are not unreliable solutions. In this situation, this guy may have been better off with RAID-6 rather than RAID-5, but that still doesn't give you any reason to say desktop class drives are unconditionally rubbish.

Many of the world's biggest datacenters and storage companies use large desktop class SATA drives. For example this, and this. Hell, Google runs on desktop-class SATA and IDE drives. Hell, I'm willing to bet Amazon S3 uses them too.

If they're perfectly suitable for large enterprises in RAID-5 and RAID-6 configurations, then they're perfectly suitable for home use. They are not unreliable solutions. Period.

> ...the bit error rate. I showed that this rating is not exclusive to enterprise drives, and that it bears no resemblance to reality. I proved this using my own drives, which are in no way special or "enterprise". It's a marketing gimmick to make people think enterprise drives are more reliable. Whether they are or not, it has very little to do with the BER.

You have your own experience about this. Assuming it's the only truth means you can't listen to others' experiences here.

We do have people here who think differently based on their experience: silent data corruption is one bad sign you may notice on large arrays of your beloved Desktop-class SATA drives.

> 16x large desktop class SATA drives are not unreliable solutions.

They are just so reliable that this topic does not exist :blink:

Of course, it depends on the drive model, and SR used to let us "trace" each drive model's reliability...but we still have an active "SR falling down" topic.

> Google runs on desktop-class SATA and IDE drives. Hell, I'm willing to bet Amazon S3 uses them too.

They extensively TEST the drive models they use before going live.

So they are not depending on any "marketing gimmick"; they are using hardware they have verified to be reliable.


Hello again. About silent corruption: I recently had a problem with my onboard ICH10R (link here: link). I tried virtual machines with different OSes, and also OpenSolaris with ZFS, on my PC, and only ZFS found the silent corruption on the filesystem holding the VMware files (.vmdk). So, the point is: you can absolutely use cheap disks without problems, given a software solution that has silent-corruption detection and prevention technology (e.g. ZFS, which is awesome). I also scrub my ZFS pools every week, and so far no checksum errors have been found.
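
The mechanism behind this is ZFS's end-to-end checksumming: every block's checksum is stored apart from the block itself, so corruption is caught on read even when the drive reports success. A stripped-down sketch of the principle (illustration only, not ZFS code):

```python
import hashlib

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

# Write path: store the data and its checksum separately.
block = b"contents of one .vmdk block"
stored_sum = checksum(block)

# A drive can silently flip a bit and still report a successful read...
corrupted = bytearray(block)
corrupted[3] ^= 0x01

# ...but verifying against the stored checksum catches it on the read path.
assert checksum(block) == stored_sum
assert checksum(bytes(corrupted)) != stored_sum  # silent corruption detected
```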

> you can absolutely use cheap disks without problems, given a software solution that has silent-corruption detection and prevention technology (e.g. ZFS, which is awesome). I also scrub my ZFS pools every week, and so far no checksum errors have been found.

I fully agree...and I am more and more considering "background scrubbing" + "checksum" solutions as mandatory...they are what allow cheap drives to be used.


Silent data corruption can happen on any class of drive, yet I've been using ZFS for a while and have seen no silent data corruption on many large consumer drives.

Yes, that may just be my experience, but my experience comes from running support in an IT department with thousands of computers. I still figure there's nothing wrong with desktop drives.

