shuckey

RAID Failure - Best Practice

Hi all,

I'm new here and not completely sure whether I posted this in the right place. I have a question about RAID failures and best practices after a hard drive in a RAID array fails.

My question arose with a four-disk RAID 5 setup in a Dell server. A drive failed and was replaced with a new drive. We had data corruption for various reasons. Since then, I have been in discussions with a few data recovery companies. The folks at the data recovery companies told me that the worst thing to do when a drive fails in a RAID setup is to replace the drive. They say to shut down the hardware and send everything to them right away.

The manufacturers and distributors advertise that you should not worry when a drive fails (for example, in a RAID 5, RAID 6, or RAID 10). All you have to do is pop in a new drive, let it rebuild, and you are ready to go.

The data recovery companies have a financial interest in what they say. The manufacturers and distributors also have a financial interest in what they say.

Who should I believe? Should I just replace a failed hard drive, as the manufacturers and distributors advertise? Should I immediately shut down the hardware and send everything to a data recovery company? Are there different RAID setups that dictate which path to take?

Never underestimate the importance of a good backup procedure, which should include some sort of checksum process. That is what helped us out, and another friend of mine as well.
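
A checksum check on a backup can be as simple as hashing every file in the source and in the copy and comparing the two lists. The sketch below is a minimal illustration, assuming the backup is a plain directory copy readable from the same machine; SHA-256 is an arbitrary choice, and the function names (`sha256_of`, `build_manifest`, `verify_backup`) are hypothetical, not taken from any particular backup product.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the backup."""
    src = build_manifest(source)
    bak = build_manifest(backup)
    return [name for name, digest in src.items() if bak.get(name) != digest]
```

An empty list means every source file has an identical copy in the backup; anything returned is a file that was never backed up or whose copy no longer matches, which is the kind of gap described later in this thread.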

Thanks so much!

Shawn

I think I threw up in my mouth reading that. Rebuild it; that's why you have RAID. Who could part with their data for weeks to have it looked at anyway?!?

We had data corruption for various reasons.

Stop right there. That means there's more at play than just a failed drive, or something was already causing issues before the drive actually failed.

And yes, that's a strong consideration for RAID5: a RAID5 with a failed disk has no parity protection against additional failures. That's why RAID6 is very useful; unlike RAID5, it still gives you at least some protection against another failure while a single failed disk is being rebuilt.
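
To see why a degraded RAID5 has no margin left, it helps to look at single parity directly. The sketch below models one stripe of a four-disk RAID 5 as three data blocks plus one XOR parity block. It is a simplified model under stated assumptions (tiny blocks, one stripe, no parity rotation), and the helper names are hypothetical. Any one lost block can be recomputed by XOR-ing the other three; a second loss in the same stripe cannot be recovered. RAID6 adds a second, independently computed parity block per stripe so it can survive two losses, which this sketch does not model.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Append one parity block (XOR of the data blocks), RAID 5 style."""
    return data_blocks + [xor_blocks(data_blocks)]


def reconstruct(stripe: list[bytes | None]) -> list[bytes]:
    """Rebuild a stripe in which at most one block is missing (None)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        # Single parity only gives one equation, so two unknowns cannot be solved.
        raise ValueError(f"{len(missing)} blocks lost; single parity cannot recover them")
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    # XOR of all blocks in a stripe is zero, so the missing one is the XOR of the rest.
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt
```

Setting any one entry of a stripe to `None` and calling `reconstruct` returns the original stripe; setting two entries to `None` raises `ValueError`, which is the situation a RAID5 is exposed to for the whole duration of a rebuild.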

The data that got corrupted was in a folder that our IT person at the time was not backing up. So he had to run data recovery software on an old backup to get at least some of the data back. That was the main reason for the data corruption.

So are you endorsing RAID 6? And rebuilding, as most people expect?

Part of the data recovery companies' explanation was that when there is an issue, the RAID controller automatically goes into a "Degraded" state. The data recovery folks told me this before I checked the RAID controller firmware, and it did show "Degraded." Then, when the controller rebuilds the RAID after a drive is replaced, it is rebuilding the array as Degraded. I believe they also said something about the RAID 5 parity information being rebuilt onto different sectors than on the failed hard drive. Again, I haven't heard this before, but there seems to be something to it.

I will add that the four-disk hardware RAID 5 was internal to the Dell server. I don't know whether that makes it any different from a rack-mounted hardware NAS/SAN.

Thanks!

Sounds like you are getting duped by that recovery company. Any RAID card with a disk group that has one failed disk will list it as "degraded"; that just means the group is no longer in its normal, optimal state. It has nothing to do with firmware or other issues; it's just a warning to tell you a drive has bounced out of the group. If you have a hot spare in that server, or you load in a new drive, it will automatically start to rebuild. During the rebuild it will stay "degraded", and once the rebuild completes it will go back to normal.
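
The degrade-and-rebuild cycle described above can be modelled in a few lines. This is a toy sketch, assuming a four-disk RAID 5 with parity rotating across the disks from stripe to stripe and a single instantaneous rebuild pass; `ToyRaid5` and its methods are hypothetical, and real controllers add hot-spare handling, background rebuild progress, and their own status names. In this model, the replacement disk receives exactly the blocks the failed disk held, parity included, in the same stripe positions.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ToyRaid5:
    """Simplified RAID 5: disks[d][s] is the block on disk d for stripe s.

    Parity rotates across disks from stripe to stripe. None marks a disk
    that has dropped out of the group.
    """

    def __init__(self, data_stripes: list[list[bytes]]):
        self.n = len(data_stripes[0]) + 1  # data blocks + one parity block per stripe
        self.disks: list[list[bytes] | None] = [[] for _ in range(self.n)]
        for s, data in enumerate(data_stripes):
            parity_disk = (self.n - 1 - s) % self.n  # rotate the parity position
            blocks = list(data)
            blocks.insert(parity_disk, xor_blocks(data))
            for d, block in enumerate(blocks):
                self.disks[d].append(block)

    def status(self) -> str:
        missing = sum(disk is None for disk in self.disks)
        return {0: "Optimal", 1: "Degraded"}.get(missing, "Failed")

    def fail(self, d: int) -> None:
        self.disks[d] = None

    def rebuild(self, d: int) -> None:
        """Recompute every block of disk d (data or parity) onto a blank replacement."""
        if self.status() != "Degraded" or self.disks[d] is not None:
            raise RuntimeError("rebuild needs exactly one missing disk, and it must be disk d")
        survivors = [disk for disk in self.disks if disk is not None]
        stripes = len(survivors[0])
        # Each rebuilt block goes back into the same stripe position it had before.
        self.disks[d] = [xor_blocks([disk[s] for disk in survivors]) for s in range(stripes)]
```

Calling `fail(2)` moves the toy array from "Optimal" to "Degraded", and `rebuild(2)` brings it back to "Optimal" with disk 2 identical to what it was before the failure. Failing a second disk before the rebuild would leave it "Failed", which is the RAID5 risk raised earlier in the thread.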
