In my search, and from my experience over the last few years, RAID-class disks do not exist for nothing. These disks are often picked from better production batches, and they are better constructed, heavier and more robust, so they can withstand 24-hour duty and constant head seeking across the disk. They also often have firmware tuned to deliver the best possible results in a RAID array rather than in a desktop. And they sometimes come with a different warranty period than desktop disks.
Desktop disks have traditionally been a lot cheaper, but they are not always safe to use in a (hardware) RAID array. Sometimes problems surface immediately, and sometimes only after a while. This led me to look for the main cause. In my opinion it comes down to TLER / CCTL (Time-Limited Error Recovery, as Western Digital calls it, or Command Completion Time Limit, as Samsung and Hitachi call it; the generic ATA name is SCT Error Recovery Control). The other differences are not really needed for your RAID controller to work well with the drive.
Let me try to explain:
What does an HDD do when it has a hard time reading or writing a sector?
Desktop disk: retry over and over until it succeeds
RAID disk: retry for at most 7 seconds, then report back that it failed to read the requested sector
What do controllers expect the disk to do?
Desktop controller: wait for the disk to respond, hanging and halting operations until it succeeds
RAID controller: wait at most 8 seconds (a common default), then declare the disk bad and evict it from the RAID set
Ideal situation:
A RAID controller with RAID disks. A disk develops a fault, tries for 7 seconds, then tells the RAID controller that it failed to read a sector. The RAID controller acknowledges this, rebuilds the sector from the redundancy in its RAID set, and writes the data back so the disk can remap the bad sector to a spare one. RAID controller happy, disk happy, no dropped disks!
As you can see, mixing desktop disks with a RAID controller can produce unwanted results. Disks get dropped even though nothing is really wrong with them. In the end it can become almost impossible to rebuild the array at all, because a rebuild has to read every remaining disk in full, and any desktop disk that stalls on a weak sector along the way gets dropped as well. Over time (as disks inevitably develop minor faults) this problem will surface and gradually get worse.
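To make the timing argument concrete, here is a toy model in Python. It is purely illustrative and not how any real controller firmware works; `controller_outcome` is a hypothetical helper, the 7 s and 8 s figures are the ones from the lists above, and `None` stands for "no time limit".

```python
from typing import Optional


def controller_outcome(recovery_s: Optional[float],
                       controller_timeout_s: Optional[float]) -> str:
    """Hypothetical toy model of a single hard-to-read sector.

    recovery_s: how long the disk keeps retrying before it reports an error
                (None = keeps retrying, like a desktop disk).
    controller_timeout_s: how long the controller waits before failing the
                disk (None = waits forever, like a desktop controller).
    """
    if controller_timeout_s is None:
        # Desktop controller: it hangs until the disk finally answers.
        return "wait until disk responds"
    if recovery_s is None or recovery_s > controller_timeout_s:
        # The disk is still retrying when the controller gives up on it.
        return "disk dropped from array"
    # The disk reports the error in time; the controller rebuilds the sector
    # from redundancy and writes it back.
    return "sector rebuilt, disk stays in array"


print(controller_outcome(7.0, 8.0))    # RAID disk + RAID controller
print(controller_outcome(None, 8.0))   # desktop disk + RAID controller
print(controller_outcome(None, None))  # desktop disk + desktop controller
```

The only thing that matters in this model is whether the disk gives up before the controller does, which is exactly what TLER / CCTL controls.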
It became known
It first came to light that any disk can do either, and that it's just a firmware setting, when WD leaked a tool (WDTLER) with which it could be changed. WD quickly disabled this in newer firmware revisions and drives, to keep selling their much more expensive RAID editions (a 50% higher price for next to no extra cost; a good deal for them). There is a good thread over at the HardOCP forum listing which drives work and which don't. WDTLER changed the setting in such a way that it survived a reboot, by actually altering the firmware itself.
But with the newest patches/SVN builds of smartmontools (smartctl), it's now possible to use SCT commands to change the value SCT ERC (Error Recovery Control) is set to. Since this is done through SMART, in principle any modern disk that implements SCT should support it. A big thank you goes out to r.gregory, who made this possible!
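For the curious: as far as I understand the ATA8-ACS SCT command set (and the way smartmontools implements it), setting a timer means writing a 512-byte "key sector" to SMART log address 0xE0. Its first four little-endian 16-bit words are the action code (3 = Error Recovery Control), the function code (1 = set, 2 = read back), the selection code (1 = read timer, 2 = write timer) and the time limit in 100 ms units. The sketch below only builds that buffer and sends nothing to a disk; the constant names and `sct_erc_key_sector` are made up for illustration, and the layout should be checked against the spec before relying on it.

```python
import struct

# Layout assumed from ATA8-ACS / smartmontools; verify against the spec.
SECTOR_SIZE = 512
SCT_ACTION_ERC = 0x0003  # action code: Error Recovery Control
FUNC_SET = 0x0001        # function code: set new timer value
FUNC_GET = 0x0002        # function code: return current timer value
SEL_READ = 0x0001        # selection code: read command timer
SEL_WRITE = 0x0002       # selection code: write command timer


def sct_erc_key_sector(function: int, selection: int, deciseconds: int = 0) -> bytes:
    """Hypothetical helper: build the 512-byte SCT key sector for an ERC command.

    Four little-endian 16-bit words (action, function, selection, time limit
    in 100 ms units), then zero padding up to one sector.
    """
    if not 0 <= deciseconds <= 0xFFFF:
        raise ValueError("time limit must fit in 16 bits")
    header = struct.pack("<4H", SCT_ACTION_ERC, function, selection, deciseconds)
    return header + bytes(SECTOR_SIZE - len(header))


# The read half of "smartctl -l scterc,70,70": a 7.0 s read timer.
key = sct_erc_key_sector(FUNC_SET, SEL_READ, 70)
print(len(key), key[:8].hex())
```

Actually issuing this requires a SMART WRITE LOG through the operating system's ATA pass-through, which is exactly the part smartctl takes care of for you.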
It's now even possible, with most modern RAID controllers, to send these commands to disks inside a hardware RAID. And that's where it gets interesting! If we can set the error recovery control values manually, we can make sure that at least this problem won't bite us again in the future!
There is only a slight problem: the value is reset on every power cycle. Reboots mostly do NOT affect it. So if you run Windows or Linux this is no problem; just create a script that sets the value at boot. I run VMware ESX or ESXi on my servers, which posed a slight problem. My solution is a custom Fedora 12 USB boot stick that boots in about 30 seconds, sets the value and automatically reboots. Pull the stick out during the reboot and the system boots from the HDD with VMware on it. Problem fixed.
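For the Linux case (or a boot stick like the one above), such a boot script could look roughly like this. It is a minimal sketch, assuming smartctl with SCT ERC support is on the PATH, that the script runs as root, and that the disks are /dev/sda and /dev/sdb (hypothetical names; adjust them). `scterc_command` and `set_erc_at_boot` are made-up helper names, and the final automatic reboot of the USB stick setup is left out.

```python
import subprocess
from typing import List, Sequence

# Hypothetical example device names; adjust to your own system.
DISKS = ["/dev/sda", "/dev/sdb"]


def scterc_command(device: str, read_ds: int = 70, write_ds: int = 70) -> List[str]:
    """Hypothetical helper: smartctl arguments that set SCT ERC (100 ms units)."""
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def set_erc_at_boot(devices: Sequence[str], dry_run: bool = False) -> List[List[str]]:
    """Run (or, with dry_run, only print) the smartctl command for every device."""
    commands = [scterc_command(dev) for dev in devices]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        # check=False so one disk that rejects the command does not stop the rest.
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        if result.returncode != 0:
            # smartctl's exit status is a bit mask; any nonzero value is worth a look.
            print(f"warning: {' '.join(cmd)} exited with status {result.returncode}")
    return commands


if __name__ == "__main__":
    set_erc_at_boot(DISKS, dry_run=True)
```

Run it with dry_run=True first to see exactly which commands would be sent, then hook the real run into whatever your distribution executes at boot.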
So the only thing left now is to find out which disks support this command and which do not.
So, how do we test this?
For a disk on an ordinary (non-RAID) controller, the controller type does not matter (RAID controllers are a bit trickier and need a different approach; see the smartmontools website).
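smartctl addresses a disk behind a RAID controller with its -d option, for example "-d 3ware,N" or "-d megaraid,N", where N is the port or disk number on the controller. Here is a sketch that builds such calls; `raid_scterc_command` is a hypothetical helper, the set of types is only a sample, and whether a given type (and SCT pass-through) works depends on your smartctl version, OS and controller, so check the smartmontools documentation first.

```python
from typing import List

# A sample of smartctl device types that select one physical disk behind a
# RAID controller by number N. Support varies by smartctl version and OS.
CONTROLLER_TYPES = {"3ware", "areca", "cciss", "megaraid"}


def raid_scterc_command(controller: str, port: int, device: str,
                        read_ds: int = 70, write_ds: int = 70) -> List[str]:
    """Hypothetical helper: smartctl call that sets SCT ERC on disk `port`."""
    if controller not in CONTROLLER_TYPES:
        raise ValueError(f"unsupported controller type: {controller}")
    if port < 0:
        raise ValueError("port must be non-negative")
    return ["smartctl", "-d", f"{controller},{port}",
            "-l", f"scterc,{read_ds},{write_ds}", device]


if __name__ == "__main__":
    # Example only: 3ware 9000-series controllers on Linux usually show up as /dev/twa0.
    for port in range(4):
        print(" ".join(raid_scterc_command("3ware", port, "/dev/twa0")))
```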
Windows users: download the Windows installer. This is the newest 5.40 build, and it gives you the smartctl executable we'll need. Users of other operating systems will need to build their own version from SVN, since the current binary releases do not have this functionality yet.
Smartctl can do many other things besides this. It can tell you a lot about the life of your disk, or report its current temperature for monitoring purposes, and so on.
None of the tests below will touch your data.
Once you have it installed, and assuming for example that the disk you want to check is your "d:" drive, run the following:
smartctl -l scterc d:
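If it works, this shows the current setting; on a desktop disk, Read and Write will typically both be reported as Disabled.
Now we are going to try to change that value:
smartctl -l scterc,70,70 d:
If that works, you will see the following output:
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
The values are in units of 100 milliseconds, so 70 means 7.0 seconds (just under the 8-second timeout many RAID controllers use).
To return to the original values, either power off your system or run "smartctl -l scterc,0,0 d:" (0 disables the time limit, which is the default on desktop disks).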
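If you want to check many disks, the Read/Write lines can be parsed. A minimal sketch, assuming the output format shown above ("Read: 70 (7.0 seconds)") and that a disabled timer prints as "Disabled"; `parse_scterc` and `to_seconds` are hypothetical helpers, not part of smartmontools. smartctl reports the values in deciseconds (100 ms units).

```python
import re
from typing import Dict, Optional

# Matches lines such as "Read: 70 (7.0 seconds)" or "Write: Disabled".
_ERC_LINE = re.compile(r"^\s*(Read|Write):\s*(\d+|Disabled)\b", re.MULTILINE)


def parse_scterc(output: str) -> Dict[str, Optional[int]]:
    """Hypothetical helper: {'Read': ds, 'Write': ds}; None means Disabled."""
    values: Dict[str, Optional[int]] = {}
    for name, value in _ERC_LINE.findall(output):
        values[name] = None if value == "Disabled" else int(value)
    return values


def to_seconds(deciseconds: Optional[int]) -> Optional[float]:
    """Convert a value in 100 ms units to seconds (None stays None)."""
    return None if deciseconds is None else deciseconds / 10


sample = """SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
"""
parsed = parse_scterc(sample)
print(parsed, to_seconds(parsed["Read"]))
```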
It is also interesting to test whether the value survives a reboot or even a power cycle (from what I understand, it should not survive a power cycle).
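The test boils down to: set 70,70, reboot (or power off and on), run "smartctl -l scterc" again and compare. A tiny hypothetical helper for that comparison, using the Stay/Lost wording from the table below; the readings are dictionaries of deciseconds, with None meaning Disabled.

```python
from typing import Dict, Optional

# {"Read": deciseconds, "Write": deciseconds}; None means Disabled.
Reading = Dict[str, Optional[int]]


def persistence(value_set: Reading, read_back: Reading) -> str:
    """Hypothetical helper: 'Stay' if the values read back after a reboot or
    power cycle still match what was set, otherwise 'Lost'."""
    return "Stay" if read_back == value_set else "Lost"


value_set = {"Read": 70, "Write": 70}
print(persistence(value_set, {"Read": 70, "Write": 70}))      # still set
print(persistence(value_set, {"Read": None, "Write": None}))  # back to Disabled
```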
Since I think this information is useful to anyone trying to build a home RAID array, we should keep a sort of database in this topic, maybe something like this:
Brand    Model         Series         Size   RPM    Revision    Firmware  Available  Default       Reboot  Power cycle
Samsung  HD154UI       F2EG           1.5TB  5400   -           1AG01118  Yes        Disabled      Stay    Lost
Samsung  HD203WI       F3EG           2.0TB  5400   -           1AN10002  Yes        Disabled      Stay    Lost
Samsung  HD103UJ       F1             1.0TB  7200   -           1A001110  Yes        Disabled      Stay    Lost
Brand    Model         Series         Size   RPM    Revision    Firmware  Available  Default       Reboot  Power cycle
WD       WD360GD       -              36GB   10000  00FNA0      35.06K35  No         -             -       -
WD       WD10EADS      Caviar Green   1.0TB  5400   00L5B1      01.01A01  Yes        Disabled      Stay    Lost
WD       WD2500BJKT    Scorpio Black  250GB  7200   ?           11.01A11  Yes        Disabled      Lost    Lost
Brand    Model         Series         Size   RPM    Revision    Firmware  Available  Default       Reboot  Power cycle
Seagate  ST31500541AS  -              1.5TB  5900   -           CC32      Yes        Disabled      Stay    Lost
Seagate  ST31500341AS  7200.11        1.5TB  7200   -           CC1H      Yes        Disabled      Stay    Lost
Seagate  ST31000333AS  7200.11        1.0TB  7200   PCB rev A?  CC1F      Yes        Disabled      Stay    Lost
Seagate  ST3250310NS   ES.2           250GB  7200   PCB rev A?  SN04      Yes        Enabled (6s)  N/A     N/A
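To keep submissions consistent, each result could be captured as one record with the same columns as the tables. A minimal sketch with a hypothetical `DriveResult` dataclass; the tab separators are an arbitrary choice that makes rows easy to paste into a spreadsheet.

```python
from dataclasses import astuple, dataclass


@dataclass
class DriveResult:
    """Hypothetical record: one row of the compatibility table, same column order."""
    brand: str
    model: str
    series: str       # "-" if none/unknown
    size: str
    rpm: str
    revision: str
    firmware: str
    available: str    # does "smartctl -l scterc" work? "Yes" / "No"
    default: str      # e.g. "Disabled" or "Enabled (6s)"
    reboot: str       # "Stay", "Lost", "N/A" or "-"
    power_cycle: str  # same values as reboot

    def row(self) -> str:
        """Tab-separated row, ready to paste into a spreadsheet."""
        return "\t".join(astuple(self))


example = DriveResult("Samsung", "HD203WI", "F3EG", "2.0TB", "5400", "-",
                      "1AN10002", "Yes", "Disabled", "Stay", "Lost")
print(example.row())
```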
Anyone willing to help by submitting results: much appreciated! I would personally really like to know the results for the Samsung F3 2TB HD203WI EcoGreen disks!
Update: Thank you all for submitting! Keep them coming!
Edited by Quindor, 01 May 2010 - 05:42 PM.