I don't have a solution yet. I'm not trying to lead you on; I'm hoping to start a conversation so we can work out the solution together.
Hard drive manufacturers are drawing a distinction between "desktop" grade and "enterprise" grade drives. The "desktop" grade drives can take a long time (~2 minutes) to respond when they find an error, which causes most RAID systems to label them as failed and drop them from the array. The solution provided by the manufacturers is for us to purchase the "enterprise" grade drives, at twice the cost, which report errors promptly enough that this isn't a problem. This "enterprise" feature goes by different names depending on the manufacturer: TLER (Time-Limited Error Recovery, Western Digital), ERC (Error Recovery Control, Seagate), or CCTL (Command Completion Time Limit, Samsung and Hitachi).
There are three problems with this situation:
The first is that it flies in the face of the word Inexpensive in the acronym Redundant Arrays of Inexpensive Disks (RAID).
The second is that when a drive starts to fail, you want to know about it, as Miles Nordin wrote in a long thread:
The third is that other attributes of consumer grade drives are attractive, as r.g. wrote:
For a while, Western Digital released a program (WDTLER.EXE) that made it possible to enable TLER on desktop grade drives. This no longer works.
Quindor created a heroic thread that attempts to identify which exact drives on the market are compliant with the ATA standard and allow a software command to enable ERC temporarily. A problem with this, discussed in the thread, is how to verify that it works. Just because a drive tells you that ERC is enabled doesn't necessarily mean that it's true.
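To make the "software command" concrete, here is a minimal sketch that assumes the tool in question is smartctl from smartmontools (my assumption; the paragraph above does not name a tool). Its `-l scterc,READ,WRITE` option takes the two limits in tenths of a second, so 70 means 7.0 seconds. The helper names below are made up for illustration.

```python
import subprocess


def scterc_set_command(device, read_s=7.0, write_s=7.0):
    """Build the smartctl command that asks a drive to limit error recovery.

    smartctl expects the limits in tenths of a second, so 7.0 s becomes 70.
    """
    read_ds = round(read_s * 10)
    write_ds = round(write_s * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def scterc_query_command(device):
    """Build the smartctl command that reads back the drive's ERC setting."""
    return ["smartctl", "-l", "scterc", device]


def run(command):
    """Run a command (normally needs root) and return its exit code and output."""
    done = subprocess.run(command, capture_output=True, text=True, check=False)
    return done.returncode, done.stdout
```

Calling `run(scterc_set_command("/dev/sda"))` and then `run(scterc_query_command("/dev/sda"))` only shows what the drive claims. As noted above, that is not proof that the drive really gives up after 7 seconds, and on drives that accept the command the setting is typically lost at power-off, so it has to be reapplied on every boot.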
The best solution I've seen, described at length by qasdfdsaq in the same thread, is for the computer to compensate for the drive behavior like this:
Rather than relying on the drive to report an error within 7 seconds, or attempting to "fix" the drive so that it will, have the host enforce the deadline itself: when any drive fails to respond within 7 seconds, treat that as an error and cancel the operation.
What's nice about this solution is that it will work with any drive on the market.
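The idea can be sketched in a few lines of Python. This is only an illustration of the logic, not how any real controller or kernel implements it: `read_fn` stands in for whatever call reads from the drive, and `DriveTimeout` and `read_with_deadline` are names I made up.

```python
import threading


class DriveTimeout(Exception):
    """Raised when a drive does not answer within the deadline."""


def read_with_deadline(read_fn, deadline_s=7.0):
    """Call read_fn(), but treat silence longer than deadline_s as an error."""
    outcome = {}

    def worker():
        # Run the (possibly very slow) read and record how it ended.
        try:
            outcome["value"] = read_fn()
        except Exception as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_s)  # wait at most deadline_s seconds

    if thread.is_alive():
        # The drive is still busy. From user space we can only stop waiting;
        # a real driver would also abort the outstanding command.
        raise DriveTimeout(f"no response within {deadline_s} seconds")
    if "error" in outcome:
        raise outcome["error"]  # the drive reported an error promptly
    return outcome["value"]
```

A RAID layer built this way would catch `DriveTimeout`, rebuild the missing block from the other drives, and write it back, instead of waiting two minutes and then dropping the whole drive from the array.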
We just need to figure out how to configure our controllers or operating systems to behave this way. qasdfdsaq says that his Solaris system already does this by default.
According to SmallNetBuilder, the manufacturers of NAS boxes have already figured this out too:
These NAS boxes are all (to my knowledge) running Linux.
I run Linux and FreeBSD, so I'm interested in knowing how to configure those operating systems in such a way that I feel safe using "desktop" grade drives. Let's figure out the settings for all the common operating systems/controllers and post them here, so we can finally put this issue to rest and go back to using Inexpensive drives in our Redundant Arrays of Inexpensive Disks.