continuum, on 21 February 2013 - 05:04 PM, said:
Not quite... a parity or RAID1 setup will still have good data to rebuild a stripe affected by a bad sector. Technically speaking they do tolerate bad sectors.
But can you name at least one controller/firmware combination that actually does this? To date, no one has been able to point me to an actual product that uses its redundancy to recover from an unreadable sector. Only ZFS does that.
But you are right, it is at least theoretically possible. However, conventional RAID can never know whether the parity or mirrored copy is correct or stale. RAID cannot distinguish between corrupt data and valid data. It lacks the facilities required to make that determination, such as checksums. It can only recognise that the data and parity are out of sync. Virtually all RAID controllers will then rebuild the parity, blindly assuming that the data is good and the parity is bad.
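To illustrate the point about checksums, here is a minimal Python sketch, not any real controller's or ZFS's actual logic. It assumes a toy stripe of three 4-byte data blocks with XOR parity, and uses SHA-256 as a stand-in for a per-block checksum; all names are made up for the example.

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def checksum(block):
    """Per-block checksum; SHA-256 is only a stand-in for this sketch."""
    return hashlib.sha256(block).digest()

# A toy stripe: three data blocks plus parity, with checksums stored separately.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
sums = [checksum(block) for block in data]

# Block 1 gets silently corrupted on disk.
on_disk = [data[0], b"BxBB", data[2]]

# Parity alone only reveals that data and parity disagree, not which is wrong.
parity_mismatch = xor_blocks(on_disk) != parity

# "Blind" resync: recompute parity from the data, which bakes in the corruption.
blind_parity = xor_blocks(on_disk)

# With checksums, the bad block can be located...
bad_blocks = [i for i, block in enumerate(on_disk) if checksum(block) != sums[i]]

# ...and rebuilt from the parity and the remaining good blocks.
good_blocks = [block for i, block in enumerate(on_disk) if i not in bad_blocks]
repaired = xor_blocks(good_blocks + [parity])

print(parity_mismatch, bad_blocks, repaired)
```

The mismatch check on its own cannot tell whether block 1 or the parity went bad; only the extra checksum pins it down, and only then can the redundancy be used to repair the right thing.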
Quote
Most RAID setups give drives only a short timeframe (a few seconds) to recover from sector errors
Well, by convention this timeout value cannot be less than 10 seconds. TLER is typically set to 7 seconds, to cope with even the strictest controllers that employ 10-second timeouts. If the hard drive does not provide the requested data within 10 seconds, it is detached and marked as failed. This pretty much means such RAIDs are extremely sensitive and virtually incompatible with modern disks, which by design produce bad sectors due to insufficient ECC error correction.
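The interaction between the two timers can be sketched in a few lines of Python. This is a simplified model rather than real controller firmware: it assumes the 10-second controller timeout and 7-second TLER limit mentioned above, a roughly 120-second recovery limit for consumer drives, and a hypothetical sector that needs 30 seconds of retries.

```python
def read_outcome(recovery_needed_s, drive_limit_s, controller_timeout_s=10.0):
    """Model one read of a troublesome sector (simplified, illustrative only).

    recovery_needed_s:    how long the drive would need to recover the sector
    drive_limit_s:        how long the drive keeps trying before giving up
    controller_timeout_s: how long the controller waits for any answer
    """
    # The drive answers either when recovery finishes or when it gives up.
    response_time_s = min(recovery_needed_s, drive_limit_s)
    if response_time_s >= controller_timeout_s:
        return "disk dropped"          # controller stopped waiting
    if recovery_needed_s > drive_limit_s:
        return "read error reported"   # drive gave up in time and said so
    return "data returned"

TLER_LIMIT_S = 7.0        # typical TLER setting
CONSUMER_LIMIT_S = 120.0  # typical consumer-drive recovery time

tler_result = read_outcome(30.0, TLER_LIMIT_S)
consumer_result = read_outcome(30.0, CONSUMER_LIMIT_S)
healthy_result = read_outcome(0.02, CONSUMER_LIMIT_S)

print(tler_result, consumer_result, healthy_result)
```

Under these assumptions the TLER drive reports an error before the controller loses patience, while the consumer drive is still retrying when the controller detaches it.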
Quote
requiring the drive to go on and mark the sector as bad. Proper drives will do this, desktop drives usually don't
Mark the sector as bad? You mean marking it as Current Pending Sector in the SMART output? All drives do this; consumer drives simply spend more time on recovery before giving up, typically 120 seconds. Any good technology should be able to cope with this, since it is easy to send a reset command and get on with life. Only primitive firmware RAID systems appear to have problems with this kind of drive. Generally this means you need TLER drives for old-fashioned RAID controllers, while modern software RAID implementations on Linux and BSD platforms, as well as ZFS, do not require special disks with TLER support and work just fine with ordinary consumer drives.
In fact, the TLER feature can be dangerous and is nothing more than an ugly hack. Assume you have a RAID5 where one drive has completely failed. This means you run degraded - basically a RAID0. In this situation, where you have lost your redundancy, you are at the mercy of bad sectors. It is extremely common to encounter these during the rebuild of the RAID5 with a new disk: one or more of the remaining member disks will hit a bad sector. If you have TLER disks and have lost your redundancy, this pretty much means data corruption or even a failed array - as many controllers kick out disks with bad sectors even if they return I/O errors.
Without TLER, you leave the recovery methods of the hard drive intact. This means that in degraded conditions you still have a last line of defence, which TLER would otherwise have killed.
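The "last line of defence" argument can be put as a tiny Python sketch. It is a deliberately simplified model: it assumes the host is patient enough to wait for the drive (as with the software RAID and ZFS setups mentioned above), reuses the 7-second and 120-second limits from this post, and invents a sector that needs 30 seconds of retries.

```python
def sector_survives(recovery_needed_s, drive_limit_s, redundancy_available):
    """Can the data in a weak sector still be obtained? (illustrative model)

    Assumes the host waits for the drive instead of kicking it out.
    """
    # First line of defence: redundancy (parity or mirror) on the other disks.
    # Last line of defence: the drive's own retries, cut short by TLER.
    drive_recovers = recovery_needed_s <= drive_limit_s
    return redundancy_available or drive_recovers

TLER_LIMIT_S = 7.0        # drive gives up early
CONSUMER_LIMIT_S = 120.0  # drive keeps retrying
NEEDED_S = 30.0           # hypothetical sector needing 30 s of retries

healthy_tler = sector_survives(NEEDED_S, TLER_LIMIT_S, redundancy_available=True)
degraded_tler = sector_survives(NEEDED_S, TLER_LIMIT_S, redundancy_available=False)
degraded_consumer = sector_survives(NEEDED_S, CONSUMER_LIMIT_S, redundancy_available=False)

print(healthy_tler, degraded_tler, degraded_consumer)
```

While the array is healthy the early give-up costs nothing, because the redundancy covers it. Once the array is degraded, only the drive that keeps retrying still gets the sector back in this model.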
Quote
a proper filesystem with built-in protection against things like this and bit-rot such as ZFS is something the rest of the world needs to hurry up and start using. *sigh*
It pleases me to read this. I have helped many people with broken RAIDs: hardware RAID like Areca and software RAIDs like Intel driverRAID. So many people lose their data due to incompetent software engineering. The whole TLER issue is just sad; it is basically an incompatibility between hardware and software. Even ordinary consumers deserve better protection for their data!