Quindor

TLER / CCTL


I've been reading everything on here but can't seem to get it working on my side. I'm running Windows 7 x64 with an Adaptec 5805 and 4 drives in RAID 5, and I can't seem to run smartctl against the RAID volume. Is there something I need to do to get it to see the physical drives instead of just the logical one?

Any help would be really appreciated.


Ah, that makes sense. Looking at that chart I don't see Adaptec listed at all though, not even for Linux, yet people in this thread are using it. I assume it is supported?


Ah, that makes sense. Looking at that chart I don't see Adaptec listed at all though, not even for Linux, yet people in this thread are using it. I assume it is supported?

I am actually the one using the Adaptec controller. ;)

Since I use VMware ESXi 4.0u1 on my server, I also have no way of running smartctl from the OS I am using. For this reason I use a bootable USB stick with Fedora installed. Fedora and its Adaptec drivers are configured in such a way that, in addition to your data arrays, they also provide an SG (SCSI generic) device for each disk connected to the Adaptec controller, a feature called "expose physical drives".

With that, you can access the SMART data as if the disks were connected to a plain controller, and also perform actions on them such as the smartctl SCT ERC commands, and of course read other information from the disks. I also use this to run my Adaptec ASM software, since there is no version for VMware ESXi. It works great, as long as you do not have to power down your server too often. It takes about 5 minutes in total to boot from the stick, set the desired commands and reboot into VMware ESXi.
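
As a rough sketch of the commands involved (the sg numbers depend on how the aacraid driver enumerates the disks, and whether -d sat is needed varies per setup, so treat the lines below as an example rather than a recipe):

# identify the physical disk behind the controller
smartctl -i /dev/sg1
# read the current ERC (TLER/CCTL) settings
smartctl -l scterc /dev/sg1
# set read/write ERC to 7.0 seconds (values are in units of 100 ms)
smartctl -l scterc,70,70 /dev/sg1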

I know of no other way to get around this using Windows, or even any other Linux distribution for that matter. Normally I'm more inclined to use Debian or Ubuntu, but I could not get the switch to work for me. Not a Linux expert though.

Hope this helps you, let us know!


Hey all,

This is a little off topic but still relevant to this thread.

Has anyone got any recommendations for SATA RAID cards suitable for Windows 2003 and for running desktop drives in a RAID array? I'm getting annoyed at the built-in motherboard controller.

For the OS, all I'm looking to do is set up a 2-disk RAID 1 mirrored array. The disks I use here are WD RAID-suitable drives, so they won't require the TLER/CCTL fix mentioned here. However, I currently have a second 3-disk RAID 5 array for data in the same server that only uses Samsung desktop HDDs, and I'm not sure if I will put another drive into this array.

The motherboard I use currently has 2 PCI-E x1 and 1 PCI-E x16 slots free (oh, and a PCI-X slot, although that seems to be out of favour now). This is just a home-brew server so costs can't run away.

The other thing I'm not sure about is whether to junk this idea and use a NAS for the data RAID. Can the TLER/CCTL trick be performed on such devices? If so, has anyone got any suggestions?

Cheers.

Edited by swinster


I am actually the one using the Adaptec controller. ;)

Since I use VMware ESXi 4.0u1 on my server, I also have no way of running smartctl from the OS I am using. For this reason I use a bootable USB stick with Fedora installed. Fedora and its Adaptec drivers are configured in such a way that, in addition to your data arrays, they also provide an SG (SCSI generic) device for each disk connected to the Adaptec controller, a feature called "expose physical drives".

Hi Quindor,

That sounded so easy that I had to give it a try. Unfortunately for me, things didn't go as planned. I downloaded Fedora 13 Live and ran it on my server with the RAID; it just hangs as it finishes booting, with no error message or anything (if I unplug the Adaptec 5805 it boots fine). I tried Fedora 12: different problem, it just spits out an error and won't boot. So I tried Ubuntu, which seemed to work first time, but it doesn't seem to expose the drives, so smartmontools doesn't seem able to work. There is supposed to be a way to set the drives to be exposed, but since I'm a complete noob at Linux I have no idea where. So as it stands I'm trying to figure out why Fedora 13 won't boot. What version are you using?

Edited by xtinct


I'm sort of in the same boat as I'm using a hardware RAID card with a Windows OS. This means I will have to run a Linux version of the tool (for hardware RAID support) and then reboot before loading Windows.


Thank you for this fantastic post. It felt like a godsend, as HDDs are constantly dropping off my RAID array -- but unfortunately it seems I got my hopes up too soon, as I'm unable to set ERC on my system. Here are some details, in case someone wants to try to lend a helping hand:

OS: Windows Vista Ultimate SP2 64-bit

RAID adapter: 3ware/AMCC 9500S-8

RAID firmware: FE9X 2.08.00.009

RAID driver: 3.00.04.070

HDDs in RAID: 1 x Samsung HD154UI (Spare), 1 x WD WD15EADS, 2 x Seagate ST31500341AS

I can actually get information for each drive by entering "sdb,[0-3]" as the target device. Smartctl reports that all drives exist in the smartctl database, and that all drives support SMART and have it enabled.

But when I try to change the scterc parameters, I get an error message:

Warning: device does not support SCT (Set) Error Recovery Control command

Suggest common arguments: scterc,70,70 to enable ERC or sct,0,0 to disable

That first line of text is also returned when I try to read the ERC status of any HDD.

I also tried accessing the HDDs via "tw_cli /c0/p[0-3]", but while it returns some information, it seems much more limited in its capabilities than the method described above.
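
For reference, a sketch of the two access paths described above, using the device syntax from this post (whether SCT commands pass through the 9500S firmware at all is a separate question):

rem smartmontools, addressing port 0 behind the 3ware card as described above:
smartctl -i sdb,0
smartctl -l scterc,70,70 sdb,0
rem the tw_cli route, which only exposes the controller's own limited per-port view:
tw_cli /c0/p0 show all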

I'm out of ideas here. Any help would be much appreciated.

cheers,

:: petri

Edited by petri_t


Yay, super-long post time.

dd on Solaris with HDAT2 would seem like a good testing method. dd for Windows should be able to perform the same thing and is what I would recommend.

Using R-Studio, though, you used a tool that is meant for data recovery, and I am not sure whether that invalidates the results. Although it probably uses the Windows drivers, you never know with those tools what kind of thresholds they use to read the data from the disk. The tool knows what a bad sector is and, in my opinion, will try to read it vigorously. So I'm not quite sure about that. But as said, dd would indeed be a good way of testing!

When/if I find a disk with a bad sector I'll give the dd method a try and see how that affects things. Thanks!

R-Studio uses the NT raw block device to read from disk, the same as dd does. It is, however, more useful for logging errors, as it returns the error codes and the exact sector numbers plus the date and time a bad sector occurred, whereas dd just panics and exits at the first bad sector. It has configurable retry levels for reading bad sectors, or you can tell it not to retry at all. It will still log every occurrence (e.g. [Time:date] Read drive X at sector Y failed after Z retries).

Incidentally, I've been running daily scrubs and surface tests on a batch of new drives that keep throwing up recovered bad reads, and I was just about to fix them when this thread came up. I figured, why not test out whether ERC actually works on these Samsung drives that are proving to be so popular? So anyway, I'm just about to run some tests...

Today's update:

I tested the Samsung on an ICH9 chip and was (like you said) in fact able to set CCTL; it did survive a reboot, but was disabled again after a cold boot.

I also tested the WD on the LSI1068E. Unlike on the ICH9, I was not able to apply the setting; I got the same error as when I tried the Samsungs. I tried all the -d options, but none worked, so this confirms it as a controller-dependent issue.

I also checked the controller's firmware and found some settings which seemed interesting to me, though I'm absolutely unsure whether they do the same thing. They're in the "Advanced Device Properties" dialog window; the settings are:

IO Timeout for Block Devices: 10
IO Timeout for Block Devices (Removable): 10
IO Timeout for Sequential Devices: 10
IO Timeout for Other Devices: 10

I'm wondering if this setting is the same thing as the ERC setting, but for the whole controller. I'm able to set them to 7 seconds, but I'm absolutely unsure if this is the same thing; they kind of sound the same. If you know of a way to create a 'failure', let me know, so I can test it.

Sadly, I know of no way to reproducibly test this.

Those values you found though definitely have something to do with how your controller handles a disk that is not responding. Personally, I would keep those values at their default 10 seconds and set your disks to 7 or 8, to give each device enough time to respond to the other.

Interesting stuff!

If you're unable to configure the drives, it might be useful to set the value very high, to prevent a disk that hangs for 20 seconds but then succeeds from being kicked out of the array. Not really what we want (we want the intelligent situation where the controller determines what to do), but better than nothing!

Yes, these are controller timeouts for when to drop a drive from the array if it stops responding. Ironically, it's settings like these that make the whole concept of ERC silly and unnecessary. They default to 15 seconds on my controller, and there is a perfectly reproducible way of testing them: read from the drive, unplug it, and time how long it takes before the system reports "drive gone" errors.
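
A rough, hedged version of that test on a spare (non-array) disk, where /dev/sdX is a placeholder: start a long read, pull the drive's cable partway through, then see how long the read takes to error out and when the kernel log notices.

# long sequential read to have I/O in flight when the cable is pulled
time dd if=/dev/sdX of=/dev/null bs=1M count=8192
# check how long it took for the "drive gone" / link-down messages to appear
dmesg | tail -n 20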

Incidentally I have the same controller and am just about to test setting ERC on it to see if it works.

Hi Quindor,

I am looking to put 3 new desktop HDDs (Samsung 1.5TB F3EG) into a RAID array and as such will need to alter the TLER of the drives. Am I right in thinking that I need to do this on the 3 individual drives BEFORE I put them in the array? I had a quick look over the smartmontools site, which suggested some drives can be accessed through the RAID controller in certain circumstances, but there is only limited support for specific controllers.

I'm going to be using the motherboard's on-board RAID controller (either the NVIDIA or Marvell controller on an Asus WS Pro), which is probably considered a bit naff, but I think it will do me.

PS. Is there any way this thread can be made sticky, as it provides some great info for people looking to create a cheap RAID array from desktop HDDs?

Cheers

AFAIK you have to set TLER on the drives every time you start your computer. You also have to disable the RAID to do this, set it, then re-enable RAID and THEN start your OS. However, motherboard controllers are actually better than standalone ones in this respect, because it's much easier just to turn RAID on or off completely in the BIOS, making the drives show up as individual disks instead of a single logical volume on which you cannot access or change SMART/SCT data.
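
As a sketch of what that per-boot step amounts to once the OS can see the member disks individually (for example from a Linux live environment as described earlier in the thread; the device names below are placeholders):

# apply a 7.0 s read / 7.0 s write ERC limit to each member disk
for d in /dev/sda /dev/sdb /dev/sdc; do
    smartctl -l scterc,70,70 "$d"
done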

Hey all,

This is a little off topic but still relevant to this thread.

Has anyone got any recommendations for SATA RAID cards suitable for Windows 2003 and for running desktop drives in a RAID array? I'm getting annoyed at the built-in motherboard controller.

For the OS, all I'm looking to do is set up a 2-disk RAID 1 mirrored array. The disks I use here are WD RAID-suitable drives, so they won't require the TLER/CCTL fix mentioned here. However, I currently have a second 3-disk RAID 5 array for data in the same server that only uses Samsung desktop HDDs, and I'm not sure if I will put another drive into this array.

The motherboard I use currently has 2 PCI-E x1 and 1 PCI-E x16 slots free (oh, and a PCI-X slot, although that seems to be out of favour now). This is just a home-brew server so costs can't run away.

The other thing I'm not sure about is whether to junk this idea and use a NAS for the data RAID. Can the TLER/CCTL trick be performed on such devices? If so, has anyone got any suggestions?

Cheers.

The motherboard's built-in controller is probably better than any standalone controller you can buy for less than $200. I would, however, not recommend any RAID card for RAID 1, as Windows' built-in RAID works perfectly well. In fact it works better than most hardware RAID cards, and it's portable and hardware-independent.

I'm sort of in the same boat as I'm using a hardware RAID card with a Windows OS. This means I will have to run a Linux version of the tool (for hardware RAID support) and then reboot before loading Windows.

Or, use Windows' built-in software RAID and not have this problem at all.

ZFS Plug

Ultimately, though, most of the problems discussed in this thread wouldn't exist if everyone just used ZFS. ZFS fixes all your problems, which is why I've switched to it. It's also why I can test bad sectors, yank out disks entirely, set, retrieve, and modify SMART/SCT settings without rebooting or reconfiguring the machine, and even modify or zero individual sectors on individual disks in my RAID array to test out TLER while the system is running, without having to worry about my data.

One thing to note, however, is that in the rare case of multiple drive failures exceeding your parity level, ZFS has a strong self-destructive tendency and may end up destroying your data rather than protecting it.
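
For concreteness, a minimal sketch of the kind of commands involved; the pool name, Solaris-style device names and raidz2 layout are made-up examples, not taken from any post in this thread:

# create a double-parity pool from six whole disks
zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
zpool status -v tank        # per-device error counters, no reboot required
zpool scrub tank            # verify every block against its checksum
zpool offline tank c0t3d0   # take a disk out of service for testing
zpool online tank c0t3d0    # bring it back; ZFS resilvers what changed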

Rant

And finally, just ranting my personal opinion here. If a drive drops from your array because it can't read a sector, that means it's BAD. You shouldn't be making excuses or using workarounds or firmware/TLER/ERC "tricks" or "optimizations" to get around that fact, just so you can keep getting away with using BAD drives. I had to learn this the hard way. If a drive goes bad you should replace it, not keep using it for as long as you can while ignoring the problems. I shall refer you back to Google's findings across over a hundred thousand monitored drives. Nearly a third of drives fail within 8 months of their first scan error. With older drives this increases to as much as 40% within 6 months. A similar trend is seen for reallocated sectors. If your drive has an uncorrectable or reallocated sector, it is over FOURTEEN TIMES (14x) more likely to fail within 60 days than a healthy drive. You should replace it now, not cover it up with TLER trickery. If you are going to have another bad sector on another drive during a rebuild, it's better to have it now, while most of the first drive is still readable and there is some hope of recovery, rather than later when the whole thing dies. Also, if your array is big enough for this to be a problem, you shouldn't be using a single mirror or single parity in the first place; you should be using double parity or higher. RAID-5 was considered inadequate fifteen years ago. Also, ZFS fixes all your problems, see above.

And also to clear something up: there is no such thing as "certified for RAID" or "works with RAID" or "RAID suitable". The whole point of RAID, the original definition of RAID, and the reason it exists in the first place, is that you can use ANY (read: cheap, desktop) drives for RAID: Redundant Array of Inexpensive Disks. Sure, this definition has changed now to reflect the fact that it can be combined with expensive enterprise drives, but there's nothing to say you must use expensive "RAID edition" drives in RAID. There is no such thing as a drive that cannot be used in RAID. Once again, just to quell all this misinformation: any drive that you have more than one of can be used in RAID. This is the whole point of RAID in the first place. All this "RAID edition" nonsense is just marketing crap. Sure, you have enterprise-grade drives designed for higher duty cycles and higher vibration tolerance, but this has nothing to do with RAID. Stick a hundred individual drives in an enclosure and you get the same vibration, heat, and power-on times whether you use RAID or not. RAID has nothing to do with it. You do not need special drives to do RAID. Also, Hitachi are now specifying all their desktop drives for 24/7 availability, partly in order to quell all the bad press and misinterpretation that came from them once stating 8 hours a day as a basis for their reliability estimates.

Adapter settings like the LSI one above negate the need for ERC/TLER control completely, and furthermore any intelligent adapter wouldn't even need this setting. We should not be excusing or working around the fact that many adapters are stupid. Get the manufacturers to produce proper, sensible RAID cards, or better, use ZFS. Again, this whole TLER/CCTL/ERC thing is just a band-aid on a problem that shouldn't exist in the first place. And you shouldn't (read: don't) need it to use a drive in RAID.

Edited by qasdfdsaq


Rant

And finally, just ranting my personal opinion here. If a drive drops from your array because it can't read a sector, that means it's BAD. You shouldn't be making excuses or using workarounds or firmware/TLER/ERC "tricks" or "optimizations" to get around that fact, just so you can keep getting away with using BAD drives. I had to learn this the hard way. If a drive goes bad you should replace it, not keep using it for as long as you can while ignoring the problems. I shall refer you back to Google's findings across over a hundred thousand monitored drives. Nearly a third of drives fail within 8 months of their first scan error. With older drives this increases to as much as 40% within 6 months. A similar trend is seen for reallocated sectors. If your drive has an uncorrectable or reallocated sector, it is over FOURTEEN TIMES (14x) more likely to fail within 60 days than a healthy drive. You should replace it now, not cover it up with TLER trickery. If you are going to have another bad sector on another drive during a rebuild, it's better to have it now, while most of the first drive is still readable and there is some hope of recovery, rather than later when the whole thing dies. Also, if your array is big enough for this to be a problem, you shouldn't be using a single mirror or single parity in the first place; you should be using double parity or higher. RAID-5 was considered inadequate fifteen years ago. Also, ZFS fixes all your problems, see above.

Here is my own personal rant as well: actually, there is a very good reason for TLER "tricks", and that is to fix the problem exactly as you describe. Let's say you have 10 drives running happily in a RAID 6. These drives do not have TLER set to reasonable values, and you have a RAID card expecting timely responses from your drives. Let's now say one drive suddenly develops one bad sector. The card waits... and waits... and then drops the drive because it didn't respond fast enough. No big deal, you're trucking along on a RAID 6, so you're down to a RAID 5 and a small performance hit. 15 minutes later a second drive develops a bad sector and is kicked out of your array... uh oh, you're down to an unprotected RAID. You get 2 new drives, swap out the (now failing) drives and start a RAID rebuild... 15% in, disk #3 develops one bad sector... and is kicked out of the array. There went all your data.

If only you had set a reasonable TLER, the disks would have stayed in the array; while they were failing over those 8 months or so you would have had plenty of time to replace each failing disk without harming your data. Instead you chose to ignore TLER and lost 6TB of hard scientific data you had been collecting for a project.

My point is that it isn't a trick; it is a real-world problem with a real solution. A high TLER for a desktop drive makes a lot of sense: you have no redundancy and the disk should try as hard as it can to get at that data before giving up. If you have a RAID with redundancy (1, 5, 6), then a single bad sector should not lock up the entire array trying to get one piece of most-likely-bad data. Instead, the controller opts to preserve performance: the disk should just give up relatively quickly, and the controller will fill in the missing data and relocate the bad sector's data on its own.
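
To make the two policies concrete, here is what they look like as smartctl calls (the device name is a placeholder; values are in units of 100 ms, as elsewhere in this thread):

# desktop/single-disk use: no limit, retry as long as it takes
smartctl -l scterc,0,0 /dev/sdX
# redundant array (RAID 1/5/6): give up after 7 s and let the controller repair from parity
smartctl -l scterc,70,70 /dev/sdX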


If a single bad sector locks up an entire array, you have a stupid controller. In the scenario you describe:

These drives do not have TLER set to reasonable values, and you have a RAID card expecting timely responses from your drives. Let's now say one drive suddenly develops one bad sector. The card waits... and waits... and then drops the drive because it didn't respond fast enough.

1) The card should have a setting to specify how long it waits.

2) Cards without the setting should wait, wait, then abort the read, read the data from elsewhere, and overwrite the bad sector with regenerated data from the other drives.

3) What a card "expects" and what is "timely" are part of the card's design. We should not reinforce bad design.

4) A card can detect if a connected drive is not responding, or if said drive has been disconnected.

A card does not drop a drive "because it does not respond fast enough". A card drops a drive because it chooses to, because the makers programmed that behaviour into it after a specified timeout. This is BAD design. That said, if you had set TLER 8 months ago, then the drive wouldn't have dropped, your array would have kept running, you would not have noticed, and now, with bad sectors on two other drives, your first drive fails completely. With TLER set you hid the problem of the failing drive and didn't notice it until it was too late.

As I've said, if you have a controller that behaves properly, or use ZFS, you wouldn't need a TLER "hack". The controller chooses to mark a drive as bad, and drop it, not the other way round. Decent controllers have this property tunable so we don't need hacks like TLER. Even better controllers don't even bother waiting. They simultaneously fetch the requisite data from elsewhere after a few hundred milliseconds and continue functioning as normal, marking the drive as bad if sufficient errors occur. The best controllers will abort the command, query the drive's status, and then issue a reset; then, and only then if there is still no response from the drive should it be dropped.

Once again, controllers dropping a drive do so as a design feature. They choose to drop a bad drive to tell you it's bad so you can replace it. Dropping a whole drive because of one bad sector might be bad design - as I've said, the controller has the option of abort, query, or reset before giving up - but if I didn't like it I'd get a better controller rather than more expensive drives just to hide the problem.

Edited by qasdfdsaq


I think you have some interesting arguments there, but I still disagree with your idea that this is somehow a controller-only problem and not a drive issue. I think we actually agree on the final point, though: you should NOT need more expensive drives just to get a feature that should be in the drives to begin with, but which manufacturers are either disabling or omitting to try to force people to buy more expensive drives. I also agree that for a regular user with a small array, having a controller that delays and waits on drives, or can be configured to do so, is an acceptable solution. But just because it works for some doesn't mean this isn't a real problem for others, and "ignoring it" isn't a possible solution when real data is on the line.

My take on the same arguments is basically like this:

Re: 1) Most controllers do, and I want mine set at 7 seconds so the array doesn't get locked up during heavy writes and start dropping data when the write cache is consumed. (I have had this happen before in a high-performance system... it isn't pretty.)

Re: 2) There isn't an "elsewhere" to read the data from. RAID reads data in stripes across all drives participating in the array; this is so the reliability of the data can be confirmed, and so the necessary data is present should it be altered and the parity recalculated. The drive needs to complete before the card can continue with its calculations; this is what locks up the controller waiting for a response.

Re: 3) We should reinforce choice in design; what you believe to be correct is apparently what I consider completely incorrect behavior. Good RAID cards allow me to specify what I want; the hard drive manufacturers should not be dictating the terms on which I use their drives.

Re: 4) Actually, there is a big difference between not responding and disconnected. Not responding takes several attempts at communication and confirming that the drive appears to be completely unresponsive; at that point it is considered the same as disconnected. Disconnected events can be tripped by the hot-plug notification system instantly, and the controller can cope with that much faster.

In fact many cards do drop a drive because it doesn't respond fast enough. It took too long to respond (and some drives don't respond while they are doing something like attempting error recovery on a sector, exactly what TLER is supposed to fix), so it's assumed the drive has lost power, been unplugged, had its firmware lock up, or hit any of a host of known problems. For data integrity the drive is knocked off the array. Also, bad sector relocations, while automatic on the controller, absolutely send notifications of the problem encountered, and it is in no way "hidden". This is precisely because drives developing problems reading/writing are very possibly going to fail soon, so the card can keep you apprised of this (obviously through some userland utilities). TLER simply allows the card to recover gracefully and in a timely fashion, and to make a note of the problem for someone to fix.

On top of all that, you shouldn't need a "more expensive" drive to fix the TLER problem. It is part of the ATA-8 spec; as far as I am concerned, if the drives claim ATA-8 compliance they should implement it. The "RAID enabled" drives are a bunch of bull, though, and that is really the entire point here. Western Digital and others have tried to mark up the drive 50% when basically the only thing they change is enabling the ability to control TLER. Supposedly they pick better batches and such too, but I am somewhat skeptical of any difference between the drives except for firmware-locked features. I shouldn't need a new controller or more expensive disks; as far as I am concerned the controllers are doing everything the correct way and the drives should be obeying, not the other way around.

In terms of ZFS, until it supports resizable arrays it isn't an option for most of my projects which periodically require reallocating resources between machines.


While I do agree on some points too, I'd say there are a few fundamental errors in your argument:

Re 1) If your card has the setting, then you don't need TLER as you can set the timeout to be higher than the time the drive takes to attempt error recovery.

Re 2) The idea of a redundant array is that there is an elsewhere to read the data from. Most controllers ignore the parity data on reads and do *not* verify its integrity as you say, unless you specifically request an array scrub. Even if they did, consider that the controller has no way of knowing which data is "correct". In a RAID-5 array, with single parity, if one drive doesn't match up, the controller has no way of knowing which of the two versions of the data is "correct" (this is one of the reasons I advocate ZFS, because it can), so there's no benefit to waiting. With RAID-6 you can make an educated guess based on two out of three, but you still cannot be 100% certain; and if the remaining two work while a third drive is hung, there's also no benefit to waiting (you've already verified the data integrity as best you can).

Re 3) Agreed here; I should be able to choose what the controller does. However, I still believe the *controller* is in a better position to make decisions as to how long a drive should retry, not the drive itself. A controller *knows* if a drive is part of an array; the drive has no way of knowing. The controller knows if the data can be retrieved successfully from another drive via parity, and can cancel the hung command; the drive has no way of knowing this. The controller knows if the array or other devices are busy and it would be faster to retrieve the data from parity and use the bad drive to service other requests; the drive has no way of knowing this. The controller knows if the array is quiet and there would be no performance impact in letting the bad drive continue to retry for 30+ seconds; the drive has no way of knowing this.

Re 4) Exactly. If it's not disconnected, the controller should continue trying to communicate rather than dropping it. With NCQ being pretty much mandatory, all drives are able to service multiple requests at a time in a non-blocking manner, so attempting error recovery should never block the whole drive from responding to status requests. Even when a drive does stop responding to every other request, a reset command usually still works (in my experience). But here the problem is that if a drive stops responding to every request while attempting error recovery, that's bad drive design, and TLER is just hiding the problem (again). But again, the controller can decide what to do better than the drive, according to array load, drive history, and, most importantly, whether the data is redundantly available elsewhere.

Re: "In fact many cards do drop a drive because it doesn't respond fast enough. It took too long to respond (and some drives don't respond while they are doing something like attempting error recovery on a sector, exactly what TLER is supposed to fix) so its assumed the drive has lost power, been unplugged, the firmware locked up, any known host of problems."

The controller can immediately and unambiguously detect if a drive has lost power or been unplugged; it knows this with certainty as soon as it happens. So a controller should not confuse a drive that's not responding while still connected (error recovery hung) with one that is not responding because it's been disconnected - it should be able to tell the difference straight away.

As for your point about reporting, well, yes, to some extent. A drive being dropped, however, is probably going to be reported more prominently and widely than one uncorrectable sector (although again this is dependent on controller and software design), and while you might consider this either a good or a bad thing, smaller issues such as an individual uncorrectable sector are more likely not to be reported at all - and if they are reported, they are more likely to be ignored by admins. I know for one thing that on Solaris, unreadable sectors on drives aren't reported by the FMA (Fault Management) system as faults, so I have a manual script that actively scans and reports them, whereas a dropped disk gets mailed to all admins and the console immediately.

Now, ERC may or may not be part of the ATA8 spec (I haven't checked), but lots of specs, or parts of them, are optional, and not all drives are even ATA8; some ATA7 drives supported SCT ERC, and in fact, even for ATA8 drives, most of them do not mention ATA8 compliance anywhere in their marketing or datasheets. But indeed, all drives should be ATA8 by now and should conform properly; if they don't, it's bad design. While I don't disapprove of higher-grade drives for use in enterprise environments with better vibration and heat tolerance, charging 50% extra, as you say, for TLER just to comply with the ATA8 spec is stupid.

[Edit] All the info I can find about ATA8 indicates it's still a draft, or at least I can't find the final spec. Maybe that's why manufacturers get away with implementing it badly? The drives I have that advertise ERC but fail to implement it correctly identify as an unspecified ATA8 draft, whereas the ones that work identify as ATA8 revisions 3b through 3f. The newest drives I have (2 weeks old) still identify as ATA8-ACS revision 4, which is a 3-year-old draft...

As for ZFS, the whole idea is that everything is one giant storage pool, which can be expanded but not shrunk. There is no concept of partitions or individual arrays; every machine can use as much or as little space in the pool as it wants, up to the entire pool or none at all. Unless of course you're referring to pulling individual disks from one machine and replacing them with smaller ones, in which case things get difficult...

Edited by qasdfdsaq


The card I have in one of mine does allow me to adjust how long it has to wait, and I want it to be short. My reasoning is simple: I want the drives to abort with an error very quickly (< 5 seconds) if they can't read/write a sector. I don't care whether they can successfully read that sector after 20 seconds or not; the fact that it is taking longer than 5 indicates something is wrong, and that sector needs to be removed immediately, before problems arise. As you pointed out, the controller doesn't know which block of data is bad if something doesn't add up, so I want the drive that is having problems to report so and move on; the controller will take care of fixing the data, and it won't be ambiguous which block needs to be replaced. If you don't move on and let it just read and read, it may very well finally return data, but it is likely corrupted anyway. Now we have exactly the situation you described: three blocks that don't add up and a controller that is clueless as to which one is bad.

TLER lets me tell the drive: give it a few seconds in case something is weird, but abandon all hope very soon if it isn't working. Honestly, having a controller babysit all drives and all commands in the NCQ and keep timing information on all pending reads/writes sounds great on paper and is just horrible in reality. The drives are not just spinning disks that hold data; they have on-board electronics to take care of this kind of stuff themselves, and they should be doing it. TLER is nothing more than a configuration variable; saying drives shouldn't support it is like saying controllers shouldn't have a configuration to control how long they wait for a drive. You obviously wouldn't argue that last point, so I am curious as to why you seem to be pushing the first point.

Also, I would point out that I have a system where making the controller wait 30 seconds because a bad sector developed in a drive would cause data loss. The RAID controller only has 1GB of on-board write buffer, and the CCD image capture that pumps live data onto this system produces data at about 70MB/s - that's roughly 15 seconds of buffer space if the card is waiting on a drive because it's having trouble writing a sector. Obviously this system actually uses expensive enterprise equipment and this isn't an issue at all, since everything fully supports this kind of setup, but it is a valid example of a situation where this could be a problem and shouldn't be.

You are correct that T13 has yet to publish all of ATA-8 (it is split into a series of documents for different parts of the spec, some published and some still working drafts).

Also, I do like the ZFS design; I was simply pointing out why I can't use it in some cases. There are concepts of arrays and partitions in ZFS; it is only the top layer that hides all of these behind the zpools. The level I am talking about is the actual vdevs being resized or reshaped, not the entire storage zpool. For now what I actually use is btrfs on top of a RAID 6; I get the best of both worlds with this setup, and someday btrfs should get the same kind of raidz-like support so it can handle even the software RAID itself. One thing I am looking forward to in ZFS for one of my systems, though, is raidz3; nothing else has triple parity, so it will be the first, I believe.


The card I have in one of mine does allow me to adjust how long it has to wait, and I want it to be short. My reasoning is simple: I want the drives to abort with an error very quickly (< 5 seconds) if they can't read/write a sector. I don't care whether they can successfully read that sector after 20 seconds or not; the fact that it is taking longer than 5 indicates something is wrong, and that sector needs to be removed immediately, before problems arise.

My point exactly. Except that I believe the controller can dynamically decide what length of time is appropriate. What I'm saying is that a controller should time out a command after e.g. 1-5 seconds if appropriate (because the data is available elsewhere) and take longer if it's not, rather than timing out the entire drive after e.g. 30 seconds or timing out every command after 5 seconds, TLER-style. With your solution, you're hinting to the drive that >5 seconds is completely inappropriate in all circumstances, and you don't have the option of trying longer if you have to. With controller logic, the controller switches dynamically in real time, so you don't have to permanently commit to either setting.

As you pointed out, the controller doesn't know which block of data is bad if something doesn't add up, so I want the drive that is having problems to report so and move on; the controller will take care of fixing the data, and it won't be ambiguous which block needs to be replaced. If you don't move on and let it just read and read, it may very well finally return data, but it is likely corrupted anyway.

Well, the whole idea of error recovery and retry is that the drive is refusing to return bad/corrupt data (hence why the error returned on a bad sector is usually a data error/CRC error). The drive can read the data; it just doesn't match its internal CRC for the sector, and the drive is refusing to return corrupt data, so it returns none at all. During retry it just reads the sector over and over until it gets a reading that does match the CRC, at which point it returns the correct data.

TLER lets me tell the drive: give it a few seconds in case something is weird, but abandon all hope very soon if it isn't working. Honestly, having a controller babysit all drives and all commands in the NCQ and keep timing information on all pending reads/writes sounds great on paper and is just horrible in reality.

Controllers already do this anyway. How else would they decide to drop a drive if it hasn't responded to the last read request for ~10-30 seconds? How else would they time out anything, or return an error to the system for that matter? Or load-balance I/Os? Or respond to buffer-flush commands? Oh, and let's not forget NCQ... The difference is that an intelligent controller dynamically times out bad I/Os quickly and seamlessly, allowing the drive to continue operating without needing TLER; dumb controllers time out the entire drive and drop the whole thing without even trying other options first.

The drives are not just spinning disks that hold data; they have on-board electronics to take care of this kind of stuff themselves, and they should be doing it.

The whole reason we're discussing TLER is because the on-board electronics (in default mode) are not appropriate to take care of this kind of stuff, because they're normally set to assume they're being operated outside a RAID array. The whole reason RAID controllers exist is to control and co-ordinate multiple drives, in a way the drives themselves cannot. Again, the controller knows if the data can be retrieved from parity, and can automatically cancel the pending I/O after 1-5 seconds when appropriate, whereas TLER forces the drive to do it after a fixed time, always, even when inappropriate. The drive knows best what time and type of error recovery is optimal for its media. The controller knows best what time and type of error recovery is optimal for the array and the system as a whole. Good host-drive communication allows the host logic to augment the drive logic, and the best option to be chosen dynamically in all circumstances. Neither side should need to be completely ignored or overridden with arbitrary values because board designers can't be bothered implementing decent algorithms.

It's like back in the old days when a divide by zero would cause an entire system to panic and crash. Or even now, where a graphics card failure can cause a headless system to fail (fixed and no longer occurring in most decent OSes, thankfully, but not Windows). We shouldn't need to put up with it. The whole defeatist attitude of "it's too complicated" and/or "let's use crude workarounds and ignore the problem" is what hinders progress. Like I've said, good controllers already implement the vast majority of these features in some combination or other. The best do all of them optimally. Most, however, just panic and drop a drive at the first sign of trouble.

TLER is nothing more than a configuration variable; saying drives shouldn't support it is like saying controllers shouldn't have a configuration to control how long they wait for a drive. You obviously wouldn't argue that last point, so I am curious as to why you seem to be pushing the first point.

I don't argue that drives shouldn't support it; I argue that it shouldn't be necessary with good controller logic. In an ideal world, controllers wouldn't need the setting either, as they would dynamically tune the timeouts to appropriate levels in real time. Some are already intelligent enough not to let a bad drive block I/O for the entire array even while waiting.

Also, I would point out that I have a system where making the controller wait 30 seconds because a bad sector developed in a drive would cause data loss. The RAID controller only has 1GB of on-board write buffer, and the CCD image capture that pumps live data onto this system produces data at about 70MB/s - that's roughly 15 seconds of buffer space if the card is waiting on a drive because it's having trouble writing a sector.

Barring the issue that write errors are far less common than read errors (I've not had a write error on a drive that wasn't already minutes away from complete death), my proposal - decent controller logic - would still work better in this scenario. Solution one (poor): the controller doesn't block, and continues writing data to the good drives while waiting on the bad one, so your buffer doesn't fill up. Solution two (good): the controller doesn't need to wait 30 seconds, as it knows there's incoming data, and cancels the request after 10 seconds. If you happen to have a load burst where you're writing data at 3x the normal rate, the controller can cancel after 3 seconds instead; your TLER workaround wouldn't be able to cope with this, whereas controller logic can. Also, if your incoming data rate is lower, perhaps due to little movement in your source images, the controller could wait longer, up to 30 seconds if buffer space allows. Your TLER workaround wouldn't be able to cope with this either. Again, however, if a drive is having write errors, in my experience you should chuck it now.

Also, I do like the ZFS design; I was simply pointing out why I can't use it in some cases. There are concepts of arrays and partitions in ZFS; it is only the top layer that hides all of these behind the zpools. The level I am talking about is the actual vdevs being resized or reshaped, not the entire storage zpool. For now what I actually use is btrfs on top of a RAID 6; I get the best of both worlds with this setup, and someday btrfs should get the same kind of raidz-like support so it can handle even the software RAID itself. One thing I am looking forward to in ZFS for one of my systems, though, is raidz3; nothing else has triple parity, so it will be the first, I believe.

True. My main irritation with ZFS is its design philosophy - "it will never fail" - so if it does, you're SOOL. Btrfs was designed from the ground up for maximum recoverability if it does fail, whereas ZFS is more like "we won't even bother with recovery tools because we believe it will never break". I'd personally be using Btrfs for everything if it supported parity RAID instead of just mirroring; the problem with using it on top of a hardware array is that it has no direct access to the underlying hardware to determine which block on which disk is bad in the event of silent corruption. And you lose all the benefits of direct access to the underlying drives, such as hardware portability and direct SMART data gathering. I do use raidz3 on ZFS myself, and the performance isn't bad, but I'd still be using Btrfs for the same role if it supported it. ZFS is just at a far more mature development stage than Btrfs, but hopefully by the time I build my next storage server Btrfs will be ready, with triple-parity RAID built in.

Edited by qasdfdsaq


Just a few clarifications:

SCT ERC has been part of the ATA standards since, I believe, ATA-6. Drives that "say" they support some ATA standard should support ERC, but they may indicate otherwise. ATA-8-ACS states that ERC settings shall be volatile over a power-on reset of the device, meaning: once you cut power to the drive, the setting is lost. If you reboot the computer the drive is housed in, it depends on the behaviour of the board and controller. If they just issue a soft reset, the setting will survive. If they cut power upon reset, the setting is lost.

The HD203WI are good drives, supporting ERC well. But in fact I came across this thread while searching for a way to make the setting non-volatile, which obviously does not exist yet.

To turn on ERC, you'll have to send an ATA command to the drive:

SMART WRITE LOG to log page 0xE0 (the SCT command log); the log page needs to contain the correct setting according to ATA-8-ACS.

You'll have to send one command each for the read and write ERC timeouts. I've implemented this successfully, meaning the drives report the settings I gave them afterwards. But I'm sorry, I can't leak the code unless my lunch is paid for the next 10 years or so.

Notes on WD (WD10EADS, WD15EADS, WD20EADS): the first versions (Nov. '09) with disabled TLER had badly coded firmware. They indicated support for ERC settings, but always reported "command aborted", thereby behaving very, very badly. The follow-up firmware simply reports no support for SCT ERC, which is the right way to go (if you go down that path at all...).

Hope that helps understanding it a bit,

St0fF

P.S.: for the 3ware 9650SE and 9690SA we set up the drives with 7s read / 12s write. That way it's not too slow on reads and not too aggressive about writing to bad sectors.
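
For reference, the equivalent smartctl invocation for those 7 s read / 12 s write values would look something like the line below. The 3ware device syntax shown is the usual Linux form and is only an assumption here, since St0fF sets the values with his own tool rather than smartctl, and not every controller passes SCT commands through:

# read ERC 7.0 s, write ERC 12.0 s, on port 0 of a 3ware controller
smartctl -l scterc,70,120 -d 3ware,0 /dev/twa0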


Well, just to say: now that I have a reproducible bad sector, SCT error recovery control does not work on Samsung HD154UIs under Windows or Solaris with different controllers, at least when accessing the device directly.

The value can be set under Windows (ICH10R) and is retained by the drive, but it is not obeyed. The drive's error recovery timeout remains two minutes(!) whatever you set SCT error recovery to:

C:\Program Files (x86)\smartmontools\bin>smartctl -l scterc hdj
smartctl 5.40 2010-02-10 r3065 [i686-pc-mingw32-win7(64)] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

SCT Error Recovery Control:
          Read:      1 (0.1 seconds)
         Write:      1 (0.1 seconds)

D:\>erc

D:\>time  0<nl.txt
The current time is: 16:03:43.31 <=== 
Enter the new time:

D:\>dd if=\\?\Device\Harddisk9\Partition0 of=temp.bin skip=2231390009 count=1
rawwrite dd for windows version 0.5.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by the GPL.  See copying.txt for details
Error reading file: 1117 The request could not be performed because of an I/O de
vice error
0+0 records in
0+0 records out

D:\>time  0<nl.txt
The current time is: 16:05:49.20 <=== two minutes six seconds
Enter the new time:

=============================

C:\Program Files (x86)\smartmontools\bin>smartctl -l scterc,20,20 hdj
smartctl 5.40 2010-02-10 r3065 [i686-pc-mingw32-win7(64)] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

SCT Error Recovery Control:
          Read:     20 (2.0 seconds)
         Write:     20 (2.0 seconds)


D:\>erc

D:\>time  0<nl.txt
The current time is: 16:06:06.90 <=== 
Enter the new time:

D:\>dd if=\\?\Device\Harddisk9\Partition0 of=temp.bin skip=2231390009 count=1
rawwrite dd for windows version 0.5.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by the GPL.  See copying.txt for details
Error reading file: 1117 The request could not be performed because of an I/O de
vice error
0+0 records in
0+0 records out

D:\>time  0<nl.txt
The current time is: 16:08:21.38 <=== two minutes 15 seconds
Enter the new time:

=============================

C:\Program Files (x86)\smartmontools\bin>smartctl -l scterc,70,70 hdj
smartctl 5.40 2010-02-10 r3065 [i686-pc-mingw32-win7(64)] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

SCT Error Recovery Control:
          Read:     70 (7.0 seconds)
         Write:     70 (7.0 seconds)

D:\>time  0<nl.txt
The current time is: 16:08:44.95 <=== 
Enter the new time:

D:\>dd if=\\?\Device\Harddisk9\Partition0 of=temp.bin skip=2231390009 count=1
rawwrite dd for windows version 0.5.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by the GPL.  See copying.txt for details
Error reading file: 1117 The request could not be performed because of an I/O de
vice error
0+0 records in
0+0 records out

D:\>time  0<nl.txt
The current time is: 16:11:00.16  <=== two minutes 16 seconds!
Enter the new time:

=============================

C:\Program Files (x86)\smartmontools\bin>smartctl -l scterc,0,0 hdj
smartctl 5.40 2010-02-10 r3065 [i686-pc-mingw32-win7(64)] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

SCT Error Recovery Control:
          Read: Disabled
         Write: Disabled


D:\>erc

D:\>time  0<nl.txt
The current time is: 16:11:43.85 <=== 
Enter the new time:

D:\>dd if=\\?\Device\Harddisk9\Partition0 of=temp.bin skip=2231390009 count=1
rawwrite dd for windows version 0.5.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by the GPL.  See copying.txt for details
1+0 records in
1+0 records out

D:\>time  0<nl.txt
The current time is: 16:13:50.48 <=== two minutes 7 seconds!
Enter the new time:

The value cannot be set under Solaris at all (LSI 1068E), even though I'm using a newer build of smartmontools there; the setting is not retained by the drive. Reading the value under Solaris after it has been set under Windows reports it as disabled, whatever Windows thinks it was set to. It times out after 23 seconds (though that's my controller doing it):


# /usr/local/sbin/smartctl -l scterc /dev/rdsk/c7t6d0
smartctl 5.40 2010-07-31 r3131 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

SCT Error Recovery Control:
          Read: Disabled
         Write: Disabled


# time dd if=/dev/rdsk/c7t6d0 of=/dev/null skip=2231390009 count=1
dd: reading `/dev/rdsk/c7t6d0': I/O error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 23.1104 s, 0.0 kB/s

real    0m23.129s
user    0m0.001s
sys     0m0.003s

# /usr/local/sbin/smartctl -l scterc,70,70 /dev/rdsk/c7t6d0
smartctl 5.40 2010-07-31 r3131 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

SCT Error Recovery Control:
          Read: Disabled
         Write: Disabled

# time dd if=/dev/rdsk/c7t6d0 of=/dev/null skip=2231390009 count=1
dd: reading `/dev/rdsk/c7t6d0': I/O error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 23.0523 s, 0.0 kB/s

real    0m23.071s
user    0m0.001s
sys     0m0.003s

Once again, on 100% of the drives I've tested (which is only two), SCT error recovery control does not work, even though the drive reports that it supports it and the setting is applied successfully. Maybe I just have bad luck with my controllers...

Of course, the limitation of my testing is that I'm accessing the drives from user space, and the measured time is the time it takes for the error to be delivered to the application. The drives aren't actually in RAID, and examination of Solaris' system error logs shows some commands being aborted after 5 seconds but the error not returned until 21 seconds later, though this is consistent whether ERC is set to 7 or 30 seconds. This is probably because I have a 5-second timeout set on the controller, which aborts the command after 5 seconds, and the driver also retries 5 times before erroring out (which roughly accounts for the ~23 seconds observed).

All in all, it seems ERC doesn't work when accessing drives from user space. I cannot say with 100% certainty that the results would be the same at controller level if the drives were actually in RAID, but back to my original point: do not rely on SCT ERC/TLER/CCTL to solve fundamental deficiencies in RAID controller design. And back to my original argument, once again an intelligent controller prevails over ERC. Intelligent controller = timeout after 24 seconds, rather than the 2 minutes it takes with non-working TLER and a dumb controller; reaffirming my point that TLER is a crude workaround for the lack of intelligent controllers, and shouldn't be necessary, whether it works or not.

Edited by qasdfdsaq

Brand      Type            Type2          Size      RPM      Revision      Firmware      Available    Default        Reboot      Powercycle
Hitachi    HDS722020ALA330 Deskstar       2.0TB     7200     -             JKAOA20N      Yes          Disabled       Stay        Lost

Edited by Hinz


I've bought 3 Hitachi 7K2000s to set up a RAID 5 config with a PERC 5/i controller. They all have the same firmware version, JKAOA28A.

I connected the hard disk to an onboard SATA port and tried to enable ERC; by default it was disabled.

After a reboot ERC was still enabled; only when powering down the machine was the setting lost.

Brand      Type            Type2          Size      RPM      Revision      Firmware      Available    Default        Reboot      Powercycle
Hitachi    HDS722020ALA330 Deskstar       2.0TB     7200     -             JKAOA28A      Yes          Disabled       Stay        Lost
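
The test sequence presumably looked something like this (the device name is an example, with the drive on the onboard SATA port rather than the PERC):

smartctl -l scterc /dev/sda          # default: Read/Write Disabled
smartctl -l scterc,70,70 /dev/sda    # enable 7.0 s read/write ERC
# warm reboot -> setting survives ("Stay"); full power-off -> setting lost ("Lost")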

I've read a lot of topics on the internet about TLER/CCTL etc.; the reason I bought Hitachis was because of their good compatibility with RAID configs.

But now I'm a bit confused about what to do. As I'm running Windows, I'm unable to change this setting while the drives are connected to the RAID controller. I'm wondering why this is only possible in Linux. Is it because of a driver issue, and will it be possible in the future?

Does anyone have a guide to changing these settings automatically when powering on the machine and running Windows?


Since nobody has yet shown that TLER actually works (on any drive where it's disabled by default) and it's pointless and unnecessary on a good controller, I would stop worrying about it and get on with your life.


Since nobody has yet shown that TLER actually works (on any drive where it's disabled by default) and it's pointless and unnecessary on a good controller, I would stop worrying about it and get on with your life.

What is deemed as a good controller?

