digitalfrost

Is this drive dying?


I bought two Samsung HD203WI disks recently. I ran a complete bad blocks test before I began using them. Today I found this in dmesg output:

sd 4:0:0:0: [sde] Unhandled error code
sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
sd 4:0:0:0: [sde] CDB: cdb[0]=0x2a: 2a 00 da 24 97 d8 00 02 a8 00
end_request: I/O error, dev sde, sector 3659831256

This repeats over and over again.
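The CDB line identifies the command that failed. The sketch below assumes the standard SCSI block-command layout for READ(10)/WRITE(10): opcode in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8. Under that layout, the request was a WRITE(10) of 680 blocks that started at exactly the sector reported by end_request. `decode_cdb10` is a hypothetical helper for illustration and does not come from any existing tool.

```python
from typing import Tuple

# Opcodes for the two 10-byte commands this sketch understands; other opcodes
# are reported as unknown and their LBA/length fields are not meaningful.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(hex_bytes: str) -> Tuple[str, int, int]:
    """Return (command name, starting LBA, transfer length in blocks)."""
    cdb = bytes.fromhex(hex_bytes)  # spaces between the hex pairs are ignored
    if len(cdb) != 10:
        raise ValueError(f"expected 10 CDB bytes, got {len(cdb)}")
    name = OPCODES.get(cdb[0], f"unknown opcode 0x{cdb[0]:02x}")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks
    return name, lba, length


if __name__ == "__main__":
    # CDBs copied from the two dmesg excerpts in this thread.
    for cdb_text in ("2a 00 da 24 97 d8 00 02 a8 00",
                     "2a 00 01 77 fb 9f 00 00 08 00"):
        name, lba, length = decode_cdb10(cdb_text)
        print(f"{name}: LBA {lba}, {length} blocks")
```

The first CDB decodes to LBA 3659831256 with 680 blocks. The CDB from the later log in this thread decodes to LBA 24640415 with 8 blocks. Both LBAs match the sector numbers the kernel prints on the end_request lines.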

I tried to query the disk with smartmontools, but it fails:

2 root@dude ~ # smartctl -a /dev/sde
smartctl 5.39.1 2010-01-28 r3054 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Device: [unprintable characters]  Version: [unprintable characters]
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

I'm using this disk in a RAID1 array, and so far there's no sign of a degraded array. Can anyone comment on the state of the disk? Is this a bad drive?

I don't know about the failed smartctl. Does it work on the other drive? So far it looks like just one bad sector. If /dev/sde is part of an md array, you could remove it from the array and try to force reallocation of that sector (dd if=/dev/zero of=/dev/sde seek=3659831256 bs=512 count=1 oflag=direct). Then try to read the sector back and/or run a short SMART self-test on the drive. If everything looks good, re-add the disk and let md rebuild it.
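The procedure above can be written out as a dry run. This minimal sketch only builds and prints the command lines and executes nothing. `recovery_commands`, the array name `/dev/md0` and the member name are hypothetical placeholders. The sketch assumes 512-byte logical sectors so that `bs=512` lines up with the sector number from end_request. It also assumes that end_request reports sectors relative to the whole disk, so `dd` targets the whole disk even if md uses a partition. The `dd` write destroys whatever that sector held, which is why the member leaves the array first and md resyncs it afterwards.

```python
import shlex
from typing import List

# Assumption: 512-byte logical sectors, so dd's bs matches the kernel's sector unit.
SECTOR_SIZE = 512


def recovery_commands(array: str, member: str, disk: str, sector: int) -> List[List[str]]:
    """Build (but do not run) the remove / rewrite / verify / re-add steps.

    array  -- md device, e.g. "/dev/md0" (hypothetical name)
    member -- the component md uses, e.g. "/dev/sde" or a partition of it
    disk   -- the whole disk (assumed: end_request sectors are whole-disk based)
    sector -- the sector number from the end_request line
    """
    return [
        # 1. Fail and remove the member so overwriting a sector cannot hurt the mirror.
        ["mdadm", array, "--fail", member, "--remove", member],
        # 2. Overwrite the suspect sector with zeros, bypassing the page cache;
        #    the write gives the drive a chance to reallocate the sector.
        ["dd", "if=/dev/zero", f"of={disk}", f"bs={SECTOR_SIZE}",
         f"seek={sector}", "count=1", "oflag=direct"],
        # 3. Read the same sector back to check that it is readable now.
        ["dd", f"if={disk}", "of=/dev/null", f"bs={SECTOR_SIZE}",
         f"skip={sector}", "count=1", "iflag=direct"],
        # 4. Start a short SMART self-test (view results with: smartctl -l selftest).
        ["smartctl", "-t", "short", disk],
        # 5. Only if steps 3 and 4 look good: re-add, and md resyncs from the mirror.
        ["mdadm", array, "--add", member],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in recovery_commands("/dev/md0", "/dev/sde", "/dev/sde", 3659831256):
        print(shlex.join(cmd))
```

The commands are kept as argument lists rather than strings so that they could later be passed to `subprocess.run` without shell quoting issues. Printing them first leaves room to double-check the device names before anything destructive runs.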

http://smartmontools.sourceforge.net/badblockhowto.html

I just encountered a similar problem on my desktop at work:

[3977644.589261] sd 4:0:0:0: [sdc] Unhandled error code
[3977644.589268] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[3977644.589273] sd 4:0:0:0: [sdc] CDB: Write(10): 2a 00 01 77 fb 9f 00 00 08 00
[3977644.589283] end_request: I/O error, dev sdc, sector 24640415
[3977644.589294] raid1: Disk failure on sdc1, disabling device.
[3977644.589296] raid1: Operation continuing on 1 devices.
[3977644.731095] raid1: sda1: redirecting sector 24640352 to another mirror

and smartctl shows:

bilbo:~# smartctl -a /dev/sdc
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Device: /4:0:0:0  Version: 
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

fdisk -l shows nothing for the drive. mdadm kicked it out of all of my arrays except swap, probably because the swap array wasn't accessed after the device disappeared. I'm not sure whether the drive is completely dead, or whether it spent so long trying to recover the bad sector(s) that it dropped off the system. It's an old PATA drive and controller, so there's no AHCI hot-plug support, and Linux may not see the drive again until I reboot.
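One way to see which arrays md has degraded is to read /proc/mdstat. Each array has a status line with `[total/working]` counts and a `U`/`_` slot map, and members that md has marked faulty carry an `(F)` suffix. This is a minimal parsing sketch. `parse_mdstat` is a hypothetical helper, and `SAMPLE` is made-up text modelled on this post (sdc kicked out of md0, swap array md1 still intact), not the poster's actual output.

```python
import re
from typing import Dict, Optional

# "md0 : active raid1 sdc1[1](F) sda1[0]" -> array name + rest of the line
HEADER_RE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
# "24418688 blocks [2/1] [U_]" -> configured / working members and slot map
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Members md has marked faulty carry an "(F)" suffix, e.g. "sdc1[1](F)"
FAILED_RE = re.compile(r"(\S+?)\[\d+\]\(F\)")


def parse_mdstat(text: str) -> Dict[str, dict]:
    """Summarise each md array found in /proc/mdstat-style text."""
    arrays: Dict[str, dict] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {"failed": FAILED_RE.findall(header.group(2)),
                               "total": None, "working": None, "degraded": False}
            continue
        status = STATUS_RE.search(line)
        if current is not None and status:
            total, working = int(status.group(1)), int(status.group(2))
            arrays[current].update(total=total, working=working,
                                   degraded=working < total)
            current = None  # stop after the first status line for this array
    return arrays


# Made-up example shaped like this thread's situation, not real output.
SAMPLE = """Personalities : [raid1]
md1 : active raid1 sdc2[1] sda2[0]
      1951808 blocks [2/2] [UU]

md0 : active raid1 sdc1[1](F) sda1[0]
      24418688 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    # On a real system: parse_mdstat(open("/proc/mdstat").read())
    for name, info in parse_mdstat(SAMPLE).items():
        state = "DEGRADED" if info["degraded"] else "ok"
        print(f"{name}: {state}, failed members: {info['failed']}")
```

Arrays without a `[n/m]` status line (for example RAID0) keep `total` and `working` as `None` and are not flagged as degraded.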

I looked up hostbyte 0x04 (DID_BAD_TARGET) from the error message. If I've understood correctly, it means that the target device was not found. Most likely the libata driver disabled the device after failing to communicate with it, i.e. the drive became unresponsive. There should be more clues earlier in the log (look for ata messages and soft/hard resets). That is probably also why smartctl failed.
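The numeric `hostbyte=0x04 driverbyte=0x00` in the first log is what the newer kernel in the second log prints by name as `DID_BAD_TARGET` and `DRIVER_OK`. The lookup below is a minimal sketch. It assumes the Linux SCSI midlayer's 32-bit result layout of that era: driver byte in bits 24-31, host byte in bits 16-23, message byte in bits 8-15, status byte in bits 0-7. It lists only the long-standing host byte codes. `split_result` and `host_byte_name` are hypothetical helpers.

```python
from typing import Tuple

# Host byte codes as defined by the Linux SCSI midlayer of that era
# (include/scsi/scsi.h); only the first, long-standing codes are listed.
HOST_BYTES = {
    0x00: "DID_OK",          # no error
    0x01: "DID_NO_CONNECT",  # couldn't connect before timeout
    0x02: "DID_BUS_BUSY",    # bus stayed busy through the timeout period
    0x03: "DID_TIME_OUT",    # timed out for some other reason
    0x04: "DID_BAD_TARGET",  # bad target (the value in both logs here)
    0x05: "DID_ABORT",       # told to abort
    0x06: "DID_PARITY",      # parity error
    0x07: "DID_ERROR",       # internal error
    0x08: "DID_RESET",       # reset by somebody
    0x09: "DID_BAD_INTR",    # unexpected interrupt
}

# Only the value seen in this thread is listed.
DRIVER_BYTES = {0x00: "DRIVER_OK"}


def split_result(result: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit SCSI result into (driver, host, message, status) bytes."""
    return ((result >> 24) & 0xFF, (result >> 16) & 0xFF,
            (result >> 8) & 0xFF, result & 0xFF)


def host_byte_name(value: int) -> str:
    """Map a host byte to its DID_* name, or report it as unknown."""
    return HOST_BYTES.get(value, f"unknown (0x{value:02x})")


if __name__ == "__main__":
    # hostbyte=0x04 driverbyte=0x00 from the first log, packed into one result word.
    result = (0x00 << 24) | (0x04 << 16)
    driver, host, _msg, _status = split_result(result)
    print(f"hostbyte={host_byte_name(host)} "
          f"driverbyte={DRIVER_BYTES.get(driver, hex(driver))}")
```

Running it prints `hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK`, which matches the wording of the second dmesg excerpt.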

My advice is to check the cables and connections, then run the manufacturer's diagnostic utility on the drive.

http://www.samsung.com/global/business/hdd/support/downloads/support_in_es.html
