jpiszcz

Velociraptor premature failure rate (bad drives, premature to market?)


Quote:
As far as I know, the only way to flash these drives is with the DOS boot disk. Also note that only about half of my drives flashed successfully, and someone else who ran the flash had it hang. I really think the Velociraptors simply do not work correctly in a parity-RAID configuration.
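
As an aside, after a flash attempt the firmware revision each drive is actually running can be read back from Linux, which makes it easy to see which ones took the update. A minimal check, assuming smartmontools is installed (/dev/sda is a placeholder; repeat per drive):

# smartctl -i /dev/sda | grep -iE 'model|firmware'   # identity block includes the "Firmware Version" line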

It looks like we are going to dump the drives we have, send the new ones back, and switch to a different manufacturer and a different drive, which will probably be a 7,200 RPM model (there are no other 10,000 RPM 2.5" SATA drives out there).

Do you know if a single drive, with no RAID, will have an issue with Linux?


Quote:
It looks like we are going to dump the drives we have, send the new ones back, and switch to a different manufacturer and a different drive, which will probably be a 7,200 RPM model (there are no other 10,000 RPM 2.5" SATA drives out there).

Do you know if a single drive, with no RAID, will have an issue with Linux?

Hi,

That is what I am currently running, though not on a 24/7 host. It ran OK for a few days and did not have any problems. I am continuing to use it in a single-disk configuration and will update this thread if I see any trouble with it.

Justin.


Quote:
As far as I know, the only way to flash these drives is with the DOS boot disk. Also note that only about half of my drives flashed successfully, and someone else who ran the flash had it hang. I really think the Velociraptors simply do not work correctly in a parity-RAID configuration.

Justin,

Do you know if this same issue shows up in a single drive, non-RAID config running under Linux?

- Pat


Quote:
Justin,

Do you know if this same issue shows up in a single drive, non-RAID config running under Linux?

- Pat

I'm assuming the second message was posted by mistake? So far, no issues running with just one disk.


Well,

I had been using a single Velociraptor hard drive for my Linux system.

It worked well for a while (I never kept it on for more than a few hours).

Recently, I left the computer on for more than 24 hours.

# screen -ls
-bash: /usr/bin/screen: Input/output error
# ls
-bash: ls: command not found
# dmesg
-bash: dmesg: command not found
# ls
-bash: ls: command not found
# /bin/ls
-bash: /bin/ls: Input/output error
# echo "This is what happens when you use a Velociraptor hard drive."
This is what happens when you use a Velociraptor hard drive.
#

In addition, when I reboot, the system can no longer boot, e.g.:

UNABLE TO BOOT FROM HARD DISK

Fun stuff. I don't have time right now, but I will look at this later and try to figure out what happened.
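
For what it's worth, the usual first step from a rescue disc is to pull the drive's SMART error log and run a self-test before touching anything else (a sketch, assuming smartmontools is on the rescue media; /dev/sda stands in for the Velociraptor):

# smartctl -a /dev/sda            # full report: attributes, ATA error log, self-test log
# smartctl -t long /dev/sda       # start an extended offline self-test
# smartctl -l selftest /dev/sda   # read the result once the test finishes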

Justin.


Another horror story?

It looks like I just don't have any luck with WD Raptors, model WDC WD3000BLFS-0.

I am running an iSCSI storage server (Ubuntu 10.04) with four of them in a Linux RAID 10 array on the onboard LSI controller and four of them on a hardware LSI RAID controller. I had already upgraded the firmware because of the problem with the LSI RAID controller described here.
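
For reference, the state of the software half can be checked at any time with mdadm (md0 is a placeholder for whatever name the array actually has):

# cat /proc/mdstat              # summary of every md array and its member state
# mdadm --detail /dev/md0       # per-disk state, sync status, failed/spare counts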

Then there is this problem, https://bugzilla.redhat.com/show_bug.cgi?id=512613, which is still alive on 10.04 (although I haven't had the guts to test it for a year now).

Now, after more than 200 days of uptime, the software RAID failed on me last night.

<log>
Jun 6 20:24:54 ozssan kernel: [2279698.613730] mptbase: ioc0: LogInfo(0x31110b00): Originator={PL}, Code={Reset}, SubCode(0x0b00)
Jun 6 20:24:54 ozssan kernel: [2279698.613918] mptbase: ioc0: LogInfo(0x31110b00): Originator={PL}, Code={Reset}, SubCode(0x0b00)
Jun 6 20:24:54 ozssan kernel: [2279698.628061] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.628069] sd 8:0:6:0: [sdd] Unhandled error code
Jun 6 20:24:54 ozssan kernel: [2279698.628072] sd 8:0:6:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 6 20:24:54 ozssan kernel: [2279698.628076] sd 8:0:6:0: [sdd] CDB: Read(10): 28 00 18 ad 55 bf 00 00 08 00
Jun 6 20:24:54 ozssan kernel: [2279698.628085] end_request: I/O error, dev sdd, sector 414012863
Jun 6 20:24:54 ozssan kernel: [2279698.636488] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.636493] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.636498] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.636502] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.636507] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.636511] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.636516] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.636520] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.636525] mptbase: ioc0: LogInfo(0x30050000): Originator={IOP}, Code={Task Terminated}, SubCode(0x0000)
Jun 6 20:24:54 ozssan kernel: [2279698.636532] raid10: sdd1: rescheduling sector 828025600
Jun 6 20:24:54 ozssan kernel: [2279698.644896] sd 8:0:6:0: [sdd] Unhandled error code
Jun 6 20:24:54 ozssan kernel: [2279698.644898] sd 8:0:6:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 6 20:24:54 ozssan kernel: [2279698.644902] sd 8:0:6:0: [sdd] CDB: Read(10): 28 00 1c 9b 48 0f 00 00 30 00
Jun 6 20:24:54 ozssan kernel: [2279698.644910] end_request: I/O error, dev sdd, sector 479938575
Jun 6 20:24:54 ozssan kernel: [2279698.653200] raid10: sdd1: rescheduling sector 959876944
Jun 6 20:24:54 ozssan kernel: [2279698.661417] raid10: sdd1: rescheduling sector 959876952
Jun 6 20:24:54 ozssan kernel: [2279698.669461] raid10: sdd1: rescheduling sector 959876960
Jun 6 20:24:54 ozssan kernel: [2279698.677347] raid10: sdd1: rescheduling sector 959876968
Jun 6 20:24:54 ozssan kernel: [2279698.685027] raid10: sdd1: rescheduling sector 959876976
Jun 6 20:24:54 ozssan kernel: [2279698.692536] raid10: sdd1: rescheduling sector 959876984
Jun 6 20:24:54 ozssan kernel: [2279698.692540] sd 8:0:6:0: [sdd] Unhandled error code
Jun 6 20:24:54 ozssan kernel: [2279698.692542] sd 8:0:6:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 6 20:24:54 ozssan kernel: [2279698.692545] sd 8:0:6:0: [sdd] CDB: Write(10): 2a 00 00 29 7b 37 00 00 08 00
Jun 6 20:24:54 ozssan kernel: [2279698.692550] end_request: I/O error, dev sdd, sector 2718519
Jun 6 20:24:54 ozssan kernel: [2279698.692553] raid10: Disk failure on sdd1, disabling device.
Jun 6 20:24:54 ozssan kernel: [2279698.692554] raid10: Operation continuing on 3 devices.
</log>

Eight seconds later a second disk failed, and after another 20 seconds a third disk failed too, with the same log entries for all of the drives. Needless to say, the RAID array was down.
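
For what it's worth, when several members get kicked out by a controller or link reset rather than by real media failures, md can usually be reassembled by force rather than rebuilt from scratch. A rough sketch (md0 and the sdX1 partitions are placeholders; check the disks before trusting the result):

# mdadm --stop /dev/md0                                                       # stop the degraded array
# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1   # accept slightly stale event counts
# cat /proc/mdstat                                                            # confirm the array came back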

The interesting point is that the fourth drive didn't fail. But that drive was replaced two months ago and has only been powered up for about 60 days.

Is there another ticking time bomb in these WD drives?

TheR


I've got a fair number of them deployed or queued for deployment right now; none have been online for more than 60 days yet. I sure hope there aren't any more bugs like that. On the plus side, I'm running Windows with automatic updates, so the machines will *never* be up for 200 days at a time! :-p


I have completely given up using these drives in any kind of RAID setup.

They seem wholly unreliable. Just today another one of these crappy drives died during a RAID 10 rebuild, forcing me into a six-hour rescue mission to restore databases and whatnot from our dev server.

As a single drive they seem to work OK, but I have had nothing but trouble when using them in RAID.


Quote:
I have completely given up using these drives in any kind of RAID setup.

They seem wholly unreliable. Just today another one of these crappy drives died during a RAID 10 rebuild, forcing me into a six-hour rescue mission to restore databases and whatnot from our dev server.

As a single drive they seem to work OK, but I have had nothing but trouble when using them in RAID.

This looks like a pretty old thread, dating from when these drives were first released. Even if there were some reliability issues with them back then (which I personally didn't have, having run two of these drives from their release date until now), there shouldn't be any such issue now.


Quote:
I should mention they are operating in 20-disk RAID 5 groups...

I am shocked that you are not having issues. Are you using software or hardware RAID 5? If software, what OS? If hardware, what RAID controller are you using?

Just curious. Thanks!

Frank


Hardware: an Adaptec 6805, in a Supermicro chassis using their SAS2-216EL1 backplane. The drives negotiate at 3 Gbps instead of 6 Gbps for some reason, which has always irritated me, but perhaps they work better that way?
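
For a SATA drive attached directly to a motherboard port you can confirm the negotiated rate from Linux; behind the Adaptec and the expander you have to rely on the controller's own tools. A rough sketch of the direct-attach check (/dev/sda is a placeholder):

# dmesg | grep -i 'SATA link up'          # kernel logs e.g. "SATA link up 3.0 Gbps"
# smartctl -i /dev/sda | grep -i sata     # recent smartmontools shows max vs. current link speed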

I'm probably moving to Toshiba 10K SAS drives now, though, since they are performing better in the same setup.

