Velociraptor premature failure rate (bad drives, premature to market?)
I have RMA'd several times so far across 12 disks.

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 25 November 2008 - 07:18 AM
datestardi, on Nov 25 2008, 08:16 AM, said:
Jpiszcz, something doesn't seem right with the data you're posting, as if something other than the drives in your system (or the test methodology) is causing the errors.
Can you move the drives to a known working system and retest them?
If you test them with Data Lifeguard (you can use a DOS CD if you're not running Windows), what is the result?
How are the errors manifesting themselves, other than in your diagnostic tests? (If you're not seeing real-world data errors, that would say a lot I think.)
The customer reviews for the Velociraptor at Newegg are outstanding. I've actually never seen higher.
They are real-world errors, and the system is not the issue. Recall that this is the same system I ran Raptor 150s in for 2-3 years with no problems, FYI. Soon I will have a 3ware RAID card, and I cannot wait to see how it handles these drives. If they are crap on the RAID card as well, I will go with new disks; it's more or less a last-ditch effort.
Newegg? Sort by lowest rating and see how many bad drives/sectors/etc there are, quite a few.
Justin.
-
Group:
Patron
-
Posts:
313
-
Joined:
03-January 02
Posted 25 November 2008 - 07:20 AM
Quote They are real world errors, the system is not the issue,
Can you try Data Lifeguard?
http://support.wdc.c...s...612&lang=en
This post has been edited by datestardi: 25 November 2008 - 07:21 AM

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 25 November 2008 - 08:06 AM
datestardi, on Nov 25 2008, 08:20 AM, said:
If I have time I can try to give this a spin before I replace the drive.
Justin.

- Member
-
Group:
Member
-
Posts:
492
-
Joined:
23-April 03
Posted 25 November 2008 - 08:19 AM
jpiszcz, on Nov 25 2008, 04:22 AM, said:
No bad sectors according to SMART, which is clearly wrong.
How do you know?
You don't seem to have bad sectors but you have other problems.
Probably corrupt firmware/corrupt controller/broken motherboard. Uncorrectables / Index Not Founds / DMA errors, etc... So many of them... clearly something wrong.
If I were you, I would stop wasting time and do as suggested:
Plug this drive into a working system. Drop to DOS or boot from the manufacturer's CD and run their diagnostics program there. If you drop to DOS, try running MHDD. You would have a better idea if there are bad sectors or IDNFs/UNCs, etc.

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 25 November 2008 - 08:23 AM
6_6_6, on Nov 25 2008, 09:19 AM, said:
jpiszcz, on Nov 25 2008, 04:22 AM, said:
No bad sectors according to SMART, which is clearly wrong.
How do you know?
You don't seem to have bad sectors but you have other problems.
Probably corrupt firmware/corrupt controller/broken motherboard. Uncorrectables / Index Not Founds / DMA errors, etc... So many of them... clearly something wrong.
If I were you, I would stop wasting time and do as suggested:
Plug this drive into a working system. Drop to DOS or boot from the manufacturer's CD and run their diagnostics program there. If you drop to DOS, try running MHDD. You would have a better idea if there are bad sectors or IDNFs/UNCs, etc.
How do I know? On 60-70% of the disks, bad sector errors started showing up. The other 30-40% report uncorrectable sectors to the OS (once I turned on TLER). All of these disks are being used in either RAID1 or RAID6-based configurations.
It appears that when you enable TLER, the drive reports the error to the OS and does not track it in SMART. For the last few disks TLER had been enabled, and the errors and reporting were the same.
Without TLER, you see the offline_uncorrectable and pending sector counts creep up, etc. Very interesting!
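For anyone who wants to watch those counters without reading the table by eye, here is a minimal sketch, assuming the usual column layout of `smartctl -A` output on ATA drives. The sample text and the helper names (`parse_smart_attributes`, `nonzero_attributes`) are made up for illustration and are not taken from the drives in this thread.

```python
# Attribute names as printed in the ATTRIBUTE_NAME column of `smartctl -A`.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Hypothetical sample output, not real data from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have ten columns;
        # the header row starts with "ID#" and is skipped by isdigit().
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                values[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value was not a plain integer; ignore the row
    return values


def nonzero_attributes(values):
    """Return the sorted names of watched attributes with a nonzero raw count."""
    return sorted(name for name, raw in values.items() if raw != 0)


if __name__ == "__main__":
    print(nonzero_attributes(parse_smart_attributes(SAMPLE)))
```

If the TLER observation above holds, a drive with TLER enabled could pass this check while the OS log still shows uncorrectable reads, so the two sources would need to be read together.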

- Member
-
Group:
Member
-
Posts:
224
-
Joined:
03-November 05
Posted 25 November 2008 - 01:38 PM
continuum, on Nov 10 2008, 03:49 PM, said:
Still if you did it all at the same time then they're all from the same batch, and if they were all from a single vendor they were all subject to the vendor's handling (or lack thereof). I would strongly suspect a mishandled batch, poor handling by the vendor
I remember a company I worked for where a bunch of systems shipped close together experienced a high drive failure rate. Internally they said it was due to mass-in-air-de-pallet-ization, a word they coined to indicate that there was trouble on the plane used to ship the pallet of drives, causing the pallets to fall apart. The idea was that the sudden hard jolts to all of the drives caused a high failure rate.
Officially, the company had no drive problem.

- Mod
-
Group:
Mod
-
Posts:
2,452
-
Joined:
31-December 01
Posted 25 November 2008 - 09:06 PM
We've had similar issues with shipping problems, poor packaging, or both...
Quote does not track it in SMART
SMART only predicts about 50% of drive failures, if memory serves...
Sorry, I don't have anything else worthwhile to add at the moment. Good luck! Definitely try a few in a different system at least; when you go scream at WD you'll have more evidence to throw at 'em.

- Member
-
Group:
Member
-
Posts:
492
-
Joined:
23-April 03
Posted 26 November 2008 - 12:50 AM
jpiszcz, on Nov 25 2008, 08:23 AM, said:
How do I know? On 60-70% of the disks, they started having bad sector errors. On the other 30-40% they report uncorrectable sectors to the OS (when I turned on TLER). All of these disks are being used in either RAID1 or RAID6-based configurations.
It appears when you enable TLER it reports the error to the OS and does not track it in SMART, for the last few disks, TLER had been enabled and the error and reporting was the same.
Without TLER, you see the offline_uncorrectable and pending sectors creep up etc, very interesting!
Where is it? I don't see a pending sector count anywhere. They all show 0 above. I have no idea what TLER is. But if you have so many drives failing, how come you never ran the manufacturer's utility and MHDD on these drives in a different, stable system, and how come you are forwarding us everything from the OS? That way, you could at least eliminate all the other unknowns in the equation and narrow it down to the drive. You have lots of IDNFs. I doubt you have something wrong with the media itself... probably firmware corruption, or motherboard/controller/OS/carbon footprint... well... sorry...
Proper troubleshooting for me:
1. Move drive to a different working stable system.
2. Run manufacturer's utility from CD/DOS. Record SMART values.
3. Do a Short DST... Long DST... Zero-fill / Full erase.
4. Run manufacturer's utility again. Check SMART values.
5. Run MHDD from DOS and see how drive surface is sector by sector.
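Steps 2 and 4 come down to comparing two SMART snapshots. A minimal sketch of that comparison follows, assuming the raw values have already been copied into plain dictionaries by hand. The attribute names are standard ATA ones, but the numbers and the `smart_delta` helper are hypothetical.

```python
def smart_delta(before, after):
    """Return {name: (before_raw, after_raw)} for attributes whose raw value changed.

    An attribute present in only one snapshot is reported with None on the
    missing side.
    """
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed


# Hypothetical values recorded in step 2 (before the DSTs and zero-fill)...
before = {
    "Reallocated_Sector_Ct": 0,
    "Current_Pending_Sector": 4,
    "Offline_Uncorrectable": 2,
}
# ...and in step 4 (after them).
after = {
    "Reallocated_Sector_Ct": 4,
    "Current_Pending_Sector": 0,
    "Offline_Uncorrectable": 2,
}

if __name__ == "__main__":
    for name, (old, new) in smart_delta(before, after).items():
        print(f"{name}: {old} -> {new}")
```

In this made-up example the pending count drops to zero while the reallocated count rises by the same amount, which is the pattern you would expect if the zero-fill forced the drive to remap weak sectors.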

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 26 November 2008 - 04:17 AM
For this particular disk, it seems to be OK in another system, BUT I did zero the disk out before I tested it there; next time I will not zero it out. In addition, the system is old (ICH5) and does not support NCQ or AHCI; however:
WD tools -> Check, Short+Extended = OK
Spinite -> 10 hour test = OK
Currently, I am testing the next drive with its mirrored disk (RAID1) and running lots of I/O-bound processes on the RAID1.
1. I have disabled all SMART-related testing on the disk.
2. I have disabled hddtemp testing on the disk.
I will note one thing that was interesting: no matter what system the disk was connected to, it kept grinding away continually, even when idle or in the BIOS, almost like it was stuck on some kind of internal offline test, even though I had disabled all relevant offline tests on the disk.
I want to find the root cause of this problem as much as everyone else. This specific disk, thus far, seems to be a false positive in terms of tests, but it still acts quite weird (constant accesses, etc.). So with the new one I will continue to run disk benchmarks (but no SMART/hddtemp stuff) for a while and see if the problem recurs.
Justin.
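One way to make the I/O load double as a data-integrity check is to write a known pattern and read it back. The sketch below is only an illustration of that idea, not the benchmark actually used here. The function names, block counts, and seed are all assumptions, and a read-back served from the OS page cache does not prove the data came off the platters, so a real test would need far more data than RAM or a remount between the two passes.

```python
import hashlib
import os
import tempfile


def pattern_block(index, block_size, seed=b"pattern"):
    """Build a deterministic block for a given index by repeating a SHA-256 digest."""
    digest = hashlib.sha256(seed + index.to_bytes(8, "big")).digest()
    return (digest * (block_size // len(digest) + 1))[:block_size]


def write_and_verify(path, blocks=64, block_size=1024 * 1024):
    """Write patterned blocks to path, read them back, return mismatching block indices."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(i, block_size))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != pattern_block(i, block_size):
                bad.append(i)
    return bad


def demo():
    """Run a tiny pass in a temporary directory; point path at the array for a real test."""
    with tempfile.TemporaryDirectory() as tmp:
        return write_and_verify(os.path.join(tmp, "pattern.bin"), blocks=4, block_size=4096)


if __name__ == "__main__":
    print("mismatching blocks:", demo())
```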
-
Group:
Patron
-
Posts:
313
-
Joined:
03-January 02
Posted 26 November 2008 - 06:48 AM
Quote Spinite -> 10 hour test = OK
Spinite? Or Spinrite?
Jpiszcz, I wouldn't let Spinrite touch my drives, except as a last resort data recovery before the trash bin - it can do a lot of things that you can't imagine. (I'm not saying Spinrite is bad - I'm saying it can do things to the drive that drive manufacturers don't expect, and so cause drives to behave in ways users might not expect.)
I hope you haven't been using it from the beginning... unfortunately, if you have, that could explain a lot of things.
From Wikipedia:
"SpinRite is declared by its developers to have certain unique features[3], such as disabling of disk write caching, disabling of [sector] auto-relocation.... Another important feature is direct hardware-level access, whereby the drive's internal controller interacts directly with the program, rather than through the operating system. This, in turn, allows dynamic head repositioning, whereby, when reading a faulty sector, the reading head is deliberately moved backwards and forwards many times, by varying amounts, in the hope that each time it returns to the sector, it may come to rest in a slightly different position....
It should be noted that certain claims made by SpinRite's makers have proved controversial. The program's claimed ability to "refresh" ageing drives has met with particular scepticism, while its "recovery" of sectors marked as damaged by the file system controller is considered by some to be undesirable and ultimately counter-productive...."
http://en.wikipedia.org/wiki/SpinRite