Storage Forums: Velociraptor premature failure rate (bad drives, premature to market?)


  • (11 Pages)

Velociraptor premature failure rate (bad drives, premature to market?)

I have RMA'd several times so far across 12 disks.

#1 jpiszcz

  • Member
  • Group: Member
  • Posts: 464
  • Joined: 15-January 06

Posted 09 November 2008 - 05:11 PM

The VR300GB reviews on NewEgg:
http://www.newegg.co...N82E16822136260

Sort by lowest rating; I find those reviews very accurate. Of the 12 drives I purchased, all at the same time (bad idea), I have had numerous failures, and in almost every instance it's bad sectors. I have a similar configuration with Raptor150s, and it has worked fine, with only 1 or 2 failures in the past 2-4 years.

For the Velociraptor 300s, I have had a total of:

5 failures from the original disks
1 failure from the RMA I received from WD
6 total failures out of 12 drives
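Six failures out of 12 is a small sample, so the observed 50% rate comes with wide error bars. As an illustration (not something from the original post), a Wilson score interval puts rough bounds on it. This sketch assumes each of the 12 drives is an independent trial and takes the post's own count of 6 failures at face value, including the failed RMA replacement; the function name is hypothetical.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion; z = 1.96 gives roughly 95%."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need 0 <= failures <= trials and trials > 0")
    p_hat = failures / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    margin = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    return centre - margin, centre + margin


if __name__ == "__main__":
    # Assumption: 12 drives, 6 failures, each drive treated as an independent trial.
    low, high = wilson_interval(6, 12)
    print(f"observed 50%, 95% interval roughly {low:.0%} to {high:.0%}")
```

With these inputs the interval is roughly 25% to 75%: clearly far above a normal failure rate, but too wide to pin down a precise figure, which is why a larger population of drives from different batches would say more.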

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed: read failure	   10%	  3059		 586068047
# 2  Short offline	   Completed: read failure	   60%	  3058		 586068047
# 3  Conveyance offline  Completed without error	   00%	  3053		 -
# 4  Short offline	   Completed without error	   00%	  3053		 -
# 5  Extended offline	Completed without error	   00%	  3053		 -
# 6  Selective offline   Completed without error	   00%	  3052		 -
# 7  Extended offline	Completed: read failure	   10%	  3049		 586068047
# 8  Short offline	   Completed without error	   00%	  3024		 -
# 9  Short offline	   Completed without error	   00%	  3001		 -


Another one:
[140278.271138] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[140278.271148] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[140278.271149]		  res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[140278.271154] ata3.00: status: { DRDY }
[140278.271160] ata3: hard resetting link
[140278.576071] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[140278.601864] ata3.00: configured for UDMA/133
[140278.601871] end_request: I/O error, dev sdc, sector 586067067
														^^^^^^^^^
												  Bad sector above.

[140278.601876] md: super_written gets error=-5, uptodate=0
[140278.601880] raid1: Disk failure on sdc3, disabling device.
[140278.601881] raid1: Operation continuing on 1 devices.
[140278.601908] ata3: EH complete
[140278.612277] sd 2:0:0:0: [sdc] 586072368 512-byte hardware sectors (300069 MB)
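The libata exception lines above pack the ATA taskfile registers into their `cmd .../.../...` and `res .../.../...` fields. A minimal Python sketch for decoding them, assuming the field order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device used by libata's error report of this era. For `res`, the first byte is the status register and the second the error register. The helper names and the partial opcode table are illustrative, not part of any kernel tool.

```python
import re

# Assumed field order of libata's "cmd" / "res" taskfile dumps.
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device")

# A handful of ATA opcodes; not a complete table.
ATA_COMMANDS = {
    0x20: "READ SECTOR(S)", 0x24: "READ SECTOR(S) EXT", 0x25: "READ DMA EXT",
    0x30: "WRITE SECTOR(S)", 0x34: "WRITE SECTOR(S) EXT", 0x35: "WRITE DMA EXT",
    0xC8: "READ DMA", 0xCA: "WRITE DMA", 0xEA: "FLUSH CACHE EXT",
}
LBA28_RW = {0x20, 0x30, 0xC8, 0xCA}  # 28-bit LBA read/write commands
LBA48_RW = {0x24, 0x25, 0x34, 0x35}  # 48-bit LBA read/write commands

# Legacy bit names for the ATA status and error registers.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
               (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x20, "MC"), (0x10, "IDNF"),
              (0x08, "MCR"), (0x04, "ABRT"), (0x02, "NM"), (0x01, "AMNF")]


def parse_taskfile(line: str, kind: str = "cmd") -> dict:
    """Pull the 12 register bytes that follow 'cmd' or 'res' in a libata line."""
    m = re.search(rf"\b{kind} ((?:[0-9a-f]{{2}}[/:]){{11}}[0-9a-f]{{2}})", line)
    if m is None:
        raise ValueError(f"no '{kind}' taskfile found in: {line!r}")
    values = [int(b, 16) for b in re.split(r"[/:]", m.group(1))]
    return dict(zip(FIELDS, values))


def describe_command(tf: dict):
    """Return (name, lba, sector_count); lba and count are None for non-read/write commands."""
    op = tf["command"]
    name = ATA_COMMANDS.get(op, f"opcode 0x{op:02x}")
    if op in LBA48_RW:
        lba = (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
               | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])
        count = (tf["hob_nsect"] << 8 | tf["nsect"]) or 65536  # 0 means 65536
        return name, lba, count
    if op in LBA28_RW:
        # In 28-bit LBA mode, bits 27:24 sit in the low nibble of the device register.
        lba = (tf["device"] & 0x0F) << 24 | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"]
        return name, lba, tf["nsect"] or 256  # 0 means 256
    return name, None, None


def bit_names(value: int, table) -> list:
    """List the names of the bits set in a register value."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    cmd = parse_taskfile("ata1.00: cmd 35/00:f0:48:93:70/00:01:11:00:00/e0 tag 0 dma 253952 out")
    res = parse_taskfile("res 51/10:f0:48:93:70/00:01:11:00:00/e0 Emask 0x81 (invalid argument)", "res")
    print(describe_command(cmd))
    print(bit_names(res["command"], STATUS_BITS), bit_names(res["feature"], ERROR_BITS))
```

Run on the WRITE DMA EXT exception from post #9, this gives LBA 292590408 and 496 sectors (253952 bytes), the same sector the kernel then reports in its end_request line, which is a useful sanity check on the assumed field order. The kernel's own `status:` line printed 0x51 as { DRDY ERR }, so it evidently omits some bits; DSC shows up here but not in the log.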

A week later:

	1 mdadm monitor  6:00pm	 (1K) Fail event on /dev/md2

[ .. ]

[35216.712178] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[35216.722031] ata3.00: configured for UDMA/133
[35216.722040] end_request: I/O error, dev sdc, sector 310170509
													   ^^^^^^^^^
												  Bad sector above.
[35216.722048] raid1: Disk failure on sdc3, disabling device.
[35216.722049] raid1: Operation continuing on 1 devices.
[35216.722072] ata3: EH complete
[35216.722509] sd 2:0:0:0: [sdc] 586072368 512-byte hardware sectors (300069 MB)
[35216.729432] md: recovery of RAID array md2

A day after that:

Fail event on /dev/md2:p34.internal.lan


Oct 26 09:00:33 p34 kernel: [89217.234246] end_request: I/O error, dev sdc, sector 309961068
   ^^^^^^^^^
   Bad sector above.
Oct 26 09:00:33 p34 kernel: [89217.234259] raid1: Disk failure on sdc3, disabling device.
Oct 26 09:00:33 p34 kernel: [89217.234260] raid1: Operation continuing on 1 devices.
Oct 26 09:00:33 p34 kernel: [89217.234276] ata3: EH complete
Oct 26 09:00:33 p34 kernel: [89217.244372] sd 2:0:0:0: [sdc] 586072368 512-byte hardware sectors (300069 MB)


I am really getting tired of RMA'ing disks to WD. I even opened a case with them in June/July and asked about premature failure rates; they said they had done testing and there was no such problem. However, given all of the comments on NewEgg and all of my failures in only 5-6 months, I am not sure I would recommend Velociraptors to anyone.

Comments? Does anyone else have a stack of VR300s? What has been your experience with them?

Justin.

#2 continuum

  • Mod
  • Group: Mod
  • Posts: 2,395
  • Joined: 31-December 01

Posted 09 November 2008 - 07:40 PM

You bought them from Newegg? OEM I assume? :scared:

Were they all packed properly? Newegg, Mwave, Zipzoomfly, Allstarshop, just about every major retailer out there is notorious for improperly handling and packing OEM packaged products. I would strongly suspect that the way most retailers out there pack things is causing a significant jump in failure rates.


Hint: most retailers wrap a drive in a single layer of large-bubble bubble wrap, maybe two layers, three if you're lucky. The wrapped drive then goes into the bottom of the box and is topped off with foam peanuts. This is entirely INadequate protection, since it leaves less than 2" of bubble wrap in each dimension. Plus, OEM-packed products are not actually wrapped in bubble wrap until they reach the packing stage, which means they travel through the entire warehouse in just the ESD bag/clamshell... eek.


I don't have a huge database of Velociraptors here, but we have a few and haven't noticed anything out of the ordinary. That said, my sample of Velociraptors is very small by my standards, and it is somewhat skewed by other peculiarities of the products we design/build.

#3 jpiszcz

  • Member
  • Group: Member
  • Posts: 464
  • Joined: 15-January 06

Posted 10 November 2008 - 04:05 AM

continuum, on Nov 9 2008, 08:40 PM, said:

You bought them from Newegg? OEM I assume? :scared:

Were they all packed properly? Newegg, Mwave, Zipzoomfly, Allstarshop, just about every major retailer out there is notorious for improperly handling and packing OEM packaged products. I would strongly suspect that the way most retailers out there pack things is causing a significant jump in failure rates.


OEM, yes. When I ordered, I placed 12 separate orders for one drive each, so each drive was shipped in its own box, and no drive was DOA (at least to begin with).

Justin.

#4 continuum

  • Mod
  • Group: Mod
  • Posts: 2,395
  • Joined: 31-December 01

Posted 10 November 2008 - 04:49 PM

Still, if you bought them all at the same time, they're all from the same batch, and if they all came from a single vendor, they were all subject to that vendor's handling (or lack thereof). I would strongly suspect a mishandled batch, poor handling by the vendor (yes, 12 separate drives each being tossed individually into a box before packing is just as likely as 12 drives in a single order each being tossed separately), etc.

Mishandling does not always cause DOA. And each drive being shipped individually means each drive can still be improperly packed...

#5 troubleshooter

  • Member
  • Group: Member
  • Posts: 2
  • Joined: 24-November 08

Posted 24 November 2008 - 10:38 AM

I'm curious: how many hours of runtime (Power_On_Hours, as reported by SMART via smartctl -a) were on the drives when they failed?

#6 jpiszcz

  • Member
  • Group: Member
  • Posts: 464
  • Joined: 15-January 06

Posted 24 November 2008 - 10:44 AM

troubleshooter, on Nov 24 2008, 11:38 AM, said:

I'm curious: how many hours of runtime (Power_On_Hours, as reported by SMART via smartctl -a) were on the drives when they failed?


The most recent one is below, at 899 hours. It's cute, too. Notice how there are no bad sectors in the SMART statistics, yet if you scroll all the way down you will see what happens when you try to use the disk: the filesystem becomes corrupted and there are a ton of disk errors.

Well, this is the 9th drive to fail; it failed last Friday. I have submitted an RMA for it, and I am also ordering a 3ware RAID controller to better deal with these problems.

=== START OF INFORMATION SECTION ===
Device Model:	 WDC WD3000HLFS-01G6U0
Serial Number:	[snip]
Firmware Version: 04.04V01
User Capacity:	300,069,052,416 bytes
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:	Mon Nov 24 10:40:44 2008 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
										was completed without error.
										Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever 
										been run.
Total time to complete Offline 
data collection:				 (4800) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine 
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		(  59) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x303f) SCT Status supported.
										SCT Feature Control supported.
										SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000f   200   199   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0003   198   198   021	Pre-fail  Always	   -	   3083
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   22
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x000e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   099   099   000	Old_age   Always	   -	   899
 10 Spin_Retry_Count		0x0012   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0012   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   22
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   13
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   22
194 Temperature_Celsius	 0x0022   122   115   000	Old_age   Always	   -	   25
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0012   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0010   200   200   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
ATA Error Count: 268 (device log contains only the most recent five errors)
		CR = Command Register [HEX]
		FR = Features Register [HEX]
		SC = Sector Count Register [HEX]
		SN = Sector Number Register [HEX]
		CL = Cylinder Low Register [HEX]
		CH = Cylinder High Register [HEX]
		DH = Device/Head Register [HEX]
		DC = Device Command Register [HEX]
		ER = Error register [HEX]
		ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 268 occurred at disk power-on lifetime: 889 hours (37 days + 1 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 34 cf f3 a3
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ea 00 00 00 00 00 00 08  11d+07:49:52.019  FLUSH CACHE EXIT
  ca 00 08 30 6d 2f 0e 08  11d+07:49:51.942  WRITE DMA
  35 00 00 30 69 2f 0e 08  11d+07:49:51.940  WRITE DMA EXT
  35 00 00 30 65 2f 0e 08  11d+07:49:51.938  WRITE DMA EXT
  35 00 00 30 61 2f 0e 08  11d+07:49:51.936  WRITE DMA EXT

Error 267 occurred at disk power-on lifetime: 889 hours (37 days + 1 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 00 28 4b 00 e0  Error: IDNF at LBA = 0x00004b28 = 19240

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 00 28 4b 00 0e 08  11d+07:49:26.705  WRITE DMA EXT
  35 00 00 d0 91 fb 0d 08  11d+07:49:26.395  WRITE DMA EXT
  35 00 00 d0 8d fb 0d 08  11d+07:49:26.393  WRITE DMA EXT
  35 00 00 d0 89 fb 0d 08  11d+07:49:26.391  WRITE DMA EXT
  35 00 00 d0 85 fb 0d 08  11d+07:49:26.389  WRITE DMA EXT

Error 266 occurred at disk power-on lifetime: 889 hours (37 days + 1 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 00 98 17 dc e0  Error: IDNF at LBA = 0x00dc1798 = 14423960

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 00 98 17 dc 0d 08  11d+07:49:05.358  WRITE DMA EXT
  ca 00 10 c0 0e 00 00 08  11d+07:49:05.309  WRITE DMA
  35 00 00 f0 d2 d6 0d 08  11d+07:49:05.303  WRITE DMA EXT
  35 00 00 f0 ce d6 0d 08  11d+07:49:05.301  WRITE DMA EXT
  35 00 00 f0 ca d6 0d 08  11d+07:49:05.299  WRITE DMA EXT

Error 265 occurred at disk power-on lifetime: 866 hours (36 days + 2 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 01 ba 48 40  Error: UNC at LBA = 0x0048ba01 = 4766209

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  20 00 01 01 ba 48 00 08  10d+08:31:15.085  READ SECTOR(S)
  20 00 01 00 ba 48 00 08  10d+08:31:08.857  READ SECTOR(S)
  27 00 00 00 00 00 00 08  10d+08:31:08.845  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 00 08  10d+08:31:08.838  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 08  10d+08:31:08.838  SET FEATURES [Set transfer mode]

Error 264 occurred at disk power-on lifetime: 866 hours (36 days + 2 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 ff b9 48 40  Error: UNC at LBA = 0x0048b9ff = 4766207

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  20 00 01 ff b9 48 00 08  10d+08:31:01.825  READ SECTOR(S)
  27 00 00 00 00 00 00 08  10d+08:31:01.813  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 00 08  10d+08:31:01.807  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 08  10d+08:31:01.807  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 08  10d+08:31:01.798  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	   899		 -
# 2  Extended offline	Completed without error	   00%	   893		 -
# 3  Short offline	   Completed without error	   00%	   892		 -
# 4  Short offline	   Interrupted (host reset)	  90%	   889		 -
# 5  Extended offline	Completed without error	   00%	   871		 -
# 6  Short offline	   Completed without error	   00%	   870		 -
# 7  Extended offline	Interrupted (host reset)	  30%	   866		 -
# 8  Extended offline	Completed without error	   00%	   858		 -
# 9  Short offline	   Completed without error	   00%	   857		 -
#10  Short offline	   Completed without error	   00%	   842		 -
#11  Extended offline	Completed without error	   00%	   823		 -
#12  Short offline	   Completed without error	   00%	   822		 -
#13  Short offline	   Completed without error	   00%	   822		 -
#14  Short offline	   Completed without error	   00%	   818		 -
#15  Short offline	   Completed without error	   00%	   794		 -
#16  Short offline	   Completed without error	   00%	   771		 -
#17  Short offline	   Completed without error	   00%	   747		 -
#18  Short offline	   Completed without error	   00%	   723		 -
#19  Extended offline	Completed without error	   00%	   701		 -
#20  Short offline	   Completed without error	   00%	   676		 -
#21  Short offline	   Completed without error	   00%	   652		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

p34:~#
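smartctl's note in the output above says Powered_Up_Time is counted from power-on, printed as DDd+hh:mm:SS.sss, and wraps after 49.710 days, which is exactly what a 32-bit millisecond counter does (2^32 ms is about 49.71 days). Because it counts from the most recent power-on, a stamp like 11d+07:49 is not comparable to the 889-hour lifetime figure. A minimal Python sketch for converting the stamps to milliseconds and measuring gaps between the logged commands; the function names are hypothetical, and it assumes at most one wrap between two stamps.

```python
import re

# A 32-bit millisecond counter wraps after 2**32 ms, about 49.710 days.
WRAP_MS = 2**32

_STAMP = re.compile(r"(\d+)d\+(\d{2}):(\d{2}):(\d{2})\.(\d{3})")


def powered_up_ms(stamp: str) -> int:
    """Convert a DDd+hh:mm:SS.sss Powered_Up_Time stamp to milliseconds."""
    m = _STAMP.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"unexpected Powered_Up_Time format: {stamp!r}")
    days, hours, minutes, seconds, millis = (int(g) for g in m.groups())
    return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis


def elapsed_ms(earlier: str, later: str) -> int:
    """Milliseconds between two stamps, assuming the counter wrapped at most once."""
    return (powered_up_ms(later) - powered_up_ms(earlier)) % WRAP_MS


if __name__ == "__main__":
    # Gap between the last WRITE DMA and the flush command listed under error 268.
    print(elapsed_ms("11d+07:49:51.942", "11d+07:49:52.019"))
    print(round(WRAP_MS / 86_400_000, 3))
```

The modulo in `elapsed_ms` is what makes a wrapped counter harmless: a stamp just before the wrap followed by one just after still yields a small positive gap.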


Nov 24 01:04:11 p34 kernel: [749803.050746] ata1.00: configured for UDMA/133
Nov 24 01:04:11 p34 kernel: [749803.050758] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Nov 24 01:04:11 p34 kernel: [749803.050761] sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
Nov 24 01:04:11 p34 kernel: [749803.050765] Descriptor sense data with sense descriptors (in hex):
Nov 24 01:04:11 p34 kernel: [749803.050767]		 72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
Nov 24 01:04:11 p34 kernel: [749803.050774]		 0d dc 17 98 
Nov 24 01:04:11 p34 kernel: [749803.050776] sd 0:0:0:0: [sda] Add. Sense: Recorded entity not found
Nov 24 01:04:11 p34 kernel: [749803.050791] ata1: EH complete
Nov 24 01:04:11 p34 kernel: [749803.052931] sd 0:0:0:0: [sda] 586072368 512-byte hardware sectors (300069 MB)
Nov 24 01:04:11 p34 kernel: [749803.055061] sd 0:0:0:0: [sda] Write Protect is off
Nov 24 01:04:11 p34 kernel: [749803.059299] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 24 01:04:32 p34 kernel: [749824.418104] ata1.00: configured for UDMA/133
Nov 24 01:04:32 p34 kernel: [749824.418115] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Nov 24 01:04:32 p34 kernel: [749824.418119] sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
Nov 24 01:04:32 p34 kernel: [749824.418123] Descriptor sense data with sense descriptors (in hex):
Nov 24 01:04:32 p34 kernel: [749824.418124]		 72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
Nov 24 01:04:32 p34 kernel: [749824.418131]		 0e 00 4b 28 
Nov 24 01:04:32 p34 kernel: [749824.418138] sd 0:0:0:0: [sda] Add. Sense: Recorded entity not found
Nov 24 01:04:32 p34 kernel: [749824.418151] ata1: EH complete
Nov 24 01:04:32 p34 kernel: [749824.420284] sd 0:0:0:0: [sda] 586072368 512-byte hardware sectors (300069 MB)
Nov 24 01:04:32 p34 kernel: [749824.422418] sd 0:0:0:0: [sda] Write Protect is off
Nov 24 01:04:32 p34 kernel: [749824.426658] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 24 01:04:58 p34 kernel: [749849.757166] ata1.00: configured for UDMA/133
Nov 24 01:04:58 p34 kernel: [749849.757175] ata1: EH complete
Nov 24 01:04:58 p34 kernel: [749849.757351] sd 0:0:0:0: [sda] 586072368 512-byte hardware sectors (300069 MB)
Nov 24 01:04:58 p34 kernel: [749849.767482] xfs_force_shutdown(sda,0x2) called from line 1056 of file fs/xfs/xfs_log.c.  Return address = 0xffffffff803b67f3
Nov 24 01:04:58 p34 kernel: [749849.767503] sd 0:0:0:0: [sda] Write Protect is off
Nov 24 01:04:58 p34 kernel: [749849.771767] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 24 01:05:03 p34 kernel: [749854.805736] Filesystem "sda": xfs_log_force: error 5 returned.
Nov 24 01:05:07 p34 kernel: [749859.583192] ata1: hard resetting link
Nov 24 01:05:08 p34 kernel: [749859.888177] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 24 01:05:08 p34 kernel: [749859.910650] ata1.00: configured for UDMA/133
Nov 24 01:05:08 p34 kernel: [749859.910659] ata1: EH complete
Nov 24 01:05:08 p34 kernel: [749859.910733] sd 0:0:0:0: [sda] 586072368 512-byte hardware sectors (300069 MB)
Nov 24 01:05:08 p34 kernel: [749859.910777] sd 0:0:0:0: [sda] Write Protect is off
Nov 24 01:05:08 p34 kernel: [749859.910850] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
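The "Descriptor sense data" hex in these kernel messages is SCSI descriptor-format sense data (response code 0x72), split across two lines. Its Information descriptor appears to carry the failing LBA: in post #9 the trailing bytes 0f 63 15 b0 decode to 258151856, the same sector as the end_request line next to them. A minimal Python sketch, assuming the SPC layout (sense key in the low nibble of byte 1, ASC/ASCQ in bytes 2-3, additional length in byte 7, descriptors from byte 8) and only the tiny code tables shown; the function name is hypothetical.

```python
# Common sense key names from SPC; not exhaustive.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}
# Two ASC/ASCQ pairs relevant here; a real decoder needs the full SPC table.
ADDITIONAL_SENSE = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x14, 0x00): "RECORDED ENTITY NOT FOUND",
}


def decode_descriptor_sense(hex_text: str) -> dict:
    """Decode descriptor-format sense bytes given as whitespace-separated hex."""
    data = bytes.fromhex(" ".join(hex_text.split()))
    if len(data) < 8 or (data[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    key, asc, ascq = data[1] & 0x0F, data[2], data[3]
    result = {
        "sense_key": SENSE_KEYS.get(key, f"0x{key:x}"),
        "additional_sense": ADDITIONAL_SENSE.get((asc, ascq), f"ASC 0x{asc:02x} ASCQ 0x{ascq:02x}"),
        "information": None,
    }
    end = min(len(data), 8 + data[7])  # byte 7 = additional sense length
    pos = 8
    while pos + 2 <= end:
        desc_type, desc_len = data[pos], data[pos + 1]
        body = data[pos + 2 : pos + 2 + desc_len]
        # Type 0x00 = Information descriptor: VALID bit, reserved byte, 8-byte field.
        if desc_type == 0x00 and len(body) >= 10 and body[0] & 0x80:
            result["information"] = int.from_bytes(body[2:10], "big")
        pos += 2 + desc_len
    return result


if __name__ == "__main__":
    # The two hex lines from one kernel report, joined.
    sense = ("72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00 "
             "0d dc 17 98")
    print(decode_descriptor_sense(sense))
```

The first report in this post decodes to ABORTED COMMAND / RECORDED ENTITY NOT FOUND with information 232527768, consistent with the kernel's "Add. Sense: Recorded entity not found" line and with the IDNF errors in the SMART error log.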


#7 troubleshooter

  • Member
  • Group: Member
  • Posts: 2
  • Joined: 24-November 08

Posted 24 November 2008 - 03:34 PM

Might you still have a record of the Power_On_Hours from the previous eight failed drives?

#8 jpiszcz

  • Member
  • Group: Member
  • Posts: 464
  • Joined: 15-January 06

Posted 24 November 2008 - 03:46 PM

troubleshooter, on Nov 24 2008, 04:34 PM, said:

Might you still have a record of the Power_On_Hours from the previous eight failed drives?


Checking...

20080618-raptor300
20080715-raptor300
20080925-raptor300
20081018-raptor300
20081029-raptor300
20081111-raptor300
20081121-raptor300
sdd
sdl

Of these, I have the following info (I don't have it for all of them):

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 50% 270 586070865


SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 10% 3049 586068047

(but began erroring shortly after, even though it says OK)
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 3720 -

(but began erroring shortly after, even though it says OK)
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 818 -


SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Selective offline Interrupted (host reset) 90% 2312 -
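The self-test excerpts above are the quickest record of runtime at failure, since each row carries LifeTime(hours) and LBA_of_first_error. A minimal Python sketch that pulls those columns out of saved `smartctl -l selftest` output (or the self-test section of `smartctl -a`). It assumes the column layout shown in this thread, accepts both tab-separated and single-space rows, and the class and function names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class SelfTest(NamedTuple):
    num: int
    test: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]


# Matches rows like "# 1  Extended offline  Completed: read failure  10%  3059  586068047".
_ROW = re.compile(
    r"#\s*(\d+)\s+(\w+ (?:offline|captive))\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)"
)


def parse_self_tests(text: str) -> List[SelfTest]:
    rows = []
    for line in text.splitlines():
        m = _ROW.fullmatch(line.strip())
        if m is None:
            continue  # headers, blank lines, anything else
        num, test, status, remaining, hours, lba = m.groups()
        rows.append(SelfTest(int(num), test, status, int(remaining), int(hours),
                             None if lba == "-" else int(lba)))
    return rows


def read_failures(rows: List[SelfTest]) -> List[SelfTest]:
    """Keep only the rows whose status reports a read failure."""
    return [r for r in rows if "read failure" in r.status]


if __name__ == "__main__":
    log = """# 1 Short offline Completed: read failure 50% 270 586070865
# 1 Extended offline Completed without error 00% 3720 -"""
    for r in read_failures(parse_self_tests(log)):
        print(r.lifetime_hours, r.first_error_lba)
```

As the annotations above point out, a clean self-test does not prove a drive is healthy, so `read_failures` only finds the cases the drive itself admitted to.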

#9 jpiszcz

  • Member
  • Group: Member
  • Posts: 464
  • Joined: 15-January 06

Posted 25 November 2008 - 04:22 AM

Here is what it looks like if you try to keep using the drive:

[843251.690450] sd 0:0:0:0: [sda] 586072368 512-byte hardware sectors (300069 MB)
[843251.690609] sd 0:0:0:0: [sda] Write Protect is off
[843251.690611] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[843251.691055] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[843262.667482] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[843262.667488] ata1.00: irq_stat 0x40000001
[843262.667494] ata1.00: cmd c8/00:08:00:00:64/00:00:00:00:00/ef tag 0 dma 4096 in
[843262.667495]		  res 51/40:08:00:00:64/00:00:00:64:00/ef Emask 0x9 (media error)
[843262.667500] ata1.00: status: { DRDY ERR }
[843262.667503] ata1.00: error: { UNC }
[843262.688510] ata1.00: configured for UDMA/133
[843262.688519] ata1: EH complete
[843262.695693] sd 0:0:0:0: [sda] 586072368 512-byte hardware sectors (300069 MB)
[843262.696166] sd 0:0:0:0: [sda] Write Protect is off
[843262.696169] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[843262.696767] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[843289.797086] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[843289.797093] ata1.00: irq_stat 0x40000001
[843289.797099] ata1.00: cmd ca/00:08:b0:15:63/00:00:00:00:00/ef tag 0 dma 4096 out
[843289.797101]		  res 51/10:08:b0:15:63/00:00:00:63:00/ef Emask 0x81 (invalid argument)
[843289.797106] ata1.00: status: { DRDY ERR }
[843289.797109] ata1.00: error: { IDNF }
[843289.816289] ata1.00: configured for UDMA/133
[843289.816299] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[843289.816304] sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
[843289.816309] Descriptor sense data with sense descriptors (in hex):
[843289.816312]		 72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
[843289.816340]		 0f 63 15 b0 
[843289.816350] sd 0:0:0:0: [sda] Add. Sense: Recorded entity not found
[843289.816358] end_request: I/O error, dev sda, sector 258151856
[843289.816363] Buffer I/O error on device sda, logical block 32268982
[843289.816366] lost page write due to I/O error on sda
[843289.816378] ata1: EH complete
[843289.816566] sd 0:0:0:0: [sda] 586072368 512-byte hardware sectors (300069 MB)
[843289.816735] sd 0:0:0:0: [sda] Write Protect is off
[843289.816740] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[843289.817090] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[843326.116002] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[843326.116008] ata1.00: irq_stat 0x40000001
[843326.116015] ata1.00: cmd 35/00:f0:48:93:70/00:01:11:00:00/e0 tag 0 dma 253952 out
[843326.116016]		  res 51/10:f0:48:93:70/00:01:11:00:00/e0 Emask 0x81 (invalid argument)
[843326.116021] ata1.00: status: { DRDY ERR }
[843326.116024] ata1.00: error: { IDNF }
[843326.136418] ata1.00: configured for UDMA/133
[843326.136435] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[843326.136439] sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
[843326.136444] Descriptor sense data with sense descriptors (in hex):
[843326.136447]		 72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
[843326.136457]		 11 70 93 48 
[843326.136461] sd 0:0:0:0: [sda] Add. Sense: Recorded entity not found
[843326.136466] end_request: I/O error, dev sda, sector 292590408
[843326.136509] ata1: EH complete
[843326.136687] sd 0:0:0:0: [sda] 586072368 512-byte hardware sectors (300069 MB)
[843326.136856] sd 0:0:0:0: [sda] Write Protect is off
[843326.136860] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[843326.137203] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[843326.149337] Aborting journal on device sda.
[843326.149356] ext3_abort called.
[843326.149358] EXT3-fs error (device sda): ext3_journal_start_sb: Detected aborted journal
[843326.149360] Remounting filesystem read-only


And the most interesting thing of all?
No bad sectors according to SMART, which is clearly wrong.

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000f   199   199   051	Pre-fail  Always	   -	   5629
  3 Spin_Up_Time			0x0003   198   198   021	Pre-fail  Always	   -	   3083
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   22
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x000e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   099   099   000	Old_age   Always	   -	   916
 10 Spin_Retry_Count		0x0012   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0012   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   22
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   13
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   22
194 Temperature_Celsius	 0x0022   120   115   000	Old_age   Always	   -	   27
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0012   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0010   200   200   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0
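Comparing this table with the one posted at 899 hours shows what did change while the sector-related counters (5, 196, 197, 198) stayed at zero: Raw_Read_Error_Rate moved from a normalized 200 with raw 0 to 199 with raw 5629. A minimal Python sketch for diffing two saved attribute tables, assuming the smartctl column layout shown here; the helper names are hypothetical, and raw values are vendor-defined, so a change is a hint rather than a diagnosis.

```python
import re

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw (raw may contain spaces).
_ATTR = re.compile(
    r"\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+\S+\s+\S+\s+\S+\s+(.+?)\s*"
)
SECTOR_ATTRS = ("Reallocated_Sector_Ct", "Reallocated_Event_Count",
                "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text: str) -> dict:
    """Map attribute name -> its id, normalized value/worst/thresh and raw string."""
    attrs = {}
    for line in text.splitlines():
        m = _ATTR.fullmatch(line)
        if m:
            attr_id, name, value, worst, thresh, raw = m.groups()
            attrs[name] = {"id": int(attr_id), "value": int(value), "worst": int(worst),
                           "thresh": int(thresh), "raw": raw}
    return attrs


def diff_attributes(before: dict, after: dict) -> dict:
    """Map attribute name -> ((old value, old raw), (new value, new raw)) for changed rows."""
    changes = {}
    for name, new in after.items():
        old = before.get(name)
        if old and (old["value"], old["raw"]) != (new["value"], new["raw"]):
            changes[name] = ((old["value"], old["raw"]), (new["value"], new["raw"]))
    return changes


if __name__ == "__main__":
    before = parse_attributes("  1 Raw_Read_Error_Rate\t 0x000f   200   199   051\tPre-fail  Always\t   -\t   0")
    after = parse_attributes("  1 Raw_Read_Error_Rate\t 0x000f   199   199   051\tPre-fail  Always\t   -\t   5629")
    print(diff_attributes(before, after))
```

Saving `smartctl -A` output on a schedule and diffing consecutive snapshots this way would surface a climbing counter even while the overall health assessment still says PASSED.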


SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	   915		 -
# 2  Extended offline	Completed without error	   00%	   900		 -
# 3  Short offline	   Completed without error	   00%	   899		 -
# 4  Extended offline	Completed without error	   00%	   893		 -
# 5  Short offline	   Completed without error	   00%	   892		 -
# 6  Short offline	   Interrupted (host reset)	  90%	   889		 -
# 7  Extended offline	Completed without error	   00%	   871		 -
# 8  Short offline	   Completed without error	   00%	   870		 -
# 9  Extended offline	Interrupted (host reset)	  30%	   866		 -
#10  Extended offline	Completed without error	   00%	   858		 -
#11  Short offline	   Completed without error	   00%	   857		 -
#12  Short offline	   Completed without error	   00%	   842		 -
#13  Extended offline	Completed without error	   00%	   823		 -
#14  Short offline	   Completed without error	   00%	   822		 -
#15  Short offline	   Completed without error	   00%	   822		 -
#16  Short offline	   Completed without error	   00%	   818		 -
#17  Short offline	   Completed without error	   00%	   794		 -
#18  Short offline	   Completed without error	   00%	   771		 -
#19  Short offline	   Completed without error	   00%	   747		 -
#20  Short offline	   Completed without error	   00%	   723		 -
#21  Extended offline	Completed without error	   00%	   701		 -




Bad sectors:

[841267.274757] end_request: I/O error, dev sda, sector 112543520
[841278.219322] end_request: I/O error, dev sda, sector 539496464
[841290.327943] end_request: I/O error, dev sda, sector 115180888
[841317.473642] end_request: I/O error, dev sda, sector 116934480
[841343.329247] end_request: I/O error, dev sda, sector 541069328
[841383.309632] end_request: I/O error, dev sda, sector 121927680
[841394.428172] end_request: I/O error, dev sda, sector 119869488
[841417.895584] end_request: I/O error, dev sda, sector 123480240
[841451.875687] end_request: I/O error, dev sda, sector 123850312
[841520.063722] end_request: I/O error, dev sda, sector 130176280
[841543.122986] end_request: I/O error, dev sda, sector 132245520
[841603.324793] end_request: I/O error, dev sda, sector 136860808
[841674.873006] end_request: I/O error, dev sda, sector 140783224
[841685.421472] end_request: I/O error, dev sda, sector 138674176
[841707.431138] end_request: I/O error, dev sda, sector 547360784
[841734.654434] end_request: I/O error, dev sda, sector 143324440
[841796.002163] end_request: I/O error, dev sda, sector 146543032
[841807.870695] end_request: I/O error, dev sda, sector 150736968
[841843.411151] end_request: I/O error, dev sda, sector 151716656
[841872.122914] end_request: I/O error, dev sda, sector 153396096
[841924.013988] end_request: I/O error, dev sda, sector 155310608
[841984.347570] end_request: I/O error, dev sda, sector 159002216
[842039.238961] end_request: I/O error, dev sda, sector 164892744
[842122.793866] end_request: I/O error, dev sda, sector 167878344
[842142.793103] end_request: I/O error, dev sda, sector 168296448
[842153.791869] end_request: I/O error, dev sda, sector 136
[842200.960651] end_request: I/O error, dev sda, sector 171962104
[842245.417221] end_request: I/O error, dev sda, sector 174691496
[842259.974116] end_request: I/O error, dev sda, sector 174585760
[842274.639224] end_request: I/O error, dev sda, sector 177053392
[842286.759675] end_request: I/O error, dev sda, sector 177458672
[842301.022746] end_request: I/O error, dev sda, sector 176623352
[842359.364140] end_request: I/O error, dev sda, sector 181149656
[842448.535608] end_request: I/O error, dev sda, sector 190926336
[842521.499963] end_request: I/O error, dev sda, sector 195301448
[842538.816849] end_request: I/O error, dev sda, sector 196720728
[842554.579943] end_request: I/O error, dev sda, sector 198450448
[842590.576004] end_request: I/O error, dev sda, sector 198221184
[842606.621194] end_request: I/O error, dev sda, sector 198429680
[842640.517090] end_request: I/O error, dev sda, sector 204072304
[842662.076440] end_request: I/O error, dev sda, sector 136
[842673.549161] end_request: I/O error, dev sda, sector 207031696
[842694.826365] end_request: I/O error, dev sda, sector 565973008
[842731.578605] end_request: I/O error, dev sda, sector 211822168
[842756.834250] end_request: I/O error, dev sda, sector 213221352
[842872.917110] end_request: I/O error, dev sda, sector 222983952
[842964.248373] end_request: I/O error, dev sda, sector 231349720
[843007.001137] end_request: I/O error, dev sda, sector 235314848
[843018.653874] end_request: I/O error, dev sda, sector 232941816
[843289.816358] end_request: I/O error, dev sda, sector 258151856
[843326.136466] end_request: I/O error, dev sda, sector 292590408
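A list this long is easier to reason about once it is deduplicated and each sector is placed relative to the disk's size; sector 136, for example, appears twice. A minimal Python sketch that counts (device, sector) pairs from raw end_request lines and converts a sector to an offset. It assumes 512-byte sectors and the 586072368-sector capacity the kernel reports for these drives; the helper names are hypothetical.

```python
import re
from collections import Counter

# Assumption: 512-byte sectors and the 586072368-sector capacity from the kernel's sd lines.
TOTAL_SECTORS = 586_072_368
SECTOR_BYTES = 512
_END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(log_text: str) -> Counter:
    """Count how often each (device, sector) pair appears in end_request lines."""
    return Counter((m.group(1), int(m.group(2))) for m in _END_REQUEST.finditer(log_text))


def position(sector: int, total: int = TOTAL_SECTORS):
    """Return (offset in decimal GB, percent of the way through the disk)."""
    return sector * SECTOR_BYTES / 1e9, 100.0 * sector / total


if __name__ == "__main__":
    sample = """[842153.791869] end_request: I/O error, dev sda, sector 136
[842200.960651] end_request: I/O error, dev sda, sector 171962104
[842662.076440] end_request: I/O error, dev sda, sector 136"""
    for (dev, sector), n in sorted(failed_sectors(sample).items(), key=lambda kv: kv[0][1]):
        gb, pct = position(sector)
        print(f"{dev} sector {sector}: seen {n}x, {gb:.2f} GB in ({pct:.1f}%)")
```

Keying on (device, sector) rather than sector alone keeps errors from different disks, such as the sdc entries earlier in the thread, from being merged.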

#10 datestardi

  • Group: Patron
  • Posts: 313
  • Joined: 03-January 02

Posted 25 November 2008 - 07:16 AM

Jpiszcz, something doesn't seem right with the data you're posting, as if something other than the drives in your system (or the test methodology) is causing the errors.

Can you move the drives to a known working system and retest them?

If you test them with Data Lifeguard (you can use a DOS CD if you're not running Windows), what is the result?

http://support.wdc.c...s...612&lang=en

How are the errors manifesting themselves, other than in your diagnostic tests? (If you're not seeing real-world data errors, that would say a lot, I think.)

The customer reviews for the Velociraptor at Newegg are outstanding. I've actually never seen higher.

This post has been edited by datestardi: 25 November 2008 - 07:19 AM

