Velociraptor premature failure rate (bad drives, premature to market?)
I have RMA'd several times so far across 12 disks.

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 26 November 2008 - 06:58 AM
No, it was the first time I used it.
Justin.
-
Group:
Patron
-
Posts:
313
-
Joined:
03-January 02
Posted 26 November 2008 - 07:04 AM
jpiszcz, on Nov 26 2008, 06:58 AM, said:
No, it was the first time I used it.
Good. I'd suggest not using it on your other drives, and see if they too seem to be stuck in an "internal offline test."
-
Group:
Patron
-
Posts:
313
-
Joined:
03-January 02
Posted 26 November 2008 - 07:18 AM
P.S. The more things you do to your drives (turning TLER on/off, disabling SMART testing, disabling all relevant offline tests on the disk, disabling hddtemp testing on the disk), the more I think your situation can be affected by "user error".
You should be able to stick with the manufacturer's defaults and test the drives. Obviously, if the drive as manufactured thinks it needs to perform an "offline test", then it probably *needs* to perform the offline test, and you shouldn't be disabling it... same with SMART. WD knows more about the drive than you do, so their defaults are likely much better than your choices... I'm not talking about TLER here. (And I'm only trying to be helpful.)

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 26 November 2008 - 07:36 AM
datestardi, on Nov 26 2008, 08:18 AM, said:
P.S. The more things you do to your drives (turning TLER on/off, disabling SMART testing, disabling all relevant offline tests on the disk, disabling hddtemp testing on the disk), the more I think your situation can be affected by "user error".
You should be able to stick with the manufacturer's defaults and test the drives. Obviously, if the drive as manufactured thinks it needs to perform an "offline test", then it probably *needs* to perform the offline test, and you shouldn't be disabling it... same with SMART. WD knows more about the drive than you do, so their defaults are likely much better than your choices... I'm not talking about TLER here. (And I'm only trying to be helpful.)
Agreed in this case, but I will mention that for the past 5-10 years I've always monitored disk temperatures, graphed them, and run daily short SMART tests and weekly long SMART tests, and I never had any problems with any other type of disk. But as I mentioned earlier, to 100% rule this out I will not do anything special other than use the drives. In addition, I'll be placing them all on a 9650SE controller, which should help rule out any further issues. The compatibility list on the 3ware site lists all variants of the Velociraptor as compatible with the board. Besides that, so far there are no issues with the new disk in RAID1 and no issues with the other disks in RAID6. That is typical, though: there are usually "problems" every 1-2 weeks, so I will have to wait a bit.
Justin.

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 05 December 2008 - 04:51 AM
I swapped out my power supply, changed ALL cables, and bought a $1000 RAID controller with BBU. The drives are still having problems: writing to them in a RAID10 configuration locks up the card:
Error 1 occurred at disk power-on lifetime: 3708 hours (154 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 00 00 8d b8 40
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 18 00 6c 91 19 08 00:57:50.230 WRITE FPDMA QUEUED
61 80 18 00 cd b7 19 08 00:57:50.148 WRITE FPDMA QUEUED
61 80 e8 80 cc b7 19 08 00:57:50.147 WRITE FPDMA QUEUED
61 80 18 00 cc b7 19 08 00:57:50.147 WRITE FPDMA QUEUED
61 80 e8 80 cb b7 19 08 00:57:50.146 WRITE FPDMA QUEUED
Latest 3ware BIOS/Firmware etc
DcbMgr::WriteSegment(map=0x4B7E38, segID=0x32, events=20, error=0x0)
DcbMgr::WriteSegment(map=0x4B7E38, segID=0x32, events=20, error=0x0)
DcbMgr::WriteSegment(map=0x4B7E38, segID=0x32, events=20, error=0x0)
DcbMgr::WriteSegment(map=0x4B7E38, segID=0x32, events=20, error=0x0)
E=1019 T=19:57:26 : Drive removed
task file written out : cd dh ch cl sn sc ft
: 61 59 B8 8E 00 80 80
E=1019 T=19:57:26 P=Bh: Hard reset drive
P=Bh: HardResetDriveWait
task file read back : st dh ch cl sn sc er
: 50 00 00 00 01 01 01
E=1019 T=19:57:26 P=B : Soft reset drive
E=0207 T=19:57:26 P=B : ResetDriveWait
E=1019 T=19:57:26 P=B : Inserting Set UDMA command
E=1019 T=19:57:26 P=B : Check power mode, active
E=1019 T=19:57:26 P=B : Check drive swap, same drive
E=1019 T=19:57:26 P=B : Check power cycles, initial=57, current=57
E=1019 T=19:57:26 P=Bh: exitCode = 0
Retrying chain
DcbMgr::WriteSegment(map=0x4B7E38, segID=0x32, events=20, error=0x0)
DcbMgr::WriteSegment(map=0x4B7E38, segID=0x32, events=20, error=0x0)
Hm, the last thing I will try, I suppose, is disabling NCQ to see if the problem recurs.
Justin.
This post has been edited by jpiszcz: 05 December 2008 - 04:52 AM
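For anyone reading the SMART error entry above, here is a minimal Python sketch that decodes the "registers were" row (ER ST SC SN CL CH DH) into flag names. The bit names are the standard ATA Error/Status register definitions; the function names are made up for illustration, and the LBA is assembled as a 28-bit value, which is an assumption: for a queued (NCQ) write the real 48-bit LBA may not fit in this summary row.

```python
# Standard ATA bit names; only the commonly reported bits are listed.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF",
               0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in names.items() if value & mask]


def parse_register_row(row):
    """Parse a row such as '10 51 00 00 8d b8 40' (ER ST SC SN CL CH DH)."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in row.split())
    # Assumption: 28-bit addressing, low nibble of DH holds LBA bits 24-27.
    lba28 = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "error_flags": decode_bits(er, ERROR_BITS),
        "status_flags": decode_bits(st, STATUS_BITS),
        "sector_count": sc,
        "lba28": lba28,
    }


if __name__ == "__main__":
    info = parse_register_row("10 51 00 00 8d b8 40")
    print(info["error_flags"], info["status_flags"], hex(info["lba28"]))
```

For the row in the post, the Error register 0x10 decodes to IDNF (ID not found) and the Status register 0x51 to DRDY, DSC and ERR.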

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 05 December 2008 - 06:01 AM
I don't want to get too excited yet but after disabling NCQ I was able to write to the RAID10 - over the entire array without it crashing!
I will let it run a few more times before making any further comments though.
writing to raid10
dd: writing `file2': No space left on device
1430328+0 records in
1430327+0 records out
1499806973952 bytes (1.5 TB) copied, 3914.51 s, 383 MB/s
NCQ on these drives in RAID is broken on the 3ware card, just as it is when you do the same thing under Linux.
NCQ+Velociraptor => Bad in raid configuration, in non-raid it may be OK (have not tested).
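For the plain Linux (libata) case, NCQ can effectively be turned off per disk by setting the queue depth to 1 in sysfs. A rough Python sketch of that follows; the device name `sdX` and the helper names are placeholders, it needs root, and it does not apply to disks behind the 3ware card, where queuing is controlled by the controller itself.

```python
from pathlib import Path


def queue_depth_path(device, sysfs_root="/sys/block"):
    """Path of the libata queue_depth attribute for a disk such as 'sda'."""
    return Path(sysfs_root) / device / "device" / "queue_depth"


def read_queue_depth(device, sysfs_root="/sys/block"):
    """Return the current queue depth as an integer."""
    return int(queue_depth_path(device, sysfs_root).read_text().strip())


def set_queue_depth(device, depth, sysfs_root="/sys/block"):
    """Write a new queue depth; a depth of 1 effectively disables NCQ."""
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    queue_depth_path(device, sysfs_root).write_text(f"{depth}\n")


# Hypothetical usage (as root), where 'sdX' stands for the real device name:
#   set_queue_depth("sdX", 1)
#   print(read_queue_depth("sdX"))
```

The `sysfs_root` parameter is only there so the helpers can be tried against a scratch directory before touching a real disk.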

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 05 December 2008 - 10:17 AM
I spoke too soon. Turning off NCQ helped dramatically: it worked three times!
writing to raid10
dd: writing `file2': No space left on device
1430328+0 records in
1430327+0 records out
1499806973952 bytes (1.5 TB) copied, 3914.51 s, 383 MB/s
Fri Dec 5 06:00:25 EST 2008
writing to raid10
dd: writing `file2': No space left on device
1430328+0 records in
1430327+0 records out
1499806973952 bytes (1.5 TB) copied, 4063.25 s, 369 MB/s
Fri Dec 5 07:08:11 EST 2008
writing to raid10
dd: writing `file2': No space left on device
1430328+0 records in
1430327+0 records out
1499806973952 bytes (1.5 TB) copied, 3926.71 s, 382 MB/s
Fri Dec 5 08:13:41 EST 2008
Then it crashed again. With NCQ enabled, it would not even complete one test.
So basically: a new system, new PSU, new cables, it's on a new APC UPS, and the
problem persists whether the disks are on a RAID card or in SW RAID. It does not
matter; Velociraptors have problems. I think it's time for me to get regular
1TiB disks and be done with it.
Justin.
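As a sanity check on the three dd runs above, the reported rates can be recomputed from the byte count and elapsed time. A small Python sketch (the function name is made up; dd's "MB" here is taken to mean 10^6 bytes, which is how GNU dd reports it):

```python
def throughput_mb_s(bytes_copied, seconds):
    """Average throughput in decimal megabytes (10**6 bytes) per second."""
    return bytes_copied / seconds / 1e6


BYTES_COPIED = 1499806973952  # same for all three runs in the post
ELAPSED_SECONDS = [3914.51, 4063.25, 3926.71]

if __name__ == "__main__":
    for seconds in ELAPSED_SECONDS:
        rate = throughput_mb_s(BYTES_COPIED, seconds)
        print(f"{seconds:8.2f} s -> {rate:.1f} MB/s")
```

Rounded to whole numbers this gives 383, 369 and 382 MB/s, matching what dd printed, so the three completed runs were consistent with each other before the crash.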

- Member
-
Group:
Member
-
Posts:
492
-
Joined:
23-April 03
Posted 05 December 2008 - 07:45 PM
I don't understand your persistence in running the same OS and hardware on these drives. How hard is it to boot the manufacturer's diagnostic utility or MHDD in DOS and do a proper check in another system (or the same system if you don't have one)?
And, oh, yes, I just realized that pibibit kibibit tibibit crap is coming with X and GNOME. I haven't used any GUI on Linux for so long, I have forgotten they come with it by default. Hence the widespread use of that crap on the net.
And yeah, the file size column also shows, for files: 5 bytes, 59.2 KB, 1.7 MB. Go figure which one is bigger with a cursory look. I can't believe 15 years have passed since we were trying to make a GUI boot in Redhat instead of ncurses! Looks like nothing has changed. It took me 3 days to make Add/Remove Software work properly to figure out what is installed on my system. Apparently an internet connection is needed, and it is a royal PITA to make a local repository work. Something as simple as typing: rpm -qa|sort|xargs rpm -qil... took me countless hours of reading to do through the GUI. Yes, I am a text guy; GUI is not for me. GNOME looks like the Windows 1.1 days anyway (KDE looks cool though). Never mind my ranting, I just got frustrated with this Fedora 10 crap.

- Member
-
Group:
Member
-
Posts:
472
-
Joined:
15-January 06
Posted 05 December 2008 - 07:50 PM
The drives may work fine in Windows, but that is not the OS I need them to work in. It is widely known that the raptors+NCQ are broken in Linux, and that is the OS I need to use the disks in.
In any event, I am back on my good old raptor150s for now, and the 300s have been put into another system. I will be performing the same testing I did earlier on both systems. I want to see if I can reproduce any of the problems that I had on the 300s with the 150s.

- Member
-
Group:
Member
-
Posts:
492
-
Joined:
23-April 03
Posted 05 December 2008 - 08:15 PM
jpiszcz, on Dec 5 2008, 07:50 PM, said:
The drives may work fine in Windows, but that is not the OS I need them to work in. It is widely known that the raptors+NCQ are broken in Linux, and that is the OS I need to use the disks in.
I am sorry, you complained about failing drives. Topic title reads: "Velociraptor premature failure rate (bad drives, premature to market?), I have RMA'd several times so far across 12 disks."
So it is not the drives' fault and you replaced disks for nothing. It is not going to make any difference if you try 12 more disks or change 24 more controllers. Better try something else; they are not irreplaceable. And nobody is asking you to change your OS. We were trying to figure out whether it is the drives' fault or something else's (proper troubleshooting).
This post has been edited by 6_6_6: 05 December 2008 - 08:16 PM