TRACKER_MAN

ICH10R RAID5 silent data corruption

9 posts in this topic

Hello folks, I want to share a bad experience with silent data corruption. It seems that when I copy or create files on the RAID5 array on the ICH10R, something corrupts files bigger than 300-400 MB. I ran some experiments with copying and md5sum checking, and the corruption is a fact! It occurs ONLY on the RAID5 volume; on the other volumes (RAID0 and RAID1) and on external disks everything is OK. I have no idea what the reason for the corruption is. Nothing in the PC is overclocked, I get no BSODs, and there are no errors in the logs (only checksum errors and, correspondingly, broken archives). NOTHING! I found this corruption completely by accident.

My PC is: Intel Core 2 Duo E8400 @ 3GHz | MB MSI P45 Platinum | 4x2GB A-Data 800MHz @ 5-5-5-15 @ 1.8V | VGA Asus EAH4670 512MB DDR3 | 2xHDD WD5000AAKS @ RAID0+RAID1 + 3xHDD SAMSUNG HD103UJ RAID5 ICH10R | PSU Spire 600W | DVD+/-RW NEC ND-3540A DL | OS: Windows XP 64 SP2

Has anyone else had this type of problem with ICH? Thanks in advance for your time!

I had similar issues running software RAID5 on FreeNAS. After that occurrence I've absolutely refused to use a file system that doesn't have the ability to deal with this, hence I have been running ZFS on my file server for several months now.

Hello again friends, does anyone know how I can run some tests for silent corruption on ICH chips?

The problem is, as you say, silent; the only way to detect it is to keep hashes of the files.
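A minimal sketch of what keeping hashes could look like, in Python. The function names (`file_sha256`, `build_manifest`, `save_manifest`, `verify_manifest`) and the JSON manifest format are assumptions made up for this illustration, not an existing tool. It records a SHA-256 for every file under a directory and later reports files that are missing or no longer match.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest, manifest_path):
    """Write the manifest as JSON (hypothetical format for this sketch)."""
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))


def verify_manifest(root, manifest_path):
    """Return the relative paths that are missing or whose hash changed."""
    root = Path(root)
    expected = json.loads(Path(manifest_path).read_text())
    bad = []
    for rel, digest in expected.items():
        target = root / rel
        if not target.is_file() or file_sha256(target) != digest:
            bad.append(rel)
    return bad
```

It probably makes sense to build the manifest from a copy you trust and to store it on a different disk than the array being checked. Note that a mismatch only tells you that what was read back differs; it does not say whether the disk, controller, driver or memory is at fault.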

Watch the ultimate ZFS tutorial if you would like more information on it. It's long but very interesting.

http://chihungchan.blogspot.com/2008/10/ul...l-in-video.html

I have done some tests on the HDDs, cables, controllers, etc., reinstalled Windows and changed drivers, and there is still corruption. I am starting to believe that maybe there is some problem with memory management/cache in Windows XP 64 itself...

Hello again folks. I have done more tests with memtest86+ and found no errors. Since I also get data corruption with Vista x64, the problem is probably not the OS or the driver, so I want to test the hard disks for silent corruption. What tool or program do you recommend for testing the Samsung HD103UJ and WD5000AAKS drives for silent corruption? Can I use Hutil 210 for the Samsung drives, specifically for silent data corruption? Thanks for the advice!

I had similar issues running software RAID5 on FreeNAS. After that occurrence I've absolutely refused to use a file system that doesn't have the ability to deal with this, hence I have been running ZFS on my file server for several months now.

This.

Some questions:

- how quickly can you reproduce the corruption?

- how long did you run memtest? I often like to run only tests 1, 2 or 5 hundreds of times, as they are relatively quick and go over all memory (unlike e.g. the random-generator-based tests, which are rather slow)

- in the past, I've found GoldMemory able to find errors far earlier than memtest ever could

- with IntelBurnTest (http://www.xtremesystems.org/forums/showthread.php?t=197835) I managed to find problems where neither GM, MT nor Prime95 were able to find any (in OC scenarios)

- a simple copy / compare of files in a loop between different disks, with enough data to make sure the cache wouldn't suffice (I'm not sure how to do the equivalent of Linux's echo 1/2/3 > drop_caches under Windows), produced interesting results, especially when run alongside IBT (again, in OC scenarios)

Warning: IBT stresses the CPU extremely hard. You will likely see temperatures you've never seen before.
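The copy / compare loop from the list above could be sketched roughly like this in Python. The names `sha256_of` and `copy_compare` and the `copy_NNN.bin` file naming are assumptions for this illustration. All copies are written first and only then read back, so that with a large enough source file and enough copies (total size well above installed RAM) the early copies have hopefully been pushed out of the OS cache before they are verified. The script does not flush any cache itself.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_compare(source, dest_dir, copies=10):
    """Copy source into dest_dir several times; return the copies that differ."""
    expected = sha256_of(source)
    os.makedirs(dest_dir, exist_ok=True)
    targets = [
        os.path.join(dest_dir, f"copy_{i:03d}.bin") for i in range(copies)
    ]
    # Pass 1: write every copy before reading any of them back.
    for target in targets:
        shutil.copyfile(source, target)
    # Pass 2: re-read each copy and compare its hash with the source hash.
    return [t for t in targets if sha256_of(t) != expected]
```

For the case in this thread, `source` would be a file of several hundred MB on a known-good disk and `dest_dir` a directory on the RAID5 volume; an empty result means every copy read back identical to the source. Running it again with the destination on a different volume helps separate an array problem from a memory or CPU problem.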

Edited by Michal Soltys

Hello again. I have resolved my problem by using another PC with OpenSolaris and ZFS.

I no longer use the ICH10R on my home PC; I am using another integrated controller (JMicron 36x).
