mikesm

Bad Block errors on Adaptec controller

Hi. I have a new Adaptec 31605 controller with 16 1 TB disks hooked to it. I am in the process of initializing two 8-disk RAID 6 arrays, and the controller is in the build and verify process now while Windows Server 2008 formats each logical volume.

I am seeing several warning messages in Adaptec Storage Manager telling me that it has discovered a bad block on the controller, but with no details on which array or which disk the block is tied to, just a hex block number like 1ab8200. Opening the properties menu on each drive shows no errors or SMART warnings, so I can't figure out which disk or disks are causing the problem. No user data has been placed on the arrays yet, so if a disk has issues, I would like to pull it now before I place the arrays into service.

Does anyone know how to get more info on where the faults are coming from? There is no mention of this in the Adaptec documentation, and it's a lot less data than I got from smartctl when I was running Linux software RAID.

thx

mike


unfortunately I'm not too familiar with the Adaptec McRAID...

Are you writing down the block addresses at least? The drives don't show verify/read/write errors (recovered without delay, recovered with delay, etc.)?

These messages are coming from the Adaptec Storage Manager console; they aren't system messages. I have them all listed in the manager's log files, but there is no detail, just a block number. I assume it would fail the drive if the error were not recoverable.

Is there really no way to get more detailed info about these errors?


There probably is? But as I said, I'm not too familiar with it... :-/

I know with 3wares our customers usually allow a few block errors when building under mass production, 'cause we run extensive burn-in on the array afterwards to help weed out possible failures, but for home/single-system-production, I'm not sure how concerned I would be about that? It's admittedly harder to test with a usable result...


This error message is absolutely unusable.

They must have been on crack or something while programming this part of the software.

The error says that one drive has a bad block.

If you want details, go to the ICP Storage Manager directory, e.g. C:\Program Files\ICP\ICP Storage Manager

and open RaidEvtA.log

It says something like:

15. Mai 2009 02:36:22 CEST INF srv1 Sense data: Medium error (READ RETRIES EXHAUSTED). Controller 1, channel 0, SCSI device ID 14, LUN 0, cdb [2f 00 08 b6 48 00 00 08 00 00 00 00], data [70 00 03 00 00 00 00 00 00 00 00 00 11 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]

15. Mai 2009 02:36:22 CEST INF srv1 Sense data: Medium error (READ RETRIES EXHAUSTED). Controller 1, channel 0, SCSI device ID 14, LUN 0, cdb [2f 00 08 b6 48 00 00 08 00 00 00 00], data [70 00 03 00 00 00 00 00 00 00 00 00 11 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]

15. Mai 2009 02:36:22 CEST WRN srv1 Medium error: controller 1, channel 0, SCSI device ID 14, LUN 0, start LBA 8b64800, end LBA 8b64fff, bad block recovery possible

15. Mai 2009 02:36:22 CEST WRN 418:A01C-S--L-- srv1 Bad Block discovered: controller 1 (8b64800).
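
If you want to pull the drive location out of the log without reading it by eye, something like the following Python sketch might help. It assumes the log lines look like the excerpt above; the regular expressions and the function names (parse_medium_error, decode_sense_line, count_errors_by_device) are made up for illustration and are not part of any Adaptec or ICP tool. The byte offsets follow the standard SCSI layout for a 10-byte CDB and fixed-format sense data, which fits this excerpt: opcode 2f is VERIFY(10), and the LBA bytes 08 b6 48 00 match the reported start LBA 8b64800.

```python
import re
from collections import Counter

# Patterns inferred from the sample lines above (assumption): other
# Storage Manager versions may word these messages differently.
MEDIUM_ERROR = re.compile(
    r"Medium error: controller (?P<controller>\d+), channel (?P<channel>\d+), "
    r"SCSI device ID (?P<device>\d+), LUN (?P<lun>\d+), "
    r"start LBA (?P<start>[0-9a-fA-F]+), end LBA (?P<end>[0-9a-fA-F]+)"
)
SENSE_LINE = re.compile(
    r"cdb \[(?P<cdb>[0-9a-fA-F ]+)\], data \[(?P<sense>[0-9a-fA-F ]+)\]"
)


def parse_medium_error(line):
    """Return drive location and LBA range from a 'Medium error:' line, or None."""
    m = MEDIUM_ERROR.search(line)
    if m is None:
        return None
    return {
        "controller": int(m.group("controller")),
        "channel": int(m.group("channel")),
        "device": int(m.group("device")),
        "lun": int(m.group("lun")),
        "start_lba": int(m.group("start"), 16),  # LBAs are logged in hex
        "end_lba": int(m.group("end"), 16),
    }


def decode_sense_line(line):
    """Decode the CDB and fixed-format sense bytes from a 'Sense data:' line."""
    m = SENSE_LINE.search(line)
    if m is None:
        return None
    cdb = bytes.fromhex(m.group("cdb"))
    sense = bytes.fromhex(m.group("sense"))
    if len(cdb) < 9 or len(sense) < 14:
        return None
    return {
        "opcode": cdb[0],                           # 0x2f = VERIFY(10)
        "lba": int.from_bytes(cdb[2:6], "big"),     # CDB bytes 2-5
        "blocks": int.from_bytes(cdb[7:9], "big"),  # CDB bytes 7-8
        "sense_key": sense[2] & 0x0F,               # 3 = medium error
        "asc": sense[12],                           # additional sense code
        "ascq": sense[13],                          # additional sense code qualifier
    }


def count_errors_by_device(lines):
    """Tally 'Medium error:' lines per (controller, channel, device ID)."""
    counts = Counter()
    for line in lines:
        info = parse_medium_error(line)
        if info is not None:
            counts[(info["controller"], info["channel"], info["device"])] += 1
    return counts
```

Feeding every line of RaidEvtA.log through count_errors_by_device gives a per-drive tally, which makes it easier to see whether the bad blocks cluster on one disk (device ID 14 in the excerpt) or are spread across several.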

Thank you! You saved my evening ^^ I was trying to figure out where in /var/log I could find this info on Linux.

With your hint, I did a "locate RaidEvtA" and found out I needed to look in /usr/StorMan/RaidEvtA.log :)

(on a Debian Etch install, apparently done via "alien -cv asm_linux_x86_v6_10_18359.rpm ; dpkg -i storman_6.10-18360_i386.deb ; aptitude install libstdc++5", in case someone needs storman on linux debian)

Thanks a lot :)

Actually, it's been a looong time since I last checked this site (being busy and all), but I know it is one of the best places on the net to get info about storage (Eugene, thank you for this!)

olivier / EdhelDil.


Hi there!

RAID controllers do not generate errors for fun. They report what a drive reported.

Bad blocks are always possible, even in flash-based devices.

The controller can 'repair' the bad block by asking the drive to reassign this block to one of a pool of spare blocks.

The drive's SMART threshold-exceeded error shows up only once the pool of spare blocks is mostly used up, not before.

You should fill the logical drives you created with data and read it back several times to make sure all the bad blocks are reassigned before putting user data on them... And I would then monitor subsequent errors closely...
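
A rough Python sketch of that write-and-read-back idea is below. It is only an illustration: the function names (write_pattern, read_digest, verify_passes) and the file-based approach are assumptions, not something from the Adaptec tools. It writes a test file of deterministic data to the volume, then re-reads it several times and compares checksums. Keep in mind that the OS cache can satisfy reads without touching the disks, so for a real burn-in the file should be much larger than RAM (ideally close to the size of the volume).

```python
import hashlib
import os


def write_pattern(path, total_bytes, chunk_size=1024 * 1024, seed=b"burn-in"):
    """Write deterministic pseudo-random data to path; return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    written = 0
    counter = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            # Derive a 32-byte block per chunk so chunks differ from each other.
            block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
            chunk = (block * (n // len(block) + 1))[:n]
            f.write(chunk)
            digest.update(chunk)
            written += n
            counter += 1
        f.flush()
        os.fsync(f.fileno())  # push the data out of the OS write cache
    return digest.hexdigest()


def read_digest(path, chunk_size=1024 * 1024):
    """Read the whole file back and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_passes(path, total_bytes, passes=3):
    """Write once, then read back `passes` times; True if every read matches."""
    expected = write_pattern(path, total_bytes)
    return all(read_digest(path) == expected for _ in range(passes))
```

A mismatch or an I/O error during read-back is the cue to check the controller log for new bad block entries on the same drive.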

Just my 2 cents...

Good luck!

MEJV


The Adaptec UI tools need some serious work. I had a similar thing: it was sending alerts about bad blocks without saying which drive they were on! How dumb is that?

The menu layouts are ridiculous too.
