


What's wrong with this server's HDDs?


3 replies to this topic

#1 AndyB78

  • Member
  • 3 posts

Posted 06 December 2012 - 07:14 PM

Hi,

A few days ago I had to restart one of our shared hosting servers. It failed to come back up, so the DC techs had to intervene (they probably ran a manual fsck, but I am not sure exactly what they ran). Ever since that reboot, the IO performance of the server has been dramatically worse.

The HDDs are set up in RAID 1 behind an LSI MegaRAID 8704ELP, version 1.20.
The HDDs themselves are ST3500320NS.
The RAID array does not appear to be degraded (status optimal, no media errors).
The CPU is an E5520 (quad core with HT, 8MB cache).

The problem is that IO wait is at least 5 times higher than it should be, and iostat shows pretty weird data. For comparison, here are the figures from 2 identical servers:

          rrqm/s  wrqm/s    r/s    w/s   rsec/s   wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
OK one:    62.90   67.33  76.98  53.49  2469.66   985.81     26.48      0.10   3.83   0.22   2.84
Bad one:   12.67   78.06  74.31  49.04  2318.72  1017.19     27.04      2.19  17.73   4.38  54.03

While r/s and w/s are about the same (which I believe means the two servers carry a similar load) and avgrq-sz is virtually the same, rrqm/s is much lower on the bad system, avgqu-sz is much higher (it gets to about 75 times higher under heavier load), await is also much higher (about 50 times higher under load), and so is the service time (svctm).
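To put numbers on that comparison, here is a minimal Python sketch that computes the bad/OK ratio for each column of the iostat sample above. It also cross-checks %util against the request rate and svctm, on the assumption that iostat derives svctm from utilization and IOPS (so %util should be roughly (r/s + w/s) * svctm / 10 with svctm in milliseconds). The helper names `ratios` and `implied_util` are just illustrative.

```python
# Values copied from the iostat sample in the post above.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]

ok = dict(zip(COLUMNS, [62.90, 67.33, 76.98, 53.49, 2469.66, 985.81,
                        26.48, 0.10, 3.83, 0.22, 2.84]))
bad = dict(zip(COLUMNS, [12.67, 78.06, 74.31, 49.04, 2318.72, 1017.19,
                         27.04, 2.19, 17.73, 4.38, 54.03]))


def ratios(bad_row, ok_row):
    """Return bad/OK for every column, to show which metrics diverge."""
    return {name: bad_row[name] / ok_row[name] for name in COLUMNS}


def implied_util(row):
    """%util implied by the request rate and svctm (ms): iops * svctm / 10.

    Assumption: svctm is derived from utilization and IOPS, so this
    should land close to the reported %util.
    """
    return (row["r/s"] + row["w/s"]) * row["svctm"] / 10.0


if __name__ == "__main__":
    for name, value in ratios(bad, ok).items():
        print(f"{name:>9}: bad/OK = {value:6.2f}")
    for label, row in (("OK", ok), ("Bad", bad)):
        print(f"{label}: reported %util {row['%util']:.2f}, "
              f"implied {implied_util(row):.2f}")
```

In this sample the request rates differ by only a few percent, while svctm and %util are both roughly 20 times higher on the bad server. That suggests each request takes longer to service there, rather than the server being asked to do more work.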

Also, while kjournald is barely noticeable on the OK server, on the bad server two of the four kjournald processes sit at the top of the list, even after setting the ionice class to Idle for those two processes.

So what makes rrqm/s go down and avgqu-sz, await and svctm go up on the bad system? Is it an HDD, is it the card itself, is it some rogue mount option? What is slowing down the second server?

Thanks in advance for any suggestion!

Edited by AndyB78, 06 December 2012 - 07:21 PM.

#2 thisperson100

  • Member
  • 13 posts

Posted 06 December 2012 - 08:46 PM

Is this a Linux system?
ext4?
Can you paste a gparted print with unit s?

#3 AndyB78

  • Member
  • 3 posts

Posted 07 December 2012 - 06:49 AM


Hi,

Thanks for answering. Yes, this is a Linux system (CloudLinux v5, in fact). It's ext3, not ext4. Since I only have SSH access to the machine, I don't think I can use gparted, but here is the output of parted:

Model: LSI MegaRAID 8704ELP (scsi)
Disk /dev/sda: 499GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size    Type      File system  Flags
 1      32.3kB  107MB  107MB   primary   ext3         boot
 2      107MB   367GB  367GB   primary   ext3
 3      367GB   392GB  25.2GB  primary   linux-swap
 4      392GB   499GB  107GB   extended
 5      392GB   394GB  2196MB  logical   ext3
 6      394GB   499GB  105GB   logical   ext3

Please let me know if I can better describe anything. Thanks!

#4 thisperson100

  • Member
  • 13 posts

Posted 08 December 2012 - 03:44 PM

"unit s" means sectors; the idea is to check partition alignment.
Alignment can be a factor if any of the new drives use 4K sectors.
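As a rough illustration of that alignment check, here is a minimal Python sketch. It assumes you have the partition start in sectors (as parted reports with unit s) plus the logical and physical sector sizes. The function name `is_aligned` and the sample start sectors 63 and 2048 are hypothetical illustrations, not values read from this server.

```python
def is_aligned(start_sector, logical_size=512, physical_size=4096):
    """Return True if the partition's starting byte offset falls on a
    physical sector boundary."""
    return (start_sector * logical_size) % physical_size == 0


if __name__ == "__main__":
    # Hypothetical start sectors: 63 is the classic legacy msdos offset,
    # 2048 is the 1 MiB offset used by newer partitioning tools.
    for start in (63, 2048):
        print(f"start sector {start}: "
              f"4K-aligned = {is_aligned(start)}")
```

Note that the parted output above reports a 512B/512B logical/physical sector size. If that is accurate for the underlying disks, every start sector is trivially aligned (`is_aligned(63, 512, 512)` is true) and misalignment would not explain the slowdown.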

Also, with ext4 filesystems there is a "lazy init" which can run in the background for a few hours the first time you mount them and make things slower until it finishes.


