AndyB78

What's wrong with this server's HDDs?

4 posts in this topic

Hi,

A few days ago I had to restart one of our shared hosting servers. It failed to come back up, so the DC techs had to intervene (they probably ran a manual fsck, but I am not sure what exactly they ran). Right after that reboot, the IO performance on the server dropped dramatically.

The HDDs are set up in RAID 1 behind an LSI MegaRAID 8704ELP, version 1.20.

The HDDs themselves are ST3500320NS.

The RAID array does not appear to be degraded (status optimal, no media errors).

CPU is E5520 (quad core w. HT, 8MB cache)

The problem is that IO wait is 5 times higher than it should be (probably more) and iostat shows pretty weird data. For comparison, here are two identical servers:

        rrqm/s   wrqm/s  r/s  w/s   rsec/s    wsec/s   avgrq-sz avgqu-sz await  svctm  %util
OK one:  62.90    67.33 76.98 53.49  2469.66   985.81    26.48     0.10    3.83   0.22   2.84
Bad one: 12.67    78.06 74.31 49.04  2318.72  1017.19    27.04     2.19   17.73   4.38  54.03

While r/s and w/s are about the same (I believe this means the two servers see a similar load) and avgrq-sz is virtually the same, rrqm/s is much lower on the bad system, avgqu-sz is much higher (up to about 75 times under heavier load), await is also much higher (up to ~50 times under load), and so is the service time (svctm).
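To make the comparison concrete, here is a minimal sketch that computes the bad/OK ratio for each iostat column. The numbers are copied from the two rows above; the names `columns`, `ok`, `bad` and `ratios` are just illustrative.

```python
# Values copied from the two iostat rows quoted above.
columns = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]
ok = [62.90, 67.33, 76.98, 53.49, 2469.66, 985.81, 26.48, 0.10, 3.83, 0.22, 2.84]
bad = [12.67, 78.06, 74.31, 49.04, 2318.72, 1017.19, 27.04, 2.19, 17.73, 4.38, 54.03]


def ratios(ok_row, bad_row):
    """Return {column: bad / ok} for each iostat column."""
    return {name: b / o for name, o, b in zip(columns, ok_row, bad_row)}


r = ratios(ok, bad)
for name in columns:
    # A ratio near 1.0 means the two servers behave alike for that metric.
    print(f"{name:>9}: {r[name]:6.2f}x")
```

In this particular sample the request rates (r/s, w/s) and request size (avgrq-sz) stay close to 1x, while avgqu-sz, svctm and %util come out roughly 20 times higher and await about 4.6 times higher on the bad server.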

Also, while kjournald is barely noticeable on the OK server, on the bad server two kjournald processes (out of 4) sit at the top of the process list, even after setting the ionice class to Idle for those two.

So what makes rrqm/s go down and avgqu-sz, await and svctm go up on the bad system? Is it an HDD, is it the card itself, is it some rogue mount option? What is busting the second server?

Thanks in advance for any suggestion!

Edited by AndyB78


Is this a Linux system?

ext4?

Can you paste a gparted print with unit s?

Hi,

Thanks for answering. Yes, this is a Linux system (CloudLinux v5, in fact). It's ext3, not ext4. Since I only have SSH access to the machine I don't think I can use gparted, but here is the output of parted:

Model: LSI MegaRAID 8704ELP (scsi)
Disk /dev/sda: 499GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size    Type      File system  Flags
 1      32.3kB  107MB  107MB   primary   ext3         boot
 2      107MB   367GB  367GB   primary   ext3
 3      367GB   392GB  25.2GB  primary   linux-swap
 4      392GB   499GB  107GB   extended
 5      392GB   394GB  2196MB  logical   ext3
 6      394GB   499GB  105GB   logical   ext3

Please let me know if I can better describe anything. Thanks!


"unit s" means sectors; the idea is to check partition alignment.

This can be a factor if any of the new drives are 4K.

Also, with ext filesystems there is a "lazy init" that can run in the background for a few hours the first time you mount them, making things slower until it finishes.
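For the alignment check, the start sectors would come from something like `parted /dev/sda unit s print`. Below is a rough sketch of the test itself. The 4096-byte physical sector size is an assumption (the 4K-drive case mentioned above; parted reports 512B/512B for this array), and the two start sectors are hypothetical examples, not values read from this server.

```python
SECTOR_BYTES = 512      # logical sector size, as reported by parted in this thread
PHYSICAL_BYTES = 4096   # assumption: a 4K-sector drive, the case being checked for


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES, physical_bytes=PHYSICAL_BYTES):
    """True if the partition's byte offset is a multiple of the physical sector size."""
    return (start_sector * sector_bytes) % physical_bytes == 0


# Hypothetical start sectors for illustration only.
starts = {
    "legacy msdos start (sector 63)": 63,
    "1 MiB start (sector 2048)": 2048,
}
for label, sector in starts.items():
    print(label, "->", "aligned" if is_aligned(sector) else "NOT aligned")
```

Sector 63 corresponds to 63 * 512 = 32256 bytes, which is consistent with the 32.3kB start shown for partition 1, but that figure is rounded, so the sector output is needed to confirm it. With 512-byte physical sectors every start sector is aligned, so this only matters if the drives really are 4K.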
