Poor random read/write performance with MegaRAID controllers and SSDs



#1 lunadesign

lunadesign

    Member

  • Member
  • 148 posts

Posted 03 August 2013 - 03:05 PM

I'm seeing really slow random read/write performance on SSDs connected to my LSI 9260-8i controllers. It almost seems like the card is a bottleneck.

I tested a *single* Plextor M5 Pro SSD:
1) Connected to the onboard SATA III controller on my Supermicro X9SRE-F
2) Connected to the LSI 9260-8i (using a PCIe 3.0 slot on the same mobo) in a RAID 0 with no read-ahead or write caching

(The idea was to make the LSI card act like a non-RAID controller so it's roughly comparable to onboard. Also, before each test run, I performed a secure erase on the SSD.)

I ran a bunch of Iometer tests at queue depths ranging from 1-32 and found:
* 4K Random Read = MegaRAID IOPS and MB/s are 43-73% lower than onboard
* 4K Random Write = MegaRAID IOPS and MB/s are 41-63% lower than onboard

(I also did some 2MB sequential read/write testing and found the LSI sees roughly the same performance as onboard, except at QD1, where it is 11-13% slower.)
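
Since both IOPS and MB/s are quoted above, it's worth noting that they are the same measurement at a fixed 4 KiB transfer size, so the percentage gaps should match for both. A trivial Python conversion (the example IOPS number is just illustrative):

def iops_to_mb_per_s(iops, transfer_bytes=4096):
    # Throughput implied by an IOPS figure at a fixed transfer size (1 MB = 10^6 bytes).
    return iops * transfer_bytes / 1e6

print(iops_to_mb_per_s(20000))   # 20k 4K IOPS ~= 81.9 MB/s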

I've tried two LSI cards, multiple LSI driver and firmware versions, and a different computer, and I see roughly the same results each time.

I Googled around and found some reviews involving SSDs and MegaRAID controllers (even the newer faster ones) and they all seem to show random read/write performance that is slower than what I would expect to see with a single SSD.

I'm not looking for extreme performance. My intent is to use RAID 1 / RAID 10 arrays primarily for redundancy, and I was hoping to see performance comparable to (or better than) a single SSD.

Is this slow random read/write performance normal? Or am I doing something wrong?

#2 lunadesign

lunadesign

    Member

  • Member
  • 148 posts

Posted 06 August 2013 - 01:16 PM

Does anybody have any thoughts on this?

Better yet....does anyone have an LSI MegaRAID card and a single SSD (any recent SSD) they would be willing to run some tests on?

I've attached the Iometer 1.1.0 RC1 input file so it's really easy to test. (Note: I had to change the file extension from .icf to .txt to attach it here.)

Attached Files


#3 Kevin OBrien

Kevin OBrien

    StorageReview Editor

  • Admin
  • 1,439 posts

Posted 06 August 2013 - 01:33 PM

I wouldn't say that's too uncommon or something I'd be surprised to see. The overhead of RAID means a single SSD in JBOD will always be faster than the same SSD behind a RAID layer; it's one of the main reasons we use standard HBAs for all of our performance testing. For a better comparison, I'd keep read-ahead enabled but switch the cache mode to write-through and the I/O policy to direct I/O. Disabling read-ahead most likely hurt performance further, since even the Intel onboard SATA has those improvements.

Another thing to consider is that this RAID card is best set up for SSDs when using FastPath. It's a paid feature, and I'm not sure how much it would help for small SSD groups, but that might be coming into play in this comparison.

#4 lunadesign

lunadesign

    Member

  • Member
  • 148 posts

Posted 06 August 2013 - 07:02 PM

I wouldn't say that's too uncommon or something I'd be surprised to see. The overhead of RAID means a single SSD in JBOD will always be faster than the same SSD behind a RAID layer; it's one of the main reasons we use standard HBAs for all of our performance testing. For a better comparison, I'd keep read-ahead enabled but switch the cache mode to write-through and the I/O policy to direct I/O. Disabling read-ahead most likely hurt performance further, since even the Intel onboard SATA has those improvements.

Another thing to consider is that this RAID card is best set up for SSDs when using FastPath. It's a paid feature, and I'm not sure how much it would help for small SSD groups, but that might be coming into play in this comparison.

Thanks for your response, Kevin!

This particular card unfortunately doesn't have a JBOD mode. However, my understanding was that a single-drive RAID 0 would be very similar, because there are no other drives to distribute the data to, so there shouldn't be any calculations or other overhead.

I wasn't aware that the onboard Intel SATA used read-ahead... I'll give that a try to see if it improves my read numbers. However, that wouldn't explain the poor write numbers.

FastPath came to mind, but my understanding is that its benefit is limited for small numbers of drives.

#5 lunadesign

lunadesign

    Member

  • Member
  • 148 posts

Posted 07 August 2013 - 02:13 AM

Kevin,

It just occurred to me....when you said Intel SATA used read-ahead, is this enabled in AHCI mode? That's what I've been using for my baseline comparison, not Intel onboard RAID.

However, just for yucks, I tried read-ahead on the LSI and my random reads (and writes - not sure why) all came in a fair amount *slower* than before. In fact, sequential read at QD1 was about the only number that went up (slightly) with read-ahead enabled.

#6 tugaricardo

tugaricardo

    Member

  • Member
  • 5 posts

Posted 30 August 2013 - 05:14 PM

Kevin,

It just occurred to me....when you said Intel SATA used read-ahead, is this enabled in AHCI mode? That's what I've been using for my baseline comparison, not Intel onboard RAID.

However, just for yucks, I tried read-ahead on the LSI and my random reads (and writes - not sure why) all came in a fair amount *slower* than before. In fact, sequential read at QD1 was about the only number that went up (slightly) with read-ahead enabled.


Hi lunadesign,
Don't put too much trust in most benchmarks. Some of them, like PCMark or even winsat disk, can give you an idea of performance, but not of the whole system (RAID controller + SSD). Most benchmarks can't really measure the controller-plus-SSD combination; they are designed to issue direct I/O to the SSD alone, so you have to look carefully at each specific case before judging it. A hardware RAID controller will always outperform any onboard solution, including a cacheless HBA. The cache is the star of the show, especially with read-ahead enabled, because it effectively services random 4K QD1 requests as if they were at higher queue depths, no matter which benchmark you use (as long as the test file is no larger than the cache). CrystalDiskMark shows this clearly when the file size is smaller than your cache: you will see 4K QD1 above 100 MB/s even though the SSD itself can't do better than 25 MB/s. This is true for 3ware, Adaptec, and LSI cards; the exception is Areca, which buffers every random access (very appealing for enthusiasts). So do a real-world benchmark. For real random reads, run a Windows Defender scan of drive C:, note the total size of C:, and see how long the scan takes. You will see huge numbers. My Adaptec 6405 with a single Samsung 830 256GB and a 178GB C: partition can scan it in 2m:37s, which works out to about 1133 MB/s (there are some sequential files on C:), so read-ahead is a big factor here. In short, don't take the synthetic benchmark numbers literally.
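
As a quick check of that scan-throughput arithmetic (sizes exactly as reported above), a few lines of Python:

# Defender scan throughput implied by the figures quoted above.
size_gb = 178                  # reported C: usage
elapsed_s = 2 * 60 + 37        # 2m:37s scan time
print("%.0f MB/s" % (size_gb * 1000.0 / elapsed_s))   # ~1134 MB/s, in line with the ~1133 MB/s quoted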

#7 lunadesign

lunadesign

    Member

  • Member
  • 148 posts

Posted 08 September 2013 - 09:26 PM

Hi lunadesign,
Don't put too much trust in most benchmarks. Some of them, like PCMark or even winsat disk, can give you an idea of performance, but not of the whole system (RAID controller + SSD). Most benchmarks can't really measure the controller-plus-SSD combination; they are designed to issue direct I/O to the SSD alone, so you have to look carefully at each specific case before judging it. A hardware RAID controller will always outperform any onboard solution, including a cacheless HBA. The cache is the star of the show, especially with read-ahead enabled, because it effectively services random 4K QD1 requests as if they were at higher queue depths, no matter which benchmark you use (as long as the test file is no larger than the cache). CrystalDiskMark shows this clearly when the file size is smaller than your cache: you will see 4K QD1 above 100 MB/s even though the SSD itself can't do better than 25 MB/s. This is true for 3ware, Adaptec, and LSI cards; the exception is Areca, which buffers every random access (very appealing for enthusiasts). So do a real-world benchmark. For real random reads, run a Windows Defender scan of drive C:, note the total size of C:, and see how long the scan takes. You will see huge numbers. My Adaptec 6405 with a single Samsung 830 256GB and a 178GB C: partition can scan it in 2m:37s, which works out to about 1133 MB/s (there are some sequential files on C:), so read-ahead is a big factor here. In short, don't take the synthetic benchmark numbers literally.

Hi tugaricardo,

Thanks for your thoughtful response.

I'm not sure I'd be so quick to throw out all the synthetic benchmarks. I realize the workloads introduced by the benchmarks are not necessarily representative of how the storage subsystem will be used. However, I'm not aware of any benchmarks doing anything special with their IO requests that would give onboard controllers an advantage over RAID controllers or vice versa. My assumption is that most (if not all) of these benchmarks are making their requests in ways similar to how the OS would or else the benchmarks would be completely invalid. Iometer in particular appears to be highly respected but I also use a few other benchmarks to make sure I have enough data points to trust the general result trend.

My understanding is that read-ahead and write-back are good for certain workloads but not all. In fact, if you look at LSI's Benchmarking Tips doc, they recommend turning both on for "streaming" tests and both off for "transactional" tests. In testing with various benchmarks, I've found several situations where read-ahead and/or write-back can slow things down.

With regard to the CrystalDiskMark 4KB QD1 tests, the read test definitely benefits from read-ahead. However, the write tests take a hit if read-ahead (not sure why), write-back, or both are enabled.

#8 lunadesign

lunadesign

    Member

  • Member
  • 148 posts

Posted 08 September 2013 - 09:46 PM

Testing Update

I was able to borrow a 9271-8i, re-ran my Iometer tests, and found that this card performs substantially better than the 9260-8i. However, it still lags somewhat behind onboard in random reads and writes.

4K Random Read:
- 9260-8i: 43-73% fewer IOPS than onboard
- 9271-8i: 0-24% fewer IOPS than onboard

4K Random Write:
- 9260-8i: 41-63% fewer IOPS than onboard
- 9271-8i: 0-30% fewer IOPS than onboard

In the process, I *think* I discovered a PCI Express compatibility problem between the 9260 and the Supermicro system's PCIe 2.0 and 3.0 slots. I noticed that when I ran HD Tune Pro's read test, the speed was almost perfectly constant throughout the test when using the onboard controller or the 9271. With the 9260, the graphed line was pretty jagged and generally lower than the others. A PCI Express incompatibility might explain why the 9260 was somewhat faster on a much older X48-based system.

I'm still not sure why the 9271 is slower than onboard, since the processor isn't doing any complicated RAID calculations in a single-drive RAID 0.

I'm curious if Adaptec and/or Areca cards have a similar performance lag.....

#9 Axl

Axl

    Lifelong Reader

  • Patron
  • 89 posts

Posted 17 September 2013 - 07:39 PM

In the process, I *think* I discovered a PCI Express compatibility problem between the 9260 and the Supermicro system's PCIe 2.0 and 3.0 slots.


Speaking of PCI Express incompatibilities, have you tried your 9260-8i in a PCIE 2.0 slot rather than a 3.0 slot? I've heard of some LSI cards and certain motherboards not getting along when used in PCIE slots they aren't explicitly rated for. I know, there should be backward-compatibility, but apparently there ARE issues in certain cases.

I have a 9266-4i which has not gone into service yet. I would be willing to hook up an SSD (a Kingston HyperX 3K 120GB is available) and run some tests. I downloaded your IOMeter settings and I'll give that a try in the next couple of days. Compared to the 9260, the 9266 bumps the LSISAS2108 up to an LSISAS2208 and the 512MB of DDR2 to 1GB of DDR3, but it's still PCIE 2.0. I chose this on purpose, as all I have are AMD systems with no PCIE 3.0, and I didn't want to risk the compatibility issues I've read about when mixing PCIE generations (only applicable to LSI, as far as I've read).

#10 lunadesign

lunadesign

    Member

  • Member
  • 148 posts

Posted 19 September 2013 - 12:40 AM

Speaking of PCI Express incompatibilities, have you tried your 9260-8i in a PCIE 2.0 slot rather than a 3.0 slot? I've heard of some LSI cards and certain motherboards not getting along when used in PCIE slots they aren't explicitly rated for. I know, there should be backward-compatibility, but apparently there ARE issues in certain cases.

Yes, I tried the 9260 (actually two different 9260s) in the same motherboard's PCIE 2.0 slot. I also tried it in both PCIE 3.0 slots (one is x8, the other is x16), but with the BIOS set to use "GEN2" instead of "GEN3". In each case, I saw the problem I described in my previous post.

I also tried the 9260 on an older, slower, 5-year-old X48 system that's PCIE 2.0 only and saw *better* random read/write performance, but still significantly slower than onboard. Unfortunately, I don't have any relatively recent PCIE 2.0-only systems available to test on.

I have a 9266-4i which has not gone into service yet. I would be willing to hook up an SSD (a Kingston HyperX 3K 120GB is available) and run some tests. I downloaded your IOMeter settings and I'll give that a try in the next couple of days. Compared to the 9260, the 9266 bumps the LSISAS2108 up to an LSISAS2208 and the 512MB of DDR2 to 1GB of DDR3, but it's still PCIE 2.0. I chose this on purpose, as all I have are AMD systems with no PCIE 3.0, and I didn't want to risk the compatibility issues I've read about when mixing PCIE generations (only applicable to LSI, as far as I've read).

I'd be very interested to see what kind of numbers you get, especially compared to your onboard controller (since your SSD is different from mine). The 9266 appears to have the same processor and memory as the 9271, so I suspect you'll see similar results.

#11 pyite

pyite

    Member

  • Member
  • 5 posts

Posted 30 October 2013 - 06:23 PM

I am having similar issues and could use some suggestions. My server is a Dell R720xd with the most expensive RAID option (the H710P, which is an LSI card with a 2208 chip). I am using Micron P300 SSDs (200GB SLC "enterprise" drives, quite speedy).

 

To make the math easy, let's say that read performance with one SSD was like 300MB/sec and write was 200MB/sec... it was in that ballpark.

 

When I put five of them in a RAID 0 config, I get around what I would expect for read performance - 1.5GB/sec or so. However, writes are not much faster. The application is VM disk images where dozens of VMs are each unzipping a large zip file of source code (i.e., small writes). This is a very easy test to duplicate.
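
A quick back-of-the-envelope check on those figures (per-drive numbers as given above):

# Ideal RAID 0 scaling from the rough per-drive figures above.
drives = 5
read_mb_s_per_drive = 300
write_mb_s_per_drive = 200
print("ideal aggregate read:  %d MB/s" % (drives * read_mb_s_per_drive))   # ~1500 MB/s, roughly what reads achieve
print("ideal aggregate write: %d MB/s" % (drives * write_mb_s_per_drive))  # ~1000 MB/s, which writes don't approach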

 

I have spent a huge amount of time tweaking filesystem and operating system parameters, and it appears to me that the RAID controller is the bottleneck. I have two questions about this:

 

1) What is the fastest LSI controller that can handle 8 SLC SSDs and isn't insanely expensive? $1500 would be about the most I would consider spending.

 

2) Are there any Linux-based tools to analyze what is going on with the 2208 RAID chip? When it is busy, the "soft IRQ" CPU time approaches 100%, which indicates that the driver is very busy, but is there a way to find the core bottleneck apart from a whole lot of mind-numbing A/B tests with different drives, controllers, servers, etc.?

 

Thanks!

Mark
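
On the second question, one lightweight option (besides iostat) is to sample /proc/diskstats directly and watch per-device IOPS and throughput while the workload runs. A minimal sketch, assuming the standard Linux /proc/diskstats layout; the device name "sdb" is only a placeholder for whatever the MegaRAID volume appears as:

import time

def read_diskstats(devices):
    # Returns {device: (reads, sectors_read, writes, sectors_written)}.
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] in devices:
                stats[fields[2]] = (int(fields[3]), int(fields[5]),
                                    int(fields[7]), int(fields[9]))
    return stats

def sample(devices, interval=5.0):
    before = read_diskstats(devices)
    time.sleep(interval)
    after = read_diskstats(devices)
    for dev in devices:
        r0, rs0, w0, ws0 = before[dev]
        r1, rs1, w1, ws1 = after[dev]
        print("%s: %.0f read IOPS, %.1f MB/s read; %.0f write IOPS, %.1f MB/s write" % (
            dev,
            (r1 - r0) / interval,
            (rs1 - rs0) * 512 / interval / 1e6,   # sector counts are in 512-byte units
            (w1 - w0) / interval,
            (ws1 - ws0) * 512 / interval / 1e6))

sample(["sdb"])   # placeholder device name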


#12 Kevin OBrien

Kevin OBrien

    StorageReview Editor

  • Admin
  • 1,439 posts

Posted 30 October 2013 - 07:51 PM

I don't want to hurt any feelings, but the Micron P300 SSDs, while speedy for their time, have been quickly outpaced by the past few generations of SATA and SAS SSDs hitting the market. Most of the current top-end RAID cards are designed to handle much, much more I/O, from 16-24 SSDs connected to a single card, so finding one that lets your eight P300s run wild shouldn't be a problem.

Now, that said, the 2208 is at the heart of many quick flash platforms on the market right now, including a few PCIe SSDs. What OS are you running? I'm guessing Linux from your analyzer question, but I just want to make sure. If you rolled your own OS from scratch, have you made any OS optimizations? Queue depth, queue schedulers, etc.? Those can have a massive impact in Linux, whereas in Windows the default settings are much more geared towards high-performance storage.

Beyond that, have you ever gone back to scratch on those SSDs by secure erasing them, just to rule out the SSDs being what's bogging the platform down?
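
On the Linux side, the queue scheduler and queue depth mentioned above are exposed under /sys/block/<device>/queue. A minimal sketch for checking and changing them; the device name "sdb" is a placeholder, and writing these files requires root:

DEV = "sdb"
QUEUE = "/sys/block/%s/queue" % DEV

def show(attr):
    with open("%s/%s" % (QUEUE, attr)) as f:
        print("%s = %s" % (attr, f.read().strip()))

def set_attr(attr, value):
    with open("%s/%s" % (QUEUE, attr), "w") as f:
        f.write(str(value))

show("scheduler")      # current scheduler is shown in brackets, e.g. "noop deadline [cfq]"
show("nr_requests")    # per-queue request depth
# Typical experiment for SSDs behind a hardware RAID controller: try "noop" or
# "deadline" instead of the el6 default of "cfq", then re-run the workload.
# set_attr("scheduler", "noop")
# set_attr("nr_requests", 256)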


#13 pyite

pyite

    Member

  • Member
  • 5 posts

Posted 31 October 2013 - 02:01 PM

Thanks for the reply, Kevin. Yes, it is Linux (el6) and I have done a depressing amount of OS optimization.

Performance with individual SSDs connected to the same controller is good; write performance just doesn't scale at all when I make a RAID 0 and add more disks. Both ext4 and XFS show similar performance issues.

 

Thanks,

Mark


#14 Kevin OBrien

Kevin OBrien

    StorageReview Editor

  • Admin
  • 1,439 posts

Posted 31 October 2013 - 03:05 PM

What is your current queue scheduler?


#15 tugaricardo

tugaricardo

    Member

  • Member
  • 5 posts

Posted 03 November 2013 - 02:58 PM

Yes, I tried the 9260 (actually two different 9260s) in the same motherboard's PCIE 2.0 slot. I also tried it in both PCIE 3.0 slots (one is x8, the other is x16), but with the BIOS set to use "GEN2" instead of "GEN3". In each case, I saw the problem I described in my previous post.

I also tried the 9260 on an older, slower, 5-year-old X48 system that's PCIE 2.0 only and saw *better* random read/write performance, but still significantly slower than onboard. Unfortunately, I don't have any relatively recent PCIE 2.0-only systems available to test on.

I'd be very interested to see what kind of numbers you get, especially compared to your onboard controller (since your SSD is different from mine). The 9266 appears to have the same processor and memory as the 9271, so I suspect you'll see similar results.


Hello my friend
My home system is still an "old" X58 platform, and when I chose the motherboard I made sure to check the IRQ table to confirm there was a PCI-E x16 slot that doesn't share an IRQ with another slot. In my opinion, that's the golden rule to start with. It's not easy to find a single-CPU-socket board with one PCI-E slot that doesn't share resources. Dual-socket boards are golden, of course, and the LGA 2011 desktop platform is more server-friendly than previous ones.
I'm speaking only about the Windows environment, and in my experience with Adaptec RAID cards, anything up to the cache size gets buffered, even 4K QD1. My Series 6 card is "slow", but when I benchmarked it with random Windows workloads (for example, installing a language pack, which involves a lot of random reads and writes), it performed a lot faster than onboard (also tested on a newer AMD 990FX platform).
Since I'm now using 2x Samsung SSDs in RAID 0 (plus one HDD for data), I decided to turn off the read and write cache for the HDD. I saw an improvement in SSD performance because it leaves more cache space for them.
The Series 7 is 4-10x faster than my Series 6 (I've seen it do 160 MB/s at 4K QD1, almost at Areca's level), and the new Series 8 is 60% faster than the 7 (at higher QDs, I believe).
Yes, depending on the workload you can test with or without cache, but the manufacturers ship the default settings with the caches enabled because that gives the best of both worlds (random and sequential, with read-ahead and write-back).


