Samsung IOPS

Hi,

I hope someone is able to help me with this issue.

Mainboard: ASUS P9X79
CPU: i7-3930K
RAM: 64GB
OS: Ubuntu 14.04
Drives: two new Samsung 840 EVO 1TB
Test tool: fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread
Using one disk on onboard SATA3: ~100,000 IOPS (as it should look)
Using both in parallel on onboard SATA3: ~70,000 IOPS each
Using one disk on the Adaptec 6405e: ~35,000 IOPS
Using both in parallel on the Adaptec 6405e: didn't test this...
... because LSI's tech paper (PDF) for the 9211-8i claims up to 320,000 IOPS, so we bought one...
Using one disk on the 9211-8i: 64,000 IOPS
Using both on the 9211-8i: didn't test this...
... so we opened a case with LSI support and were told "I think you should get more IOPS, please update the firmware". That didn't help, and they also weren't able to tell us what the maximum of the controller with a single disk is. They just told me that there was a test eight years ago with a SAS backplane where they achieved 280,000 IOPS - no further information.
... so we ordered an LSI 9300-8i and a 9300-4i, because the product specs claim up to 1 million IOPS for the 9300 series - without saying for which model.
We expected that our problem was finally solved, but...
Using one disk on the 9300-8i: ~74,000 IOPS
Using both in parallel on the 9300-8i: ~74,000 IOPS each
Identical result for the 9300-4i.
We have also tried different motherboards (ASUS, Intel, Gigabyte) on different systems. Unfortunately, still the same result: 74,000 IOPS max.
So, any clue what the problem is?
Many Thanks & Kind Regards,
Sicker


Generally when working with HBAs versus onboard Intel SATA, you can sometimes see lower read performance since less caching is involved with the HBA. Have you tried changing the size of the test to a larger span or adjusting the queue depth? Also, which LSI driver version are you testing with?
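For example, a quick sweep over test size and queue depth could look something like this (just a sketch - the 32G span and the queue-depth values are arbitrary, so adjust them to your setup):

# sweep queue depths against a larger test file, keeping the rest of the original fio job
for qd in 16 32 64 128 256; do
    fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
        --name=test-qd$qd --filename=test --bs=4k --iodepth=$qd --size=32G \
        --readwrite=randread
done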


With a larger block size we reach 520 MB/sec without problems, but because our application does a lot of small random reads, we need the small block size and therefore the IOPS.

Using a smaller iodepth lowers the number of IOPS - especially after going below 32.

We've used the LSI driver that comes with Ubuntu 14.04. Are there any alternatives which perform better?
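(For anyone else checking: the in-box driver version can be read with something like the following, assuming the 9300 series is handled by the standard mpt3sas module for LSI SAS3 HBAs.)

# version of the mpt3sas module shipped with the running kernel
modinfo mpt3sas | grep -i version

# or, once the module is loaded:
cat /sys/module/mpt3sas/version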


Curious - what's the software tool and can you tell us more about the use case? Not exactly relevant to the issue but the background can be helpful.


The testing software is fio, which is widely used for I/O testing in Linux environments: http://freecode.com/projects/fio

We've used it many times in various environments in the past - it has always worked like a charm.

The software which will run on the servers is MongoDB (the TokuMX fork of it), a well-known NoSQL database. Because of the database and data architecture, it will do a lot of random reads of small blocks. We will run 10-20 servers, so it's a big difference whether we get 74,000 or 100,000 IOPS per disk per server.

I've also just finished a benchmark with 6x Samsung 840 EVO 500GB connected to one LSI 9300-8i (on the ASUS P9X79) and was able to reach a total of 300,000 IOPS (50,000 per disk) reading from all disks in parallel.

I repeated the same with the LSI 9211-8i and got only 40,000 per disk - so 240,000 total.

I wonder how LSI claims to reach 1 million IOPS with the 9300 series.


I've personally hit 1MM IOPS on the 9300-8e connected to a JBOD filled with some HGST SSD800MM SAS3 SSDs. It really depends on the profiles attacking them though. I seem to recall I had better luck with sequential vs random IOPS.

With FIO (we use it extensively here) there can be a huge difference between its results and those of an actual database run. Have you done any initial testing to see how Mongo responds to the different HBAs?


I did a test with MongoDB on a single disk connected to the onboard SATA3 vs. the 9211-8i, and a single query already took 0.4s longer than on the onboard SATA3 - and that's a lot if we run thousands of queries per hour.

I just wonder why an onboard SATA3 port of a $250 desktop board is able to reach 100,000 IOPS with a single disk, but not the $300 LSI 9300-8i HBA. It seems that a controller card only reaches higher IOPS when used with multiple disks. But even then the IOPS went down from 74,000 for a single disk to 50,000 each with six disks. That doesn't sound logical to me.


I'd try updating the driver and verifying the SATA device parameters (outstanding queue depth, scheduler, etc.) for the devices you are testing. Ubuntu might be presenting the HBA devices in a different manner than the onboard storage.

As for drivers, check the LSI 9300 product page for the latest version for your distribution:

http://www.lsi.com/products/host-bus-adapters/pages/lsi-sas-9300-8i.aspx#tab/tab4
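One quick way to compare how the kernel presents the two devices is to dump the block queue parameters for each and diff them (a sketch - replace sdX/sdY with the onboard and HBA-attached disks):

# collect name:value pairs for every queue parameter of both devices, then compare
for dev in sdX sdY; do
    for f in /sys/block/$dev/queue/*; do
        printf '%s:%s\n' "$(basename $f)" "$(cat $f 2>/dev/null)"
    done > /tmp/queue-$dev.txt
done
diff /tmp/queue-sdX.txt /tmp/queue-sdY.txt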


I've just updated to the current LSI driver - it didn't change anything - still 74,000 max.

I also compared the kernel's block queue parameters:

nomerges:0
logical_block_size:512
rq_affinity:1
discard_zeroes_data:0 (onboard SATA3 has 1)
max_segments:128 (onboard SATA3 has 168)
max_segment_size:65536
rotational:0
scheduler:noop [deadline] cfq
read_ahead_kb:128
max_hw_sectors_kb:16383 (onboard SATA3 has 32767)
discard_granularity:0 (onboard SATA3 has 512)
discard_max_bytes:0 (onboard SATA3 has 2147450880)
write_same_max_bytes:0
max_integrity_segments:0
max_sectors_kb:512
physical_block_size:512
add_random:1
nr_requests:128
minimum_io_size:512
hw_sector_size:512
optimal_io_size:0
iostats:1

According to the documentation (https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt), the settings that differ are read-only values.


Switch the scheduler to noop and away from deadline. The rotational part is already correct.

Here's a quick example... note it needs to be done for each block device presented by the LSI card.

# turn off seek re-ordering
echo "0" > /sys/block/sdX/queue/rotational

# set I/O scheduler to noop
echo "noop" > /sys/block/sdX/queue/scheduler


With hdparm I've discovered that the locally connected disk has

R/W multiple sector transfer: Max = 16 Current = 16

while the disks connected to the LSI 9300-8i have

R/W multiple sector transfer: Max = 1 Current = 1

Unfortunately I don't know how to change that setting - "-m16" didn't work.

Also, I don't know if this could be the reason for the 74,000.
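
For reference, the relevant hdparm calls look roughly like this (sdX is a placeholder; -m is hdparm's get/set for the multiple-sector-I/O count):

# show identify data, including "R/W multiple sector transfer"
hdparm -i /dev/sdX

# read the current multiple-sector count
hdparm -m /dev/sdX

# try to set it to 16 (this is what failed on the HBA-attached disks)
hdparm -m16 /dev/sdX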

[update]

I just changed it to 1 for the SATA3 disk and still get 100,000 IOPS...



Looking over some of our past SATA SSD runs, it looks like ~74k is our 4K read limit as well. We use the 9271-8i in our test bed with CentOS 6.3.

http://www.storagereview.com/images/toshiba_hk3r2_960gb_main_4kwrite_throughput.png

Run through some of these tweaks to see if any one item gives a boost:

http://mycusthelp.info/LSI/_cs/AnswerDetail.aspx?inc=8196


Next time you should maybe use an onboard SATA3 port to get the real limit of each SSD :-)

So it seems there really is a hard limit of around 74,000 per disk attached to a controller - not only in our setup. I wonder what the physical reason for that is...

I've just tried all the settings LSI suggests, but nothing changed :-(


Well, the strange part is that Windows doesn't really have that issue... it's driver related.


Hmm... weird...

I've just tested with 1KB block size and got 106 IOPS...

Very strange...

Running Windows is not an option for us... and I can't believe that Windows performs better with storage adapters than Linux.


With sequential 4k blocks and the LSI-suggested iodepth of 975 I reach 121,000 IOPS... but still 74,000 with randread...

With iodepth 64 sequential I get 95,000...
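
For reference, those runs are just the original fio line with --readwrite and --iodepth changed, e.g.:

# sequential 4k read at the LSI-suggested queue depth
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=975 --size=4G --readwrite=read

# random 4k read with the same settings (still capped at ~74,000 IOPS)
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=975 --size=4G --readwrite=randread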


A few new numbers from the 9300-8i with six Samsung 840 EVO 500GB SSDs...

(maybe they will also help somebody else who reads this topic :-)

Using LVM (default settings) RAID 0 with two disks:

randread 4KB, iodepth: 74,000

... after discovering that LVM performs that badly (our well-known 74,000?!), I switched to testing mdadm ...

Using mdadm (default settings) RAID 0 with two disks:

mdadm --create --verbose /dev/md0 --level=stripe --chunk=128 --raid-devices=2 /dev/sdb /dev/sdc

randread 4KB, iodepth 64: 113,000

randwrite 4KB, iodepth 64: 105,000
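
(For anyone reproducing this: pointing the same fio job at the raw md device looks something like the line below - the job name is arbitrary, and note that a randwrite run against a raw device destroys whatever is on it.)

# random read against the md array (raw block device, no filesystem involved)
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=md0test --filename=/dev/md0 --bs=4k --iodepth=64 --size=4G --readwrite=randread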

Using mdadm (default settings) RAID 0 with six disks:

mdadm --create --verbose /dev/md0 --level=stripe --chunk=128 --raid-devices=6 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

randread 4KB, iodepth 64: 162,000

randwrite 4KB, iodepth 64: 157,000

Using mdadm (default settings) RAID 1 (mirror) with two disks:

mdadm --create --verbose /dev/md0 --level=mirror --chunk=128 --raid-devices=2 /dev/sdb /dev/sdc

randread 4KB, iodepth 64: 144,000

randwrite 4KB, iodepth 64: 60,000

For some weird reason RAID 1 performs better than RAID 0 for reads... I would never have expected that...

And finally, a test running fio directly on all 6 devices in parallel:

randread 4KB, iodepth 64: 440,000 (70,000 each) <- winner! :)
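
In case it helps anyone reproduce this: one simple way to hit all devices in parallel is to start one fio process per device (a sketch - device names are placeholders and the job names are just the device names):

# launch one fio job per device in the background, then wait for all of them
for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg; do
    fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
        --name=$(basename $dev) --filename=$dev --bs=4k --iodepth=64 \
        --size=4G --readwrite=randread &
done
wait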



I've just tested an identical (second) system with a clean Ubuntu 14.04, and it seems that the mentioned performance can only be reached with the kernel module from LSI (mpt3sas-7.00.00.00-1_Ubuntu14.04.amd64.deb). Because it's precompiled for 3.13.0-24-generic, you have to install the 3.13.0-24 kernel (apt-get install linux-image-3.13.0-24-generic linux-headers-3.13.0-24-generic).
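
The install sequence is roughly the following (assuming the .deb has been downloaded from the LSI support page into the current directory):

# install the kernel build the LSI module was compiled against
apt-get install linux-image-3.13.0-24-generic linux-headers-3.13.0-24-generic

# install the precompiled mpt3sas driver package from LSI
dpkg -i mpt3sas-7.00.00.00-1_Ubuntu14.04.amd64.deb

# reboot into the 3.13.0-24 kernel, then confirm the module version
reboot
modinfo mpt3sas | grep -i version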

Just in case somebody else runs into the same issues ;)


It's still 74,000 for a single disk, instead of the 100,000 IOPS on onboard SATA3.

Combining all 6 disks I get 70,000 per disk - so there is still some kind of bottleneck which costs me about 30% per disk.

