I've been using 3ware cards here and there since the first Escalade cards were available.
Mainline Linux drivers are a nice thing, and the firmware and such is mostly of satisfactory quality. Let's not get too carried away, though: they generally look good because the rest of the field is so poor.
Performance has always "felt" poor, on any card. Even when the 9500 was the new killer card and I had a setup with 8 disks, things were not great, despite benchmarks showing massive speeds.
It's late so I'll cut to the meat.
Most people want better performance, and quick googling gets you to the blockdev --setra stuff. Great. So you have a massive readahead. Now you run dd, or bonnie++, or whatever VFS-layer operation you want, and get massive read speeds: 200, 300 MB/s.
Now do some bonnie++ or dd tests for writes and see some big numbers. OK, great: you can fill your page cache and Linux can write it out asynchronously in the background as long as you have memory. Depending on the ratio of free pages to disk speed you'll see some nice numbers.
But not much of that is of any use unless you're purely in the business of shuffling around huge amounts of data. And if you fill your page cache with dirty pages you'll start to see a sluggish system, since the queue is deep and IO starts to block in other places. Now even the mp3 you were streaming at the same time will be in trouble.
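If you want to watch that backlog build up during a big write test, the Dirty and Writeback counters in /proc/meminfo show it. A minimal sketch in Python, assuming a Linux box; the helper names are made up for illustration:

```python
def parse_meminfo_kb(text, fields=("Dirty", "Writeback")):
    """Return {field: kB} for the requested fields of /proc/meminfo text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            # Lines look like "Dirty:      123456 kB"
            values[name] = int(rest.split()[0])
    return values


def read_dirty_kb(path="/proc/meminfo"):
    """Read the live counters (Linux only)."""
    with open(path) as f:
        return parse_meminfo_kb(f.read())
```

Calling read_dirty_kb() in a loop while a large dd write runs should show Dirty climbing until writeback catches up, which is about when everything else on the box starts to stall.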
So specs sell. And if people see 300 MB/s read/write in dd, you'll have the market.
OK, so enough of all that. Filesystem operations typically occur in 4K blocks. And most applications do not perform async IO. Maybe PostgreSQL, MSSQL and some smart apps like that.
Imagine...
    int c;
    while ((c = fgetc(f)) != EOF) {
        /* do something with c */
    }
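Each iteration is a blocking read: the program can't move on until the byte is there. (stdio will batch these into buffer-sized reads behind the scenes, but each of those is still a synchronous IO that the application sits and waits for.) So we need to complete each IO with the lowest possible latency. Read-ahead of 16384 in a multi-process environment? That's a big overhead for these small reads, and certainly detrimental. But this type of IO pattern is happening all of the time.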
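To put a number on that overhead: blockdev --setra counts 512-byte sectors, so a setting of 16384 is an 8 MiB window. A quick sketch of the arithmetic in Python (the function names are mine, and this is the upper bound only; the kernel grows the window adaptively and won't always read the full amount):

```python
SECTOR_BYTES = 512  # blockdev --setra is expressed in 512-byte sectors


def readahead_bytes(setra_sectors):
    """Maximum readahead window in bytes for a given --setra value."""
    return setra_sectors * SECTOR_BYTES


def worst_case_amplification(setra_sectors, request_bytes=4096):
    """How many times more data than requested a full window would pull in."""
    return readahead_bytes(setra_sectors) / request_bytes


window = readahead_bytes(16384)
print(window // (1024 * 1024), "MiB window")
print(worst_case_amplification(16384), "x a single 4K request")
```

So in the worst case a process that only wanted 4K drags in up to 2048 times that, and with several processes doing it at once they all queue behind each other's readahead.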
Some IO systems like DRBD work only in 4K blocks, with a full write sync each time, so this latency is critical.
Any performance gains from readahead or async page cache writes are purely a function of Linux, RAM, and spindles. 3ware makes no difference here.
Now let's reveal how poor 3ware's latency is, and reevaluate all those times we wondered what was going on.
BTW, these tests are on two similar boxes: server-class boards with 2 GHz SMP CPUs and 1 GB RAM. One has a 3ware 9650SE with 14 x 500GB in RAID 5. The other has an Areca 1261-ML with 14 x 500GB in RAID 5.
This issue is easy to demonstrate on previous 3ware models as well, although I do not currently have a setup to run similar comparisons.
Using 'disktest' from the (great) Linux Test Project (LTP), we can test with pure block IO (bypassing the page cache) at 4K, with any number of threads, and even restrict the range of sectors. Random and linear seek patterns are both possible.
root@storage2 ~ # disktest -B 4k -I BD -K 1 -p l -P A -T 15 -s 0:1000000000 /dev/sde
| 2007/08/31-02:15:39 | START | 7300 | v1.2.8 | /dev/sde | Start args: -B 4k -I BD -K 1 -p l -P A -T 15 -s 0:1000000000 (-N 1000000001) (-r) (-c) (-p u)
| 2007/08/31-02:15:39 | INFO | 7300 | v1.2.8 | /dev/sde | Starting pass
| 2007/08/31-02:15:53 | STAT | 7300 | v1.2.8 | /dev/sde | 327397376 bytes read in 79931 transfers.
| 2007/08/31-02:15:53 | STAT | 7300 | v1.2.8 | /dev/sde | Read throughput: 23385526.9B/s (22.30MB/s), IOPS 5709.4/s.
| 2007/08/31-02:15:53 | STAT | 7300 | v1.2.8 | /dev/sde | Read Time: 14 seconds (0h0m14s)
| 2007/08/31-02:15:53 | STAT | 7300 | v1.2.8 | /dev/sde | Total bytes read in 79931 transfers: 327397376
| 2007/08/31-02:15:53 | STAT | 7300 | v1.2.8 | /dev/sde | Total read throughput: 23385526.9B/s (22.30MB/s), IOPS 5709.4/s.
| 2007/08/31-02:15:53 | STAT | 7300 | v1.2.8 | /dev/sde | Total Read Time: 14 seconds (0d0h0m14s)
| 2007/08/31-02:15:53 | STAT | 7300 | v1.2.8 | /dev/sde | Total overall runtime: 15 seconds (0d0h0m15s)
| 2007/08/31-02:15:53 | END | 7300 | v1.2.8 | /dev/sde | Test Done (Passed)
root@storage2 ~ #
OK, so 4K blocksize, single thread, linear read (the -s sector range setting is there because this 6TB volume is too big for disktest to handle), and we get 5709.4 IOPS!?
A single decent 7200 rpm SATA disk should manage at least 6000 on this kind of linear read by itself.
OK, but without reading ahead, we really can't effectively use all these spindles anyway.
Let's take the disks out of the picture completely and just test the round trip to the 3ware card.
By setting the sector limit to -s 0:8, we'll just be reading the same 4K block over and over.
root@storage2 ~ # disktest -B 4k -I BD -K 1 -p l -P A -T 15 -s 0:8 /dev/sde
| 2007/08/31-02:19:56 | START | 7308 | v1.2.8 | /dev/sde | Start args: -B 4k -I BD -K 1 -p l -P A -T 15 -s 0:8 (-N 9) (-r) (-c) (-p u)
| 2007/08/31-02:19:56 | INFO | 7308 | v1.2.8 | /dev/sde | Starting pass
| 2007/08/31-02:20:11 | STAT | 7308 | v1.2.8 | /dev/sde | 348639232 bytes read in 85117 transfers.
| 2007/08/31-02:20:11 | STAT | 7308 | v1.2.8 | /dev/sde | Read throughput: 24902802.3B/s (23.75MB/s), IOPS 6079.8/s.
| 2007/08/31-02:20:11 | STAT | 7308 | v1.2.8 | /dev/sde | Read Time: 14 seconds (0h0m14s)
| 2007/08/31-02:20:11 | STAT | 7308 | v1.2.8 | /dev/sde | Total bytes read in 85117 transfers: 348639232
| 2007/08/31-02:20:11 | STAT | 7308 | v1.2.8 | /dev/sde | Total read throughput: 24902802.3B/s (23.75MB/s), IOPS 6079.8/s.
| 2007/08/31-02:20:11 | STAT | 7308 | v1.2.8 | /dev/sde | Total Read Time: 14 seconds (0d0h0m14s)
| 2007/08/31-02:20:11 | STAT | 7308 | v1.2.8 | /dev/sde | Total overall runtime: 15 seconds (0d0h0m15s)
| 2007/08/31-02:20:11 | END | 7308 | v1.2.8 | /dev/sde | Test Done (Passed)
root@storage2 ~ #
Great, we're up to 6079 IOPS, reading blocks that should come right from the 3ware cache every time, or at worst from the buffers on the spindle.
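If you don't have disktest handy, the same-block test can be roughly approximated in a few lines of Python. This is only a sketch, not what disktest does internally: the function name and defaults are mine, it assumes Linux (O_DIRECT and os.preadv), and the interpreter adds its own overhead per call, so expect somewhat lower numbers than disktest reports.

```python
import mmap
import os
import time


def time_block_reads(path, block_size=4096, count=1000, direct=True):
    """Read the first block of `path` `count` times; return seconds per read."""
    flags = os.O_RDONLY
    if direct:
        # O_DIRECT bypasses the page cache (Linux only). It needs an aligned
        # buffer, which an anonymous mmap provides because it is page-aligned.
        flags |= os.O_DIRECT
    buf = mmap.mmap(-1, block_size)
    fd = os.open(path, flags)
    try:
        start = time.perf_counter()
        for _ in range(count):
            # Always offset 0: the same block every time, one request in flight.
            if os.preadv(fd, [buf], 0) != block_size:
                raise OSError("short read")
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        buf.close()
    return elapsed / count
```

Run as root against the array, something like time_block_reads("/dev/sde"), and take 1 / result to get an IOPS figure to set beside the single-thread disktest numbers. Pass direct=False to see what the page cache alone gives you.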
Let's compare with Areca's competing card...
root@storage1 ~ # disktest -B 4k -I BD -K 1 -p l -P A -T 15 -s 0:8 /dev/sde
| 2007/08/31-02:22:16 | START | 10788 | v1.2.8 | /dev/sde | Start args: -B 4k -I BD -K 1 -p l -P A -T 15 -s 0:8 (-N 9) (-r) (-c) (-p u)
| 2007/08/31-02:22:16 | INFO | 10788 | v1.2.8 | /dev/sde | Starting pass
| 2007/08/31-02:22:31 | STAT | 10788 | v1.2.8 | /dev/sde | 2737975296 bytes read in 668451 transfers.
| 2007/08/31-02:22:31 | STAT | 10788 | v1.2.8 | /dev/sde | Read throughput: 182531686.4B/s (174.08MB/s), IOPS 44563.4/s.
| 2007/08/31-02:22:31 | STAT | 10788 | v1.2.8 | /dev/sde | Read Time: 15 seconds (0h0m15s)
| 2007/08/31-02:22:31 | STAT | 10788 | v1.2.8 | /dev/sde | Total bytes read in 668451 transfers: 2737975296
| 2007/08/31-02:22:31 | STAT | 10788 | v1.2.8 | /dev/sde | Total read throughput: 182531686.4B/s (174.08MB/s), IOPS 44563.4/s.
| 2007/08/31-02:22:31 | STAT | 10788 | v1.2.8 | /dev/sde | Total Read Time: 15 seconds (0d0h0m15s)
| 2007/08/31-02:22:31 | STAT | 10788 | v1.2.8 | /dev/sde | Total overall runtime: 15 seconds (0d0h0m15s)
| 2007/08/31-02:22:31 | END | 10788 | v1.2.8 | /dev/sde | Test Done (Passed)
root@storage1 ~ #
OK now we have some low latency. 44,563 IOPS reading the same 4K block.
What about the full volume (or at least as much of it as fits within disktest's limits)?
root@storage1 ~ # disktest -B 4k -I BD -K 1 -p l -P A -T 15 -s 0:1000000000 /dev/sde
| 2007/08/31-02:23:33 | START | 10798 | v1.2.8 | /dev/sde | Start args: -B 4k -I BD -K 1 -p l -P A -T 15 -s 0:1000000000 (-N 1000000001) (-r) (-c) (-p u)
| 2007/08/31-02:23:33 | INFO | 10798 | v1.2.8 | /dev/sde | Starting pass
| 2007/08/31-02:23:48 | STAT | 10798 | v1.2.8 | /dev/sde | 2457137152 bytes read in 599887 transfers.
| 2007/08/31-02:23:48 | STAT | 10798 | v1.2.8 | /dev/sde | Read throughput: 175509796.6B/s (167.38MB/s), IOPS 42849.1/s.
| 2007/08/31-02:23:48 | STAT | 10798 | v1.2.8 | /dev/sde | Read Time: 14 seconds (0h0m14s)
| 2007/08/31-02:23:48 | STAT | 10798 | v1.2.8 | /dev/sde | Total bytes read in 599887 transfers: 2457137152
| 2007/08/31-02:23:48 | STAT | 10798 | v1.2.8 | /dev/sde | Total read throughput: 175509796.6B/s (167.38MB/s), IOPS 42849.1/s.
| 2007/08/31-02:23:48 | STAT | 10798 | v1.2.8 | /dev/sde | Total Read Time: 14 seconds (0d0h0m14s)
| 2007/08/31-02:23:48 | STAT | 10798 | v1.2.8 | /dev/sde | Total overall runtime: 15 seconds (0d0h0m15s)
| 2007/08/31-02:23:48 | END | 10798 | v1.2.8 | /dev/sde | Test Done (Passed)
root@storage1 ~ #
So what is up, 3ware? The competing card is roughly seven times faster (44,563 vs 6,079 IOPS on the same-block test, 42,849 vs 5,709 on the full range), and the advice you offer for improving performance is to increase the Linux readahead value?
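Since these runs use a single thread, the IOPS figures convert straight into per-request latency. A small sketch of the arithmetic in Python, using the numbers from the disktest output above and assuming -K 1 really does keep just one request in flight at a time:

```python
# IOPS figures copied from the disktest runs above (4K reads, one thread).
RESULTS = {
    "3ware 9650SE, full range": 5709.4,
    "3ware 9650SE, same block": 6079.8,
    "Areca 1261-ML, same block": 44563.4,
    "Areca 1261-ML, full range": 42849.1,
}


def latency_us(iops):
    """With one outstanding request, each IO takes 1/IOPS seconds."""
    return 1_000_000 / iops


for name, iops in RESULTS.items():
    print(f"{name}: {latency_us(iops):.1f} us per read")

same_block_ratio = (RESULTS["Areca 1261-ML, same block"]
                    / RESULTS["3ware 9650SE, same block"])
print(f"same-block ratio: {same_block_ratio:.1f}x")
```

That works out to something like 165 to 175 microseconds per 4K read on the 3ware against 22 to 23 on the Areca, and that gap is paid on every synchronous small read an application makes.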
A colleague of mine, who has agreed with me for years that "something was up", has always felt that 3ware on Windows does not have such problems, so that is another subject up for testing.


