Small block IO is faster on the 9211 in every benchmark I've seen. In fact, you may have done quite a few of those benchmarks yourself (over at XtremeSystems, if I recall correctly). The highest STRs I've seen are also from the 9211, in direct comparisons with the 9260. Those are with 16 Intel SLC SSDs; no question the controllers are the bottleneck.
Access time is also lower on the 9211. At low queue depths, and in real-world desktop use especially, access time is one of the most important fundamental metrics to look at when evaluating SSD performance.
I haven't seen benchmarks with FastPath enabled on the 9260, though, and I don't claim to have seen every benchmark on the net. I'm just reporting what I've found in my recent research.
It's worth mentioning that most desktop applications don't do multithreaded IO and don't pipeline it. They wait to receive data before issuing more requests, and they do this in a single thread. That means turning those requests around with as little latency as possible is the only way to push data back to the application faster.
If you have multiple applications issuing simultaneous requests, the OS can take advantage of command queuing, but otherwise the queue depth will stay at ~1. In my use, Lightroom behaves this way: even though it's processing multiple files in multiple threads, it uses a single thread for IO.
If you want to improve the IO performance of applications like this, you need the lowest latencies possible. The 9211 will be better than the 9260 in these situations.
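To make that concrete, here's a rough sketch of the pattern I'm describing (my own illustration in Python, not Lightroom's actual code):

[code]
# Blocking, single-threaded IO as most desktop apps do it: each read must
# complete before the next is issued, so the effective queue depth is ~1
# and per-request latency, not bandwidth, dominates throughput.

def read_file_qd1(path, chunk_size=64 * 1024):
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # blocks here; no other IO in flight
            if not chunk:
                break
            total += len(chunk)  # "process" the data, then ask for more
    return total
[/code]

Nothing overlaps; the drive sits idle between requests, which is why access time matters so much here.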
Desktop applications were written this way for performance reasons: to avoid thrashing mechanical disks. It limits SSD performance significantly, though, and until SSDs are ubiquitous, I don't think we'll see any change.
Write amplification in RAID 0 comes from the simple fact that you're dividing a write that might have fit into fewer erase blocks on a single drive across at least one erase block per disk.
There's an interaction between stripe size and the SSD's erase block size here (which is itself a function of the number of channels and the NAND page size), but everyone I've observed using RAID 0 optimizes stripe size for performance, not write amplification. To avoid this problem, you need to do two things (see the sketch after the list):
1) Your stripe size needs to be equal to, or larger than, the SSD's erase block size, and,
2) If larger, your stripe size needs to be an exact multiple of the erase block size.
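A quick sanity check for those two rules (my own sketch; erase block sizes vary by drive):

[code]
# Stripe size must be >= the erase block size and an exact multiple of it.
def stripe_ok(stripe_kb, erase_block_kb):
    return stripe_kb >= erase_block_kb and stripe_kb % erase_block_kb == 0

# Against a SandForce-style 512KB erase block:
print(stripe_ok(64, 512))    # False: the common 64KB default fails rule 1
print(stripe_ok(512, 512))   # True: exactly one erase block
print(stripe_ok(1024, 512))  # True: an exact multiple
print(stripe_ok(768, 512))   # False: larger, but not an exact multiple
[/code]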
Let's say you have an 8 drive array of SandForce SSDs (lucky you...). You have a 64KB stripe size, which is probably the most common, since it's 1) a frequent default, and 2) often the size that delivers optimum performance in some very popular benchmarks (ATTO, I'm looking at you). SandForce's erase block size is 512KB. You want to write a 2MB (2048KB) file to your array. That file will be divided into 32 stripes, 4 per drive.
On a single SSD, this would have resulted in a minimum of 4 read/erase/write operations, with the maximum depending on filesystem alignment issues and the internal fragmentation of the SSD (which garbage collection can reduce, since it understands the OS's filesystem AND hopefully has TRIM). In the array it results in a minimum of 8 read/erase/write operations, and maybe, depending on internal fragmentation, many more. We can be certain internal fragmentation will be higher in the RAID case because 1) we lack TRIM, and 2) we lack a filesystem table for each drive's internal garbage collection to analyze.
In this case you have a minimum of 2x write amplification before you even deal with internal fragmentation, which the absence of TRIM and a filesystem table will only magnify!
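Spelling out the arithmetic (minimum counts only, ignoring fragmentation and alignment):

[code]
from math import ceil

erase_block_kb = 512   # SandForce erase block size
file_kb = 2048         # the 2MB file
stripe_kb = 64         # common default stripe size
drives = 8

stripes = file_kb // stripe_kb      # 32 stripes total, 4 per drive
per_drive_kb = file_kb // drives    # 256KB lands on each drive

# Single SSD: a contiguous 2048KB write spans at least 4 erase blocks.
single_ops = ceil(file_kb / erase_block_kb)                # 4

# RAID 0: each drive's 256KB still dirties at least one erase block,
# so the array performs a minimum of 8 erase operations.
raid_ops = drives * ceil(per_drive_kb / erase_block_kb)    # 8

print(single_ops, raid_ops, raid_ops / single_ops)         # 4 8 2.0
[/code]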
Each SSD may receive less data overall (though only before write amplification), but that's just a symptom of a higher capacity-to-data-written ratio: a single SSD with 8x the capacity would be the only way to hit your ideal 1/8 write amplification (i.e. a 400GB SSD vs. 8x 50GB in RAID, to continue the example above; the 400GB SSD would last a minimum of twice as long writing 2MB files). It's important to remember that when we talk about write amplification in relation to SSD lifetimes, we need to consider data written vs. capacity (including spare capacity).
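As a back-of-the-envelope wear comparison (using erase operations as a crude proxy for wear; same 400GB of total flash either way):

[code]
# From the math above: 4 erases per 2MB file on the single 400GB SSD,
# 8 erases per file across the 8x 50GB RAID 0 array.
single_erases_per_file = 4
raid_erases_per_file = 8

# Same total capacity, double the erases per file written:
# the array wears twice as fast, i.e. roughly half the lifetime.
print(raid_erases_per_file / single_erases_per_file)  # 2.0x wear rate
[/code]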
That said, RAID 0 isn't as bad as striping with parity, where your write amplification will go through the roof as you update small bits of parity all over the place. Striped parity is a disaster with SSDs. RAID 3 would be the way to go, but it's unusual to see it these days.
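For anyone wondering why parity is so brutal, here's the classic small-write penalty sketched out (a generic RAID 5 read-modify-write, not any specific controller's implementation):

[code]
# RAID 5 small write: new_parity = old_parity XOR old_data XOR new_data,
# so each logical write becomes 2 reads + 2 writes, and the extra parity
# write dirties erase blocks on a second SSD as well.
def raid5_small_write_ios(logical_writes):
    reads = 2 * logical_writes   # read old data + read old parity
    writes = 2 * logical_writes  # write new data + write new parity
    return reads, writes

print(raid5_small_write_ios(1))  # (2, 2): four IOs for one logical write
[/code]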
Sorry for the long post. Complicated stuff...