RAID for 2.5" NVMe drives

Simple:


Spec'ing out a new server for a client who has requested 2.5" NVMe drives from Intel. I'm struggling to understand how fault tolerance is achieved without a traditional RAID controller. I get the impression that I'm missing something fundamental to this technology. Can anyone shed some light?


As someone who watches for negative reviews of storage devices, I have still not seen any reliable data on failure rates of Intel's 750 NVMe SSDs.

Also, we had to post a WANT AD for a PCIe 3.0 NVMe RAID controller, because there just aren't any at the moment: http://supremelaw.org/systems/nvme/want.ad.htm

[Image: 4-port fan-out cabling topology]

One key factor to consider is the raw upstream bandwidth of Intel's DMI 3.0 link: x4 lanes @ 8 GT/s = 32 Gb/s raw, using the 128b/130b encoding in the PCIe 3.0 spec. Computing exactly: 32 Gb/s / 8.125 bits per byte = 3.94 GB/s. A single x4 M.2 NVMe SSD has exactly the same raw bandwidth, so there is really no increase in bandwidth from putting 2 or more M.2 SSDs in a RAID-0 array behind that link.

Some Z170 motherboards have spaced their PCIe slots to accommodate 3 x M.2, but everyone who has tried to configure a RAID-0 array with such a setup ends up hitting the DMI 3.0 bandwidth limit.
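For anyone who wants to sanity-check that arithmetic, here is a minimal Python sketch of the same numbers (the 8 GT/s line rate and 128b/130b encoding come from the PCIe 3.0 spec; the 3-drive RAID-0 case is just an illustration):

```python
# PCIe 3.0 / DMI 3.0 bandwidth arithmetic (illustrative).
LINE_RATE_GT_S = 8.0               # PCIe 3.0 line rate: 8 GT/s per lane
BITS_PER_BYTE  = 130.0 / 128 * 8   # 128b/130b encoding -> 8.125 bits per byte

def effective_gb_per_s(lanes):
    """Effective bandwidth (GB/s) of a PCIe 3.0 link with the given lane count."""
    return lanes * LINE_RATE_GT_S / BITS_PER_BYTE

dmi_ceiling = effective_gb_per_s(4)    # DMI 3.0 is effectively a x4 link
one_m2_ssd  = effective_gb_per_s(4)    # a single x4 M.2 NVMe SSD

print(f"DMI 3.0 ceiling   : {dmi_ceiling:.2f} GB/s")   # ~3.94 GB/s
print(f"single x4 M.2 SSD : {one_m2_ssd:.2f} GB/s")    # ~3.94 GB/s
print(f"3 x M.2 in RAID-0 : wants {3 * one_m2_ssd:.2f} GB/s, "
      f"capped at {dmi_ceiling:.2f} GB/s behind DMI")
```

This is raw link arithmetic only; protocol overhead and real-world drive behavior will push the usable numbers somewhat lower.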

The only way to circumvent the DMI 3.0 bandwidth limit is to install an add-in RAID controller in an x16 slot connected directly to the CPU. Ideally, a PCIe 3.0 NVMe RAID controller with an x16 edge connector maps neatly onto four U.2 ports, each of which has x4 lanes @ 8 GT/s with the same 128b/130b encoding.

We believe it would be great if workstation-class motherboards ditched SATA-Express entirely and used the freed-up real estate to add three more U.2 ports, like this: http://supremelaw.org/systems/nvme/4xU.2.and.SATA-E.jpg

Another option is to go with a modern PCIe 3.0 SAS RAID controller and install it in an x16 slot controlled directly by the CPU. These controllers are quite mature, technically speaking.

Sooner or later, multiple U.2 ports will be integrated onto future motherboards, with native support for all modern RAID modes. It's anybody's guess just how long we'll need to wait for this desirable evolution in SSD technology.

Hope this helps.

MRFS



Here's another one like that: http://www.guavasystems.com/en/productDetail.asp?id=43

Tom's IT Pro describes how a PEX chip is used in the FSA 200 array (software RAID): http://www.tomsitpro.com/articles/one-stop-systems-fsa-200-review,2-31-5.html

The Switch-based Cable Adapters do not require drivers and operate in a simple bus pass-through mode. The card appears in the Windows device manager as a Base System Device, or as a PLX PCI bridge when using LSPCI in Linux.

RocketStor 3830A also looks similar, but at least advertises RAID, though nothing beyond this promo: http://www.highpoint-tech.com/USA_new/nabshow2016.htm

PEX chip power usage ranges from 3.5 to 23.9 Watts (typical): http://docs.avagotech.com/docs-and-downloads/docs-and-downloads/Avago-PLX-ExpressFabric-PB-AV00-0327EN.pdf


Your link to Guava Systems says it correctly here:

http://www.guavasystems.com/en/productDetail.asp?id=42

PCIe Gen3 x16 for upstream port
PCIe Gen3 x4 for four downstream ports

Thus, 4 @ x4 = x16
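A quick Python sketch of that fan-out arithmetic, using the same PCIe 3.0 assumptions as above (the per-port figures are back-of-the-envelope, not vendor-published numbers):

```python
# Fan-out arithmetic: one x16 upstream port split into four x4 downstream (U.2) ports.
LINE_RATE_GT_S = 8.0               # PCIe 3.0: 8 GT/s per lane
BITS_PER_BYTE  = 130.0 / 128 * 8   # 128b/130b encoding -> 8.125 bits per byte

upstream_lanes   = 16
downstream_ports = 4
lanes_per_port   = 4

assert downstream_ports * lanes_per_port == upstream_lanes   # 4 @ x4 = x16

per_port_gb_s = lanes_per_port * LINE_RATE_GT_S / BITS_PER_BYTE   # ~3.94 GB/s
upstream_gb_s = upstream_lanes * LINE_RATE_GT_S / BITS_PER_BYTE   # ~15.75 GB/s

print(f"each downstream U.2 port: {per_port_gb_s:.2f} GB/s")
print(f"x16 upstream port       : {upstream_gb_s:.2f} GB/s "
      f"({downstream_ports} x {per_port_gb_s:.2f} GB/s, no oversubscription)")
```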

I did my best to get more information from Highpoint, but they were not ready to disclose any more details, nor any photos. The following paragraph is all I could find at their website:

RocketStor 3830A: 3x PCIe 3.0 x4 NVMe and 8x SAS/SATA, PCIe 3.0 x16-lane controller; supports NVMe RAID solution packages for Windows and Linux storage platforms.


Your link to Tom's IT Pro says this about the FSA 200:

The hosts only support a maximum of two PCIe Gen 3.0 x16 adapters per node, or 32 lanes of PCIe 3.0 per host. ... the configuration still only employs a maximum of two PCIe adapters per host (which means the same PCIe 3.0 bandwidth limitations apply).

It's too bad that we can't read more details specific to those x16 adapters, e.g. photos too.


Found some photos here: http://www.tomsitpro.com/articles/one-stop-systems-fsa-200-review,2-31-4.html

Scroll down to the photo gallery just above this next paragraph:

The One Stop Systems PCIe x16 Gen 3 Switch-based Cable Adapters operate at up to 128Gb/s of PCIe 3.0 speed in a slim HHHL (Half-Height Half-Length) form factor. The switch-based board draws a miserly 17W of power under full load and slots into a standard x16 PCIe slot, though it will support x4 and x8 connections as well. The card supports PCIe 3.0 x16 cables and employs a 32-lane PCIe 3.0 PLX PEX8733 switch.

MRFS

On 5/2/2016 at 3:13 PM, Simple said:

Spec'ing out a new server for a client who has requested 2.5" NVMe drives from Intel. I'm struggling to understand how fault tolerance is achieved without a traditional RAID controller. I get the impression that I'm missing something fundamental to this technology. Can anyone shed some light?

Hi Simple,

I came across this post while doing some research on a DB project we are working on, where we are planning on using Samsung PM1725 3.2TB NVMe flash drives. The issue right now is that NVMe flash performance has outpaced standard controllers by a wide margin. Most traditional RAID controllers, like the highest-end PERCs on Dell's PowerEdge server line, will max out on throughput and IOPS at 6-8 regular SATA SSDs, never mind NVMe.

In the context of OLTP databases (MS SQL), which is what my project is for, people are essentially passing these devices to the OS as JBOD and then relying on software layers for data consistency. Part of that is software RAID, but not in the way you are thinking.

High-performance flash like this is very expensive, so ideally you don't want to waste space in a RAID config. For MS SQL, you install the OS on SSDs like the Intel 3610, for instance, and have those configured in a RAID 1 through a traditional hardware RAID controller like a PERC H730P. Then you have 4 NVMe drives presented to the OS: you use OS software RAID to stripe them in a RAID 0, or keep them as separate devices, or you can have 2 x 2-disk RAID 0s (assuming 4 NVMe devices). So you are using software RAID, but not worrying about redundancy there.

Instead, you use a product like SIOS SANless clustering http://us.sios.com/products/what-is-a-sanless-cluster/ (there are other products as well) to handle data redundancy at the block level. The software synchronously replicates changes between the NVMe devices on multiple nodes and presents that to the OS as a shared storage device. So if a drive failure occurs and takes out 2 of the drives (due to them being configured in RAID 0), as far as your SQL cluster is concerned it doesn't matter; the software can just get the blocks from a different drive on the other node.

On top of that, you are probably using something like SQL Server Availability Groups for additional failover/safety.

Sorry if any of the above wasn't clear; let me know if you have any questions. I think this paradigm might shift once we see some NVMe RAID controllers with proper performance, and also when prices on the drives come down; right now a 3.2 TB Samsung PM1725 will set you back between $4k and $5k.
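To put rough numbers on the "don't waste space in a RAID config" point, here is an illustrative Python sketch. The drive capacity, the $4-5k price, the 4-drives-per-node count, and the two-node replication are taken from the post above; everything else is an assumption, not a measured configuration:

```python
# Rough usable-capacity / cost-per-TB comparison for two ways of getting redundancy
# with 4 NVMe drives per node across 2 replicated nodes. Illustrative assumptions only.
DRIVE_TB        = 3.2      # Samsung PM1725 capacity, per the post
DRIVE_COST_USD  = 4500     # assumed midpoint of the quoted $4k-5k price
DRIVES_PER_NODE = 4
NODES           = 2        # synchronous block replication between two nodes

total_cost = DRIVES_PER_NODE * NODES * DRIVE_COST_USD

# Layout A: RAID 0 inside each node; redundancy comes only from replicating
# the volume to the second node (the SANless-replication approach described above).
usable_raid0 = DRIVES_PER_NODE * DRIVE_TB

# Layout B: RAID 10 inside each node plus the same two-node replication,
# i.e. paying for redundancy twice.
usable_raid10 = (DRIVES_PER_NODE // 2) * DRIVE_TB

for label, usable_tb in (("RAID 0 + replication ", usable_raid0),
                         ("RAID 10 + replication", usable_raid10)):
    print(f"{label}: {usable_tb:4.1f} TB usable, "
          f"${total_cost / usable_tb:,.0f} per usable TB")
```

The point is only the ratio: striping locally and replicating across nodes roughly halves the cost per usable TB compared with also mirroring inside each node.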
