
Hi guys, yet another lost soul in the pits of RAID here. I've been looking for precise numbers rather than vague notions on the internet, but it's hopeless, so here's my case.

Environment: 5 PCs, all running Windows, all on a 10GbE network.

One machine is for backing up the planned RAID server, and the other 4 are workstations, of which at most 2 will be accessing the server at a time. However, traffic is going to be around 1.5-3 TB per day, and one day's work can't pile onto the next, or I'll be in trouble.

OK, the HBAs are the hardware side, already bought and in shipping now: 2x LSI 9300-8i. They are going to live in an LGA 2011-3 board, one hosting a RAID 10 pool and the other a RAID 6 pool, each pool made of 8x WD 8TB Reds.

Note that the server will not run any VMs or any application for that matter, other than serving those workstations, and fast. File sizes range from hundreds of KB up to many files at 70-90 GB each, so it's all over the place.

Also note that since we own a Windows Server 2012 license, we'll be using that.

Given the amount of data going through, no SSD caching will be implemented; hot-data tracking is completely useless in this case.

 

And now my actual question. Since those 2 HBAs are going to link the HDDs more or less directly to the CPU, and there are going to be 2 RAID pools of 8 HDDs each (one RAID 6 and one RAID 10), all the calculations are going to happen on a CPU that has nothing else to do. Riddle me this: HOW MUCH, IN NUMBERS, is this type of setup going to load the CPU, say for RAID 6, plus dealing with 2 aggregated 10GbE links?

 

All I can find on the internet is "a RoC will offload the parity calculations from the server CPU", and NOWHERE does it say HOW MUCH THAT IS IN NUMBERS. Not one real-world example, or even a theoretical system. Please note that I can't sell the HBAs and get RoCs; this is not my choice, but I'd like to complete the project the way it's going. Basically my question is: will a 5820K do, or will I have to go E5-2699 v4, so to speak? As far as RAM is concerned, I know this is Windows, so depending on the CPU chosen, probably as much memory as it supports will be installed.

The CPU is not going to be doing anything else, so if this setup will use 12 or 24 or 72% of this or that CPU, one can make an accurate prediction of how things will go; but installing a CPU and then seeing constant 95%+ usage while writing files isn't optimal.

 

I'm sorry, but nowhere am I seeing something like "I have Windows Server 2012 running on [this] server, and my HBA-connected 8-drive RAID 6 is using XX% of this or that CPU". There are NO numbers to even begin to figure this out.
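Since nobody seems to publish these numbers, one way to get at least a rough lower bound is to time the raw parity math on the CPU being considered. The following minimal Python sketch (assuming NumPy is installed) measures only single-threaded XOR parity for a 6-data-disk stripe; it ignores the RAID-6 Galois-field "Q" parity and all disk/network I/O, so it is an order-of-magnitude probe, not a prediction.

```python
import time
import numpy as np

# Rough probe: XOR ("P") parity throughput for a 6+2 style stripe.
# Not a RAID-6 implementation -- the GF(2^8) "Q" parity and all I/O
# are ignored; this only shows how much parity math one core can push.

DATA_DISKS = 6            # 8-drive RAID 6 = 6 data + 2 parity
CHUNK = 1024 * 1024       # 1 MiB chunk per data disk
ROUNDS = 200

arrays = [np.frombuffer(np.random.bytes(CHUNK), dtype=np.uint64)
          for _ in range(DATA_DISKS)]

start = time.perf_counter()
for _ in range(ROUNDS):
    parity = arrays[0].copy()
    for a in arrays[1:]:
        np.bitwise_xor(parity, a, out=parity)
elapsed = time.perf_counter() - start

data_mb = ROUNDS * DATA_DISKS * CHUNK / 1e6
print(f"XOR parity over {data_mb:.0f} MB of data in {elapsed:.2f} s "
      f"= {data_mb / elapsed:.0f} MB/s on one core")
```

If that figure comes out far above what eight spinning Reds can stream sequentially (very roughly 1.5 GB/s combined), raw parity math alone is unlikely to be what pins the CPU; the Q parity and the rest of the storage stack add more on top, which is exactly the part nobody publishes.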

Edited by cscrek


I just came across your excellent question, but I also do not have any hard empirical numbers to share with you. One theory I hold for the general absence of such numbers is that quad-core (and larger) CPUs are rarely running flat-out at 100% utilization, which means, quite bluntly, that a fast idle core is quite capable of doing RAID parity calculations. Also, our experience has been almost entirely with RAID-0 arrays -- for speed -- and we have opted for less expensive RAID controllers that shift some of the overhead onto those idle CPU cores.

To illustrate, even an LGA775 CPU like Intel's Q9550 has a large amount of on-chip Level 2 cache (12MB), and that cache can operate at speeds equal to, or faster than, the dedicated RAID chips mounted on add-on cards ("AOC"). You could begin your own research by assembling a Windows software RAID on a more modern multi-core CPU and monitoring standardized tests with Windows Task Manager.
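If you want logged numbers rather than eyeballing Task Manager, a small script can sample per-core CPU utilization while a large sequential write runs against the array. This is only a sketch, assuming the third-party psutil package is installed; the D:\ test path and the 20 GiB file size are placeholders to replace with your own.

```python
import os
import time
import psutil  # third-party: pip install psutil

TEST_FILE = r"D:\raid_write_test.bin"   # placeholder path on the test array
BLOCK = 8 * 1024 * 1024                 # 8 MiB per write
TOTAL = 20 * 1024 * 1024 * 1024         # 20 GiB test file

buf = os.urandom(BLOCK)                 # incompressible data
samples = []
written = 0

psutil.cpu_percent(percpu=True)         # prime the counters
start = time.perf_counter()
# Note: writes still pass through the OS cache, so the MB/s figure can be
# optimistic; the CPU samples are the part of interest here.
with open(TEST_FILE, "wb", buffering=0) as f:
    while written < TOTAL:
        f.write(buf)
        written += BLOCK
        if written % (1024 * 1024 * 1024) == 0:     # sample once per GiB
            samples.append(psutil.cpu_percent(percpu=True))
elapsed = time.perf_counter() - start

print(f"wrote {written / 1e9:.1f} GB in {elapsed:.1f} s "
      f"= {written / 1e6 / elapsed:.0f} MB/s")
for i, s in enumerate(samples, 1):
    print(f"GiB {i:>2}: per-core CPU % = {s}")
```

Run it once against a plain disk and once against the software RAID-6 volume; the difference in the per-core numbers is roughly the parity tax being asked about.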

For RAID-0 arrays in particular, another factor to consider is the cumulative amount of SSD cache that results from the array, e.g. 4 x Samsung 750 EVO @ 256MB cache each effectively equals 1GB of SSD cache. With such a setup, all that a cheap RAID AOC needs to do is the PLX-type switching between the edge connector and the 4 RAID members: the rest of the overhead can be handled easily by idle CPU cores and the controllers embedded in each member SSD.

Bottom line: at least for RAID-0 arrays of fast SATA-III SSDs, the question is a little moot. For other modern RAID modes, your question still awaits comparative empirical analyses. For M.2 RAID arrays, it is now well known that the limiting factor is the maximum upstream bandwidth of Intel's DMI 3.0 link. One measurement I noted yesterday showed a RAID-0 of 2 x Samsung 960 Pro running at 90% of its maximum theoretical throughput, e.g. 32 Gb/s / 8.125 bits per byte x 0.90 ~= 3,544 MB/second. An aggregate controller overhead of less than 10% is quite extraordinary for 2 x M.2 SSD. However, we won't know how much CPU overhead RAID-5 and RAID-6 parity calculations require until motherboards start supporting 4 x M.2 -or- 4 x U.2 ports that are NOT downstream of Intel's DMI 3.0 link.
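For reference, here is the same DMI 3.0 arithmetic spelled out as a tiny script; the 8.125 bits-per-byte figure comes from PCIe 3.0's 128b/130b encoding (130 line bits carry 16 payload bytes).

```python
# DMI 3.0 = 4 PCIe 3.0 lanes at 8 GT/s each, 128b/130b encoded.
LANES = 4
GT_PER_LANE = 8e9                # 8 GT/s per lane
BITS_PER_BYTE = 130 / 16         # = 8.125 (128b/130b encoding)

dmi_ceiling = LANES * GT_PER_LANE / BITS_PER_BYTE / 1e6   # MB/s
measured = 3544                  # the 2 x 960 Pro RAID-0 figure quoted above

print(f"DMI 3.0 ceiling : {dmi_ceiling:,.1f} MB/s")       # ~3,938.5 MB/s
print(f"measured RAID-0 : {measured} MB/s "
      f"= {measured / dmi_ceiling:.0%} of the ceiling")
```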

Edited by MRFS


Here's what I calculate for the Samsung 960 Pro M.2, using the 2 READ measurements in the charts above:

960 PRO RAID [NVMe]: 1.0 - (3,579 / 3,938.4) = 9.1% overhead
960 PRO JBOD [NVMe]: 1.0 - (3,355 / 3,938.4) = 14.8% overhead
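The overhead figure is just the complement of "measured over ceiling"; a small helper reproduces both numbers, with 3,938.4 MB/s as the DMI 3.0 ceiling computed above:

```python
DMI_CEILING_MBPS = 3938.4   # 4 x PCIe 3.0 lanes / 8.125 bits per byte

def overhead(measured_mbps, ceiling_mbps=DMI_CEILING_MBPS):
    """Fraction of the theoretical ceiling lost to controller/link overhead."""
    return 1.0 - measured_mbps / ceiling_mbps

print(f"960 Pro RAID-0 : {overhead(3579):.1%} overhead")   # ~9.1%
print(f"960 Pro JBOD   : {overhead(3355):.1%} overhead")   # ~14.8%
```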

I submit to you that the 960 Pro is SO FAST, it's already bumping against the max upstream bandwidth of Intel's DMI 3.0 link. I'll venture to predict that the RAID measurement will be much higher if those 960 Pro SSDs are mounted instead in 2.5" adapters like this one:

http://www.sybausa.com/index.php?route=product/product&product_id=884&search=SY-ADA40112

and wired to an NVMe RAID controller with an x16 edge connector, like Highpoint's RocketRAID model 3840A:

http://highpoint-tech.com/PDF/RR3800/RocketRAID_3840A_PR_16_08_09.pdf

http://supremelaw.org/systems/nvme/RocketRAID.3840A.NVMe.RAID.Controller.2.jpg

http://supremelaw.org/systems/nvme/RocketRAID.3840A.NVMe.RAID.Controller.1.jpg

Edited by MRFS


Here's my prediction, using the 950 Pro's scaling factor from JBOD to a 2-member RAID-0:

950 RAID / 950 JBOD = 3,255 / 2,229 = 1.460

predicted 960 RAID-0 = 960 JBOD x 1.460 = 3,355 x 1.460 ~= 4,900 MB/sec (faster than DMI 3.0's max headroom of 3,938.4 MB/sec)

The raw upstream bandwidth of a single PCIe 3.0 x16 edge connector is:

x16 x 8 GT/s / 8.125 bits per byte = 15,753.8 MB/sec

Highpoint computed 15,760, so we're VERY close!

http://highpoint-tech.com/PDF/RR3800/RocketRAID_3840A_PR_16_08_09.pdf
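As a sanity check, the prediction and both ceilings fit in a few lines (the 950 Pro JBOD/RAID-0 readings are the ones quoted above):

```python
BITS_PER_BYTE = 130 / 16                  # PCIe 3.0 128b/130b encoding

def pcie3_mbps(lanes):
    """Raw PCIe 3.0 bandwidth in MB/s for a given lane count."""
    return lanes * 8e9 / BITS_PER_BYTE / 1e6

scale = 3255 / 2229                       # 950 Pro: RAID-0 / JBOD ~= 1.460
predicted_960_raid = 3355 * scale         # ~4,900 MB/s

print(f"scaling factor       : {scale:.3f}")
print(f"predicted 960 RAID-0 : {predicted_960_raid:,.0f} MB/s")
print(f"DMI 3.0 ceiling (x4) : {pcie3_mbps(4):,.1f} MB/s  <- the bottleneck")
print(f"PCIe 3.0 x16 ceiling : {pcie3_mbps(16):,.1f} MB/s")
```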

Edited by MRFS


Wow, thanks for sharing; a lot of this info will certainly come in handy. I'll report back here when I have something of my own as well, but thanks again. This does give me some insight into my problem :) , a general direction if you wish. Most appreciated.


Forgive me for my obvious preference for Highpoint stuff:  I've had much success with their RocketRAID 2720SGL controller, and that's the reason for my bias.

FYI:  when you step up to PCIe 3.0 RAID controllers, there are a multitude of choices with x8 edge connectors, and lots of price variations e.g. search Newegg for "RAID controller" and then refine your search further.

Here are the latest PCIe 3.0 RAID controllers with x8 edge connectors from Highpoint:

HighPoint RocketRAID 3740A 12Gb/s PCIe 3.0 x8 SAS/SATA RAID Host Bus Adapter
http://www.newegg.com/Product/Product.aspx?Item=N82E16816115206&Tpk=N82E16816115206

HighPoint RocketRAID 840A PCIe 3.0 x8 6Gb/s SATA RAID Host Bus Adapter
http://www.newegg.com/Product/Product.aspx?Item=N82E16816115205&Tpk=N82E16816115205

The main problem with SAS SSDs, of course, is their much higher unit cost, which appears to be out of proportion to the modest increase in data transmission speed (12Gb/s).

Even so, both of the latter 2 Highpoint controllers support RAID-5 and RAID-6, as do the more expensive competitors e.g. Areca, LSI, Adaptec, Avago etc.

We are frankly waiting to see what AMD's Zen CPUs and AM4 chipsets have to offer: 

I recently wrote to AMD's CEO to suggest that AMD OEM Highpoint's model 3840A NVMe RAID controller, particularly if the AM4 chipset does not natively support all modern RAID modes across 4 x M.2 or 4 x U.2 ports.

So, we are playing a wait-and-see game here, at the moment  :)

Hope this helps.

Edited by MRFS


And there really are not very many RAID controllers with x16 edge connectors: RAID controller vendors appear to have been slow to catch up with the x16 slots that have long been standard for single and multiple video cards (SLI and Crossfire).

(Honestly, I've been pounding on this topic for at least 10 years!  :)

Here's a PCIe 2.0 RAID controller with x16 edge connector from Highpoint:

http://www.newegg.com/Product/Product.aspx?Item=9SIA6KX4455578&Tpk=9SIA6KX4455578

(RAID-5 but no RAID-6, according to Newegg's specs)

Remember that each PCIe 2.0 lane signals at 5 GT/s with legacy 8b/10b encoding, i.e. 500 MB/sec per x1 PCIe 2.0 lane.

Thus, what you lose in lane speed you make up for with a larger number of cheap SSD storage devices and a comfortably large upstream bandwidth.

MAX HEADROOM across that x16 edge connector is x16 @ 500 MB/sec  =  8 GB/second

(PCIe 2.0's 8b/10b encoding puts 10 bits on the wire for every 8 data bits, hence 10 bits per byte)

An x8 PCIe 3.0 edge connector has just about the same upstream bandwidth: x8 @ 8 GT/s / 8.125 = 7.87 GB/s

(PCIe 3.0's 128b/130b encoding carries 16 bytes in 130 line bits = 8.125 bits per byte)
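The encoding difference is the whole story here; a short comparison makes it concrete:

```python
# Per-lane and per-slot bandwidth, accounting only for line encoding
# (protocol/packet overhead is ignored).

def lane_mbps(gigatransfers, bits_per_byte):
    return gigatransfers * 1e9 / bits_per_byte / 1e6

PCIE2 = lane_mbps(5, 10.0)        # 8b/10b    -> 500 MB/s per lane
PCIE3 = lane_mbps(8, 130 / 16)    # 128b/130b -> ~984.6 MB/s per lane

print(f"PCIe 2.0 x16 : {16 * PCIE2 / 1000:.2f} GB/s")   # 8.00 GB/s
print(f"PCIe 3.0 x8  : {8 * PCIE3 / 1000:.2f} GB/s")    # ~7.88 GB/s
```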

 


For the sake of comparison, here are 2 controlled measurements of a RocketRAID 2720SGL controller (x8 edge connector, PCIe 2.0 chipset) driving 4 x Samsung SATA-III SSDs at 6G in RAID-0:

upstream bandwidth is x8 @ 5 GT/s / 10 bits per byte = 4.0 GB/s
theoretical max headroom is 4 x SSD @ 600 MB/s = 2.4 GB/s
MAX READ speed (measured) = 1.846 GB/s

The following ATTO graph is THE BEST I've seen with this RAID-0 configuration: 1.0 - (1,846 / 2,400) = 23.1% aggregate overhead with 4 x Samsung 840 Pro SSDs:

http://supremelaw.org/systems/io.tests/4xSamsung.840.Pro.SSD.RR2720.P5Q.Deluxe.Direct.IO.2.jpg

 

Here is a comparable measurement using a very similar motherboard and 4 x Samsung 850 Pro SSDs:

http://supremelaw.org/systems/io.tests/4xSamsung.850.Pro.SSD.RR2720.P5Q.Premium.Direct.IO.1.jpg
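Assuming the binding limit is simply the smaller of the controller's upstream bandwidth and the sum of the member SSDs' interface speeds, the bookkeeping for this 2720SGL setup looks like this:

```python
# RocketRAID 2720SGL example: which limit binds, and how much is lost to it.
upstream_gbs = 8 * 5 / 10          # x8 PCIe 2.0 lanes, 8b/10b  -> 4.0 GB/s
members_gbs  = 4 * 0.600           # 4 x SATA-III SSDs @ 600 MB/s -> 2.4 GB/s
measured_gbs = 1.846               # best ATTO read above

ceiling = min(upstream_gbs, members_gbs)
print(f"binding ceiling    : {ceiling:.1f} GB/s (the member SSDs, not the slot)")
print(f"aggregate overhead : {1 - measured_gbs / ceiling:.1%}")   # ~23.1%
```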

 

Thus, it should be very interesting when an NVMe RAID controller, properly installed in a compatible x16 PCIe 3.0 expansion slot, can drive 4 x Samsung 960 Pro M.2 SSDs in RAID-0 mode! There is an engineering elegance to the arithmetic:

4 @ x4 = x16, i.e. x16 PCIe 3.0 lanes = 4 x M.2 SSDs @ x4 PCIe 3.0 lanes each


Here's one more ATTO measurement of 4 x Samsung 840 Pro SSDs in RAID-0, same chipset as above:

1.0 - (1,879 / 2,400) = 21.7% aggregate overhead

http://supremelaw.org/systems/io.tests/4xSamsung.840.Pro.SSD.RR2720.P5Q.Premium.Direct.IO.2.jpg


Here's the "method" we've been applying with the above:

try to maintain apples-to-apples comparisons

of 4 x data channels i.e.

4 x SATA-III devices in RAID-0

4 x PCIe 3.0 channels in the DMI 3.0 link

4 x PCIe 3.0 lanes in a single NVMe M.2 device

 

