CougTek

Hyper-V Cluster Storage Revamp


We currently run a 3-node Hyper-V 2012 R2 cluster using an HP 3Par 7200 for storage.  The 3Par is now out of warranty (which is too expensive to renew) and 90% full.  Also, some of the nodes show memory usage spikes over 80%, so they'll have to be replaced soon too, even though they're still under their original 5-year warranty.  The nodes have 256GB of RAM each and dual 10-core Xeons.

We have twenty VMs serving around 350 users.  Among the VMs, there's one fatty MS Exchange server and three SQL 2014 servers, two of those being quite busy.  The current 3Par 7200 (capable of ~8000 IOPS according to IOMeter) sometimes chokes under load, if I trust the Veeam ONE alerts I receive.  Our data grows by over 30% per year and the VMs need 7TB today.  We're looking for an upgrade that will last five years, without having to pour in additional money before 2022.
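As a rough sanity check on the sizing, the growth figures above can be compounded year by year. This is only a sketch: it assumes the 30% growth compounds annually from the 7TB in use today, and the function name `projected_capacity_tb` is made up for illustration.

```python
def projected_capacity_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Compound the current footprint by a fixed annual growth rate.

    Assumption: growth is a constant percentage applied once per year.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Figures from the post: 7TB today, 30% growth per year, five-year horizon.
    for year in range(6):
        print(f"year {year}: {projected_capacity_tb(7.0, 0.30, year):.1f} TB")
```

With those inputs the footprint reaches roughly 26TB after five years, so that is the usable capacity to aim for if nothing is to be added before 2022.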

HPE's guys want us to get another 3Par (8200 with all-flash storage).  I'd rather take another path.  I've read a lot about SDS, and Windows Server 2016's Storage Spaces Direct looks quite promising.  DataCore SANsymphony also draws a lot of attention.  SDS should also be simpler to manage than a proprietary system like a 3Par.  Since we plan to upgrade our core switches, PFC/DCB for RoCE support on the switch side isn't a problem; the model we plan to get has it.

Nutanix wants to propose a solution to us.  I meet with one of their representatives tomorrow.  A hyper-converged solution sounds nice, although the horror story I've read here dating back to mid-2015 isn't flattering for Nutanix.

 

Thoughts?


Personally, I think a Nimble array is perfect.  It's performant, simple, and now an HPE company, so you can just ask them to engage the Nimble folks, since you have an existing relationship with HPE.  3Par is way more than you need IMO.

Otherwise, Nutanix only makes sense if you want to get rid of your existing compute nodes.  But at the same time, there are some Hyper-V-specific HCI players out there (I can't remember what their names are)... Something you may want to look at, rather than Nutanix with Hyper-V, which is a hypervisor with another hypervisor on top of it.

Of course there are the Dell EMC Unity arrays too, which have a nice Hyper-V plugin for management and support as part of the Unity UI.  And I think Veeam can use the Unity snapshots for backups too.


Your needs really aren't all that complex. I assume the 3PAR is a disk array given its vintage. Almost anything with flash will serve your needs. I wonder if the host servers get better if you have better storage too; that RAM footprint is impressive. Anyway, you have a lot of choice. Can you tell us any more about your budget and technical needs?


Sorry for the hiatus; I've been quite busy.

Re: Mitch

We've been contacted by and received a proposal from Nimble in January.  It looked good on paper, but replacing only the SAN doesn't fix our node resources problem (not really a problem now, but it will be sooner rather than later).  Also, regardless of the company, if we only upgrade the SAN, then we'll be cornered in another "nodes+SAN" architecture.  I'd really like the management simplicity of a hyper-converged architecture.

If we go with Nutanix, we'll convert the Hyper-V VMs to Acropolis (their hypervisor).

 

Re: Brian

Yes, the 3PAR is a disk array.  Our needs are to have a robust architecture with enough resources to support the production environment for several years and reliable replication to a DR site.  We already have a DR setup, but Veeam replication leaves a lot to be desired.  It's been unreliable in our environment.  Not something new.  We've used Veeam since version 7 (which was crap for Hyper-V).  Version 8 worked better, but versions 9 and 9.5 fail to take snapshots of 3 of the VMs.  We call support, it gets fixed, and a few months of Windows updates later, it breaks again.  Overall, Veeam simply hasn't been dependable for us.  Veeam also doesn't work on Nutanix's Acropolis.

The RAM issue can be fixed easily if I manually balance the VMs across the hosts to even out the load, but a Hyper-V failover cluster doesn't efficiently distribute the VMs on the hosts when one host goes down.  So if we keep using Hyper-V, we'll need to upgrade the nodes to ensure that we have a lot of spare resources on each host.  According to the Nutanix talking heads, their cluster does a much better and simpler job of distributing the load.  They demoed it numerous times too, but of course, the salesmen always show the shining parts.
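The "spare resources on each host" point can be put into numbers with a simple N-1 check. The sketch below is an approximation under stated assumptions: it only looks at total RAM, ignores the memory reserved by the host OS and the fact that VMs move as whole units, and the function names and the 80% example load are hypothetical.

```python
from typing import Sequence


def max_safe_utilization(nodes: int, failures_tolerated: int = 1) -> float:
    """Highest average RAM utilization that still fits on the surviving nodes."""
    if not 0 <= failures_tolerated < nodes:
        raise ValueError("failures_tolerated must be between 0 and nodes - 1")
    return (nodes - failures_tolerated) / nodes


def survives_failure(used_gb_per_node: Sequence[float], node_ram_gb: float,
                     failures_tolerated: int = 1) -> bool:
    """True if the total RAM in use fits on the nodes left after the failures."""
    surviving_ram = (len(used_gb_per_node) - failures_tolerated) * node_ram_gb
    return sum(used_gb_per_node) <= surviving_ram


if __name__ == "__main__":
    # Cluster from the thread: 3 nodes with 256GB each.
    print(f"safe average utilization: {max_safe_utilization(3):.0%}")
    # Hypothetical worst case: every node at 80% when one of them fails.
    print(survives_failure([0.80 * 256] * 3, 256))
```

For a 3-node cluster the ceiling works out to about 67% average utilization, so nodes that regularly sit near 80% leave no room to absorb a failed host, however well the VMs are balanced.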

I've not received the prices yet, but if the offers have similar cost, Nutanix's architecture looks quite good.  I'd really like to find out what you found to perform poorly two years ago.  I understand that you cannot disclose it due to the agreement you've had with them.  Depending on what doesn't work well on their solution, it might or might not affect us for our use.  So maybe it's a non-issue in our case.

Comparing Nutanix to a Windows Hyper-V cluster with a Storage Spaces Direct volume, Nutanix has the advantage of data locality on the nodes.  S2D apparently doesn't try to move the most-used data to the node that uses it, which is why it's a lot more demanding on the networking side (which means $$ for the switches).  The nodes also all have to be the same, so no mixed-generation nodes within the cluster, which isn't the case with Nutanix.  However, S2D is more of a DIY architecture, so there are more hardware choices than what Nutanix offers for their nodes.  It's also possible to use more generic components, bringing the cost down.  The downside of this is multi-vendor support, so they can all throw the ball to each other when issues arise.

I've not considered Dell or HPE's HC380 yet and I don't think I will either.  Dell's support could be better around here and HPE's hyper-converged solution isn't what HPE's guys want to sell us, which means they won't give us a good discount for it.

Regarding the budget, it's in the low six-figures (~150KU$).


On S2D, before taking a jump in that direction, please research some of the corner-case support issues some have encountered with it:

https://www.reddit.com/r/sysadmin/comments/609e98/another_catastrophic_failure_on_our_windows/

I've personally heard fantastic things about Storage Spaces Direct with regard to performance, but I think there are support issues that still need to be ironed out. You just don't encounter the same scale of problems with other solutions out of the gate.

On the all-flash side, I've been incredibly impressed with the NetApp AFF-series. I'm playing around with the A200 right now and it's been awesome. The performance hit from inline compression and dedupe isn't that bad either.

On 3/17/2017 at 8:46 PM, CougTek said:

I've not received the prices yet, but if the offers have similar cost, Nutanix's architecture looks quite good.  I'd really like to find out what you found to perform poorly two years ago.  I understand that you cannot disclose it due to the agreement you've had with them.  Depending on what doesn't work well on their solution, it might or might not affect us for our use.  So maybe it's a non-issue in our case.

We weren't impacted by the EULA, just their "business practices." That said, Nutanix offers a great platform if you don't want to be on VMware. Outside of that use case, there are better options. 

Quote

Regarding the budget, it's in the low six-figures (~150KU$).

With that kind of budget you have a lot of options. The A200 that Kevin is loving is more compelling than we expected. Dell EMC has choices, and we're also about to start playing with Nexenta's latest offering, which should have a better cost profile if that's of utmost importance. But really, the A200 has all the platform maturity you could ask for and the DR services you want. Once you get new storage in, you could contemplate upgrading your server nodes. As Kevin says though, I'd not rely on MSFT for storage these days.

Lastly, I shared this link with Veeam. We have good friends there and I'm sure they'd like the opportunity to address your issues.


Thank you Kevin for the Reddit warning story.  Since you both put in a good word for the NetApp all-flash SAN, I'll look into it later.  I have a lot of reading to do so I probably won't post back for a few days.

 

Thanks again for your help.


Warning: if you move to Acropolis, your backup architecture will NEED to change significantly.  Storage management tools, reporting products, etc.; many don't support Acropolis.  All the wonderful things they claim can be done on Acropolis are cancelled out when nothing seems to support Acropolis.  The market supports VMware first, Hyper-V second, and KVM and some of the other OpenStack players are next.

So regardless of the choice, traditional vs HCI, think about all the other technical and business process solutions you have.  Will they need updating, retiring, changes of process, etc.?

If you're going to entertain Acropolis, why not entertain VMware?  I mean there is an added cost, but that cost comes with many more features, maturity in DR/load balancing features, and broader industry support.  Not to mention a lot of HCI options there.  The only benefit to Acropolis is less upfront cost for the hypervisor.  Then it becomes more cost for all the other things like management, workflow, etc.


Yep, I've been reminded about Veeam's absence of support for Acropolis.  Same goes for Zerto, which I was eyeing for the replication part.

I'll stay with Hyper-V.  Regarding VMware, not sure I want to add another 20K$ for something that more or less does the same thing as Hyper-V, but a bit better.  So far, my Hyper-V cluster has been good enough.  Could be better, it's certainly perfectible, but not worth a five-figure investment for the number of VMs I have to manage.  At least in my view.

Too bad I'm too busy to try DataCore SANsymphony-V.  Not sure it would save us money.  Not sure it's easier to manage either.  Not even sure it plays nice with the backup/replication software.  But the performance numbers posted on the SPC-1 website are amazing considering the low cost of the hardware used.

Anyway, breaking benchmark records isn't the objective.  Providing a reliable, high-availability platform with enough space to store users' data, while being fast enough that they don't wait for it, is.


For whatever reason, the names of the few Hyper-V-focused HCI players out there escape me.  Perhaps they are not surviving? Or had to diversify to support other/more platforms?

You can always do what the big boys do and just go full on OpenStack, KVM, Docker, etc.  :P


 

8 hours ago, mitchm3 said:

You can always do what the big boys do and just go full on OpenStack, KVM, Docker, etc.  :P

Don't send the poor guy running for the hills Mitch. 

I've long not understood why MSFT hasn't done more to either deliver or facilitate HCI for Hyper-V. I think they view Azure Stack as that solution, though I think small orgs will find that difficult or too expensive to adopt.

17 hours ago, mitchm3 said:

For whatever reason, the names of the few Hyper-V-focused HCI players out there escape me.  Perhaps they are not surviving? Or had to diversify to support other/more platforms?

Maybe you're thinking of StarWind Virtual SAN?

BTW, reviewing and comparing those solutions (StarWind's Virtual SAN, DataCore SANsymphony-V, VMware's vSAN, Microsoft's Storage Spaces Direct) would make a great article and I'm sure it would draw a lot of visitors.  It would be a fantastic tool for all those looking into software-defined storage.

 

I've looked into a private OpenStack cloud, but one of the goals of the new architecture is ease of management.  Troubleshooting OpenStack issues isn't easy.  Being the sole network administrator of a ten-company conglomerate isn't my only task.  I'm also the IT manager of all this.  I deal with contracts and purchases, oversee the budget and supervise the L1-2 technicians, and when they aren't able to fix an issue, I'm the one who has to deal with it.  The amount of time I have to do my real job, which is supposed to be network administration, is quite limited.  I don't need something easy to deal with because I'm a moron.  I need something simple because I simply don't have the time to do deep troubleshooting.


We're about to do another Nexenta review, it's been a while since we looked at them. As to the others:

  • StarWind - don't really know them
  • DataCore - we have tried many times to get them to work with us on a review but they refuse
  • vSAN - we've done a lot here and will do more
  • MSFT - they are insanely difficult to work with. We likely will not have content here unless it's in conjunction with a partner of theirs. We're reviewing their Azure Cloud Pack through a partner now and we can't even get a Microsoft product person to take a call with us. 


If Microsoft doesn't want to help you on a review, they sure won't help me with a small setup like what I was considering.  That's their third strike.  What pisses me off about those companies is that they charge a LOT of money for their licenses and support, but the service level they provide is abysmal.

If DataCore refuses a comparative review, it's probably because they have something to hide.  At the very least, it suggests that they aren't totally confident in their product.

Nutanix, at least during the pre-sale stage, put a lot of effort into convincing me to go for their solution.  I know that it wasn't your experience two years ago though, so I'm quite cautious with them.


A lot has happened during the past week.  Long story short, I'll probably opt for two Nimble arrays: one at the main site and one at the backup site.  Their CS3000 (main site) and CS1000 (backup site) are simple to configure and operate.  Plus, replication between two sites is supposedly dead easy to set up and works well, so I won't need to spend on replication software like Zerto.  Taking volume snapshots takes practically no space and the number you can take on their platform is very high (they tested up to 160,000 without issue).  I'll still need Veeam, but only to do periodic full backups.

They guarantee 50,000 IOPS minimum on the CS3000 and 30,000 IOPS on the CS1000, which is enough for our needs.  Since setup and management are so easy, I don't mind not moving to a hyper-converged architecture.  Managing the hypervisor cluster isn't what I had problems with.  Mitch had the right suggestion from the first reply of this thread.

 

 


This whole project, plus the recent review of the Synology FS3017 NAS, made me think about Synology's attempt to break into the higher-end storage segment.

Consider two solutions I've been offered:

  • HPE 3PAR StoreServ 8200 with 8x 1.92TB SSD, ~23TB usable with compression, FC 16G links to servers, 70,000 IOPS advertised, promo at ~55,000U$ (normally more) with 5y 24x7 support
  • Nimble CS3000 with 3x ~2TB SSD and a bunch of mechanical drives, ~25TB usable storage overall, 50,000 IOPS advertised, less than 60,000U$ with 5y 24x7 support

Now take the following configuration with a Synology FS3017:

  • 24x Samsung SM863 1.92TB (MZ-7KM1T9E)
  • 2x Mellanox MCX314A-BCCT 2-port 40G QSFP+

Add the rail kit and you end up at a tad over 39,000U$.  You need two units for HA, so the price climbs to ~78,000U$.  You need all those drives to end up with ~23TB usable (RAID 1+0) because they don't have deduplication or compression, at least not that I'm aware of.  I prefer no RAID 5/6 on volumes I put high-load databases on.
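The capacity and price figures quoted so far can be checked with a few lines of arithmetic. A minimal sketch, assuming RAID 1+0 halves the raw capacity, that capacities are in decimal TB before any filesystem overhead, and that the prices are the rounded ones given above (the Nimble price is an upper bound, since the quote was "less than 60,000U$"); the function names are illustrative.

```python
def raid10_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 1+0 set: half of the raw capacity."""
    if drive_count < 2 or drive_count % 2:
        raise ValueError("RAID 1+0 needs an even number of drives (at least 2)")
    return drive_count * drive_tb / 2


def cost_per_usable_tb(price_usd: float, usable_tb: float) -> float:
    """Purchase price divided by usable capacity."""
    return price_usd / usable_tb


if __name__ == "__main__":
    synology_tb = raid10_usable_tb(24, 1.92)  # 24x SM863 1.92TB per unit
    options = {
        "3PAR 8200 (with compression)": (55_000, 23.0),
        "Nimble CS3000 (hybrid)": (60_000, 25.0),
        "Synology FS3017 HA pair": (2 * 39_000, synology_tb),
    }
    for name, (price, usable) in options.items():
        print(f"{name}: {usable:.2f} TB usable, "
              f"{cost_per_usable_tb(price, usable):,.0f} U$/TB")
```

On these numbers the Synology pair comes out at roughly 3,400U$ per usable TB against about 2,400U$ for the other two, although the 3PAR figure leans on compression and the Nimble is not all-flash, so it is not a like-for-like comparison.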

For a complete business proposal, with a DR site, it looks like this:

FS3017_architecture.PNG

The unit at the backup location could have only a single 2-port 10G adapter and 12x Samsung PM863 3.84TB SSDs.  Enough for running the cluster while the main site is restored.  The 3 units (two in HA at the main site and a cheaper one at the backup site) would cost ~110,000U$ for an all-flash architecture.  No idea how many IOPS an FS3017 with that kind of SSD would yield compared to the higher-end solutions from Nimble and 3PAR.  Synology also doesn't offer fast support in case of emergency, unlike Nimble, HPE, Dell, IBM, etc...

For a Nimble solution, you only need one unit at each location, since they include two controllers each.  With an HPE 3PAR, you need to add SAN switches in order to have site replication.  This increases cost and complexity.  In the case of the 3PAR, complexity is part of the deal (bunch of services to configure, system reporter, service processor, dealing with direct FC links, configuring the SAN switch, managing the LUNs, etc).

Synology offers more of a brute force solution to feed IOPS to the cluster, while the two others appear to be more refined (bunch of ASICs in the 3PAR doing the real-time compression, completely different approach on the Nimble).  A Nimble CS3000 (main) + CS1000 (DR) costs about the same amount, but with much better support.  It isn't all-flash though.  A dual 3PAR 8200, including all the gear and licenses to do off-site replication, would cost a bit more (probably ~150,000U$) and would be significantly more complex to set up and manage.

Synology's proposal doesn't look bad, but considering the options, who would dare to choose it and trust their entire SMB storage to it?  Would you?

Edited by CougTek
spelling mistakes. Probably left a few...


As much as I love Synology, in the price range you are talking about, you should really be looking at the NetApp AFF A200. Much, much higher real world performance than the FlashStation, which we had fully decked out in SAS3 Toshiba flash. Like night and day differences.


Good to know about the Synology vs NetApp all-flash.  Unfortunately, none of my main suppliers try to push a NetApp to me.  I don't know what kind of support I would have.  The closest storage I'm offered to the NetApp AFF A200 is the 3PAR 8200 all-flash.  I don't know how those compare.  They are probably direct competitors to each other.  Hopefully, the 3PAR is close to the performance you've seen with the NetApp.

I meet again tomorrow with my main supplier and I'll ask about the NetApp, but I doubt he'll bother to even give me a price for it.  Is it simple to configure?


A company like Nimble was created to address what the big players weren't doing.  Simple to set up, simple to license/all-inclusive licensing, easier maintenance, etc.  Their hybrid approach was better than the incumbents'.  Their all-flash was more of a "me-too" to address the competition, IMHO.  I think they now have NAS capabilities.

Much of Nimble was founded by ex-NetApp and Data Domain folk.  As such, I think NetApp is a fantastic NAS and is pretty hard to beat.  As a SAN, I think they are second-rate, and more of their cool stuff (data services) is on the file side and not the block side. But with NetApp, you do get a true enterprise product that has a tremendous amount of maturity over a startup like Nimble.

If going NetApp, I'd look at running Hyper-V over SMB 3.0.  Again, this is where I think NetApp shines, which is on file services and not over FC/iSCSI access.

If Nimble, FC or iSCSI is their preferred method.  So you may not need a new FC SAN, and can remove that cost and stick with 10GbE.

Veeam supports Nimble snapshots and NetApp, if that tickles your fancy.  But truth be told, as cool as that sounds, it's an incredibly complex implementation that you'll troubleshoot more than you want or think you will.  This goes for all snapshot, LAN-free, SAN-based backup technologies.  Man I hate that concept these days, and the grumpy Unix admins that think they need it...

