omega015

Do I have performance issues? New iSCSI SAN / VM environment


Hello all, I just joined the forum because I have a couple of questions about our SAN and wondered whether we are getting the best from it. I have tried to give as much information as possible without listing every setting (I'll wait for requests for those). In short, I think I might be having throughput/IOPS performance issues for my VMs. I'm not sure whether it's iSCSI network related, SAN related, or VMware related. Or maybe there is no issue and I was just expecting more.

The Problem

We recently had a company in to upgrade our infrastructure (we were still on Server 2003). We are noticing a bit of lag in one program. The program in question (an ERP system) is new to us, so we are not sure whether it's the hardware or the software, as we have no benchmarks for either.

Information

3 hosts (HP DL380p Gen8, dual 3GHz 10-core CPUs, 64GB memory) with the latest ESXi on SD card. No disks in any host.

Cisco (C2960X) stacked switch connecting all the hosts for network, management, vMotion, etc.

The iSCSI network is on 2 separate Cisco switches (non-stacked), again C2960X.

We currently have only 6 VMs set up (2 DCs, 1 Exchange, 1 vCenter, 1 ERP application, 1 ERP DB). The ERP software is not used by anyone yet.

The SAN is 1 HP MSA 1040 (dual controller with dual 1GbE ports on each), I believe with 2GB cache on each controller, plus 1 HP D2700 enclosure.

The disks are as follows:

Name     Size      RAID    Disk Type  Current Owner  Disks
vDisk01  2398.0GB  RAID50  SAS        A              6 (10k 600GB)
vDisk02  4196.5GB  RAID5   SAS        B              8 (10k 600GB)
vDisk03  3597.0GB  RAID5   SAS        A              7 (10k 600GB)
vDisk04  2996.8GB  RAID50  SAS        B              12 (15k 300GB)
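
As a sanity check on the table above, the reported sizes can be compared with the capacity you would expect from each RAID level. This is only a rough sketch: it assumes RAID5 gives up one disk to parity and that each RAID50 vDisk is built from two RAID5 sub-groups (the sub-group count is an assumption, it is not stated anywhere in this thread). The function name `usable_gb` is made up for illustration.

```python
def usable_gb(disks: int, disk_gb: float, raid: str, groups: int = 2) -> float:
    """Approximate usable capacity before formatting overhead.

    RAID5 loses one disk to parity; RAID50 loses one disk per RAID5
    sub-group. groups=2 is an assumption, not a value from the post.
    """
    if raid == "RAID5":
        parity_disks = 1
    elif raid == "RAID50":
        parity_disks = groups
    else:
        raise ValueError(f"unsupported RAID level: {raid}")
    return (disks - parity_disks) * disk_gb


# (name, disk count, disk size in GB, RAID level, size reported by the SAN)
vdisks = [
    ("vDisk01", 6, 600, "RAID50", 2398.0),
    ("vDisk02", 8, 600, "RAID5", 4196.5),
    ("vDisk03", 7, 600, "RAID5", 3597.0),
    ("vDisk04", 12, 300, "RAID50", 2996.8),
]

for name, count, size, raid, reported in vdisks:
    expected = usable_gb(count, size, raid)
    print(f"{name}: expected ~{expected:.0f} GB, reported {reported} GB")
```

Under those assumptions the expected values land within a few GB of the reported sizes, so the layout in the table looks internally consistent.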

The ERP is on vDisk04 (only the application and DB VMs are on this).

Here are some stats from CrystalDiskMark:

Results

This is from one of the ERP VMs on vDisk04:

Sequential Read : 108.898 MB/s

Sequential Write : 107.679 MB/s

Random Read 512KB : 102.853 MB/s

Random Write 512KB : 98.401 MB/s

Random Read 4KB (QD=1) : 8.630 MB/s [ 2107.0 IOPS]
Random Write 4KB (QD=1) : 5.251 MB/s [ 1281.9 IOPS]

Random Read 4KB (QD=32) : 98.198 MB/s [ 23974.2 IOPS]
Random Write 4KB (QD=32) : 49.038 MB/s [ 11972.3 IOPS]

Test : 500 MB [D: 8.1% (16.3/200.0 GB)] (x5)

OS : Windows Server 2012 Server Standard Edition (full installation) [6.2 Build 9200] (x64)
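
For anyone comparing the MB/s and IOPS columns, the two are the same measurement expressed differently. The sketch below converts the 4KB throughput figures above into IOPS. It assumes the benchmark reports decimal megabytes (1 MB = 1,000,000 bytes) and a 4096-byte block; the helper name `iops_from_mbps` is illustrative only.

```python
BLOCK_BYTES = 4 * 1024   # 4KB test block
MB = 1_000_000           # assumption: the tool reports decimal megabytes


def iops_from_mbps(mb_per_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Convert a throughput figure into I/O operations per second."""
    return mb_per_s * MB / block_bytes


# (label, MB/s from the results above, IOPS from the results above)
results_4k = [
    ("Random Read 4KB (QD=1)", 8.630, 2107.0),
    ("Random Write 4KB (QD=1)", 5.251, 1281.9),
    ("Random Read 4KB (QD=32)", 98.198, 23974.2),
    ("Random Write 4KB (QD=32)", 49.038, 11972.3),
]

for label, mbps, reported in results_4k:
    print(f"{label}: computed {iops_from_mbps(mbps):.1f} IOPS, reported {reported}")
```

The computed values agree with the reported IOPS to within rounding, which is consistent with the decimal-megabyte assumption.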

Running CrystalDiskMark gives almost identical results from any VM (no matter which vDisk the VM is on). I would have expected slightly different results due to the RAID levels.

We also have 1 host running Server 2012 natively; we ran CrystalDiskMark on vDisk03 there and got slightly improved results. This leads me to question whether the VMware settings might be part of the issue. I'm also not sure whether I should be seeing better throughput/IOPS in general, or whether these figures are reasonable.

This is the other result:

Sequential Read : 205.442 MB/s

Sequential Write : 189.707 MB/s

Random Read 512KB : 172.833 MB/s

Random Write 512KB : 169.600 MB/s

Random Read 4KB (QD=1) : 9.021 MB/s [ 2202.3 IOPS]

Random Write 4KB (QD=1) : 6.164 MB/s [ 1505.0 IOPS]

Random Read 4KB (QD=32) : 136.312 MB/s [ 33279.2 IOPS]

Random Write 4KB (QD=32) : 50.407 MB/s [ 12306.4 IOPS]

Test : 500 MB [E: 22.5% (752.6/3349.4 GB)] (x5)
OS : Windows Server 2012 Server Standard Edition (full installation) [6.2 Build 9200] (x64)
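
One way to read the two sets of sequential numbers is to express them as a fraction of what a 1GbE link can carry. The sketch below uses the raw line rate of 1GbE (125 MB/s) and ignores Ethernet, TCP/IP, and iSCSI overhead, so real-world ceilings are somewhat lower. The helper name `links_used` is made up for illustration.

```python
# Raw 1GbE line rate in decimal MB/s: 1,000,000,000 bits/s / 8 / 1,000,000
LINK_MB_PER_S = 1_000_000_000 / 8 / 1_000_000  # 125.0


def links_used(throughput_mb_s: float) -> float:
    """Throughput expressed as a multiple of one 1GbE link's raw line rate."""
    return throughput_mb_s / LINK_MB_PER_S


# Sequential read figures taken from the two result sets above.
for label, mbps in [("VM on vDisk04", 108.898), ("Physical host on vDisk03", 205.442)]:
    print(f"{label}: {mbps} MB/s = {links_used(mbps):.2f} x one 1GbE link")
```

On these figures the VM's sequential read is about 0.87 of a single link, while the physical host reaches about 1.64 links' worth. That pattern would be consistent with the VM pushing sequential traffic down one path at a time, although the arithmetic alone cannot prove it.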

Hopefully someone knowledgeable on here can help me troubleshoot where any issues may be, or suggest a few settings to tweak for better performance (if any is available). Failing that, I can pinpoint the issue to the ERP software and know my hardware is all fine. I hope the limitation isn't the new infrastructure that's been implemented.

Thank you in advance.

EDIT:

Forgot to mention I have Round Robin / MPIO set up. Thanks again.


What virtual storage device (SCSI Controller) are you using for the older 2003 VMs and the newer S2012 VM?

Also, when you look at the performance tab (the advanced view for storage), do you see any huge latency spikes during the lag sessions? CDM isn't a great benchmark in any regard, although it can be used to "kick the tires". The 4K numbers at queue depth 32 on each show pretty decent results for an HDD RAID5 or RAID50 array, and both are in a similar range. I'd almost be willing to bet this isn't really a storage issue but just an older legacy software issue.


Hi Kevin and Happy New Year,

The older 2003 VMs were using local storage on different servers. They haven't been migrated as they are no longer needed. The new 2012 VMs are set up on an HP MSA 1040 SAN (dual controller with dual 1GbE ports on each).

In the performance tab I see some latency peaks of around 7 milliseconds when running the application.

Datastore: Read/Write peak 7ms

Storage Adapter: Read 7ms / Write 8ms (on vmhba34)

Regarding the software, this isn't legacy software; it is a new application we are implementing. Hope this helps.

Regards,

Michael


Single-digit latency really isn't a problem in terms of the storage array. If it were, say, 10-100x that, the story would be a bit different. I think you have another problem at play. What virtual storage controller are you using?
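
To put the latency figures in context, Little's Law relates queue depth, IOPS, and average latency: average latency is roughly queue depth divided by IOPS, assuming the queue is kept full for the whole test. The sketch below applies that to the VM's 4KB read results from the first post. It is a rough estimate, not a measurement, and the helper name `mean_latency_ms` is made up for illustration.

```python
def mean_latency_ms(queue_depth: int, iops: float) -> float:
    """Little's Law: average time in system = items in flight / throughput.

    Assumes the benchmark keeps queue_depth I/Os outstanding throughout.
    """
    return queue_depth / iops * 1000.0


# 4KB random read figures from the VM on vDisk04 (first post).
print(f"QD=1:  ~{mean_latency_ms(1, 2107.0):.2f} ms per I/O")
print(f"QD=32: ~{mean_latency_ms(32, 23974.2):.2f} ms per I/O")

# The 10-100x range mentioned above, applied to the observed 7 ms peak.
peak_ms = 7
print(f"10-100x the observed peak: {peak_ms * 10}-{peak_ms * 100} ms")
```

On that estimate the benchmark's average latencies are well under 2 ms, so the observed 7 ms peaks are still in the range described above as unproblematic.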


Sorry if I'm misunderstanding your question, but this SAN is a bit new to me and I'm still learning. The physical controller is: HP MSA 1040 2-port 1GbE iSCSI Dual Controller SFF Storage5 (E7W02A). I believe each controller has a 4GB cache.

All 4 1GbE connections are connected to 2 Cisco switches, and the iSCSI NICs on the VMware hosts are connected to the 2 switches (1 in each), so each of the 3 hosts has a 2GbE connection to the SAN. The 3 VMware hosts are running the iSCSI software adapter. The physical Windows host also has 2 1GbE connections to the SAN and is running the Microsoft iSCSI initiator.

I have also played around with the vDisk/RAID setup, and there is virtually no difference in the numbers no matter the disk speed (10k/15k disks) or RAID type (10, 50, 5).


Nothing really seems out of place. Without knowing the ERP software you are using, have you googled around to see whether there are any other lag complaints? It might be an issue with the software itself.


Thanks Kevin. The lag might occur while files are being cached, or so the vendor has suggested. I still suspect there is an issue with VMware, though, as the disk throughput for the VMs is as little as half that of the physical Windows host.

This makes me wonder whether VMware Round Robin is doing what it should. I also disabled 1 NIC on the host and still got the same speed, so when using RR the host doesn't double the throughput.


VM performance will always be below physical host performance, dramatically so in some cases. The other thing at play is that physical access will generally have much better sequential speeds (VMware will chunk up everything, making transfers very high sequential I/O). So even though you have multiple LAN connections, you might be topping out what sequential I/O the array is capable of.


Are you using jumbo frames on your iSCSI network? How do you have your networking configured in VMware? vSphere standard switches? Are you using NIC teaming?

-Justin
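
For a rough idea of what jumbo frames can and cannot buy on a 1GbE link, the sketch below estimates how much of the raw line rate is left for TCP payload at MTU 1500 versus MTU 9000. It assumes standard Ethernet framing (38 bytes per frame on the wire for preamble, header, FCS, and inter-frame gap), 20-byte IP and TCP headers with no options, and no VLAN tag; iSCSI PDU headers are ignored. The function name `tcp_payload_efficiency` is made up for illustration.

```python
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12   # preamble/SFD + header + FCS + inter-frame gap
IP_TCP_HEADERS = 20 + 20              # assumption: no IP or TCP options
LINK_MB_PER_S = 125.0                 # raw 1GbE line rate in decimal MB/s


def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of wire bandwidth carrying TCP payload at a given MTU."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_WIRE_OVERHEAD
    return payload / on_wire


for mtu in (1500, 9000):
    eff = tcp_payload_efficiency(mtu)
    print(f"MTU {mtu}: {eff:.1%} efficient, ~{eff * LINK_MB_PER_S:.1f} MB/s per 1GbE link")
```

Under these assumptions the payload share goes from about 95% to about 99%, so jumbo frames alone would account for only a few MB/s per link. Any larger gain would have to come from reduced per-packet CPU work, which this arithmetic does not model.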
