This is my first post here, so thank you for reading! :-)
We are using a software-defined SAN (SanSymphony) and have some serious performance problems running our processes. Now it's on me to tell whether the SAN is the bottleneck. Unfortunately I'm a database guy and not a SAN dude.
Can someone share some best practices for telling whether a SAN is running hot?
Some 15k disk RAIDs on the lower tier and 2 SSD RAID 5 sets for the first tier (each about 15k IOPS max for random writes with 8k blocks), connected to the hosts via 8 Gbps Fibre Channel. All put together with SanSymphony.
All servers and services are running virtualized on VMware.
What I can tell:
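As a rough sanity check on these figures, the sketch below converts the SSD tier's peak random-write IOPS at 8k blocks into throughput and compares it with one Fibre Channel link. The 800 MB/s usable throughput per 8 Gbps link is an assumed nominal value, and counting only a single link is also an assumption; neither was measured on this setup.

```python
# Rough sanity check: could the SSD tier's peak random-write rate fill one FC link?
SSD_RAIDS = 2             # from the setup: two SSD RAID 5 sets
IOPS_PER_RAID = 15_000    # approx. max random-write IOPS per set with 8k blocks
BLOCK_BYTES = 8 * 1024    # 8k block size in bytes
FC_LINK_MB_S = 800.0      # ASSUMPTION: nominal usable MB/s of one 8 Gbps FC link


def throughput_mb_s(iops: float, block_bytes: int) -> float:
    """Convert an IOPS figure at a fixed block size into MB/s (1 MB = 10**6 bytes)."""
    return iops * block_bytes / 1e6


tier_mb_s = throughput_mb_s(SSD_RAIDS * IOPS_PER_RAID, BLOCK_BYTES)
link_share = tier_mb_s / FC_LINK_MB_S

print(f"SSD tier peak: {tier_mb_s:.1f} MB/s, {link_share:.0%} of one link")
```

Under these assumptions the small-block random-write peak of the SSD tier stays well below the bandwidth of a single link, which would point towards IOPS and latency rather than link throughput as the figures to watch.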
Disk latencies (shown in Perfmon on the servers) are pretty high! Averages of 2 to 20 ms under low load and up to 100-400 ms under heavy load are typical for our MS SQL Server. IMHO these numbers would be horrific for physical database servers, but some consultants told us that such high latencies are normal for virtualization + SAN. So we tried to ignore the latencies.
I grabbed some logs from our SAN, figured out the most frequent workload profiles (% read, % write, average read block size, average write block size) and set up an IOMeter scenario reflecting these workloads. I ran this IOMeter setup against our SAN during off-duty hours, measured the maximum IOPS per workload profile and compared it to the real-world IOPS for that workload scenario.
I put all these numbers together in some Excel sheets and now ... I don't know how to go further.
IOPS_rel.jpg shows 2 days in our company's life. Each data point represents about 30 minutes. I defined the maximum benchmarked IOPS as 100% and compared the real-world IOPS against it.
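The normalization described above can be sketched roughly as follows. All profile names and IOPS values here are hypothetical placeholders, not the measured data, and the 80% threshold is only picked for illustration.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: hypothetical benchmark maxima (IOPS) per workload profile,
# standing in for the IOMeter results.
BENCHMARK_MAX_IOPS: Dict[str, float] = {
    "oltp_8k_70r": 12_000.0,
    "backup_64k_10r": 3_000.0,
}

# ASSUMPTION: hypothetical 30-minute samples:
# (interval index, dominant workload profile, observed average IOPS)
SAMPLES: List[Tuple[int, str, float]] = [
    (1, "oltp_8k_70r", 3_600.0),
    (2, "oltp_8k_70r", 12_600.0),
    (3, "backup_64k_10r", 2_250.0),
]


def relative_load(
    samples: List[Tuple[int, str, float]],
    benchmark_max: Dict[str, float],
) -> List[Tuple[int, float]]:
    """Express observed IOPS as a fraction of the benchmarked maximum
    for the matching workload profile (1.0 corresponds to 100%)."""
    return [(idx, iops / benchmark_max[profile]) for idx, profile, iops in samples]


def intervals_at_or_above(loads: List[Tuple[int, float]], threshold: float) -> List[int]:
    """Return the interval indices whose relative load reaches the threshold."""
    return [idx for idx, load in loads if load >= threshold]


loads = relative_load(SAMPLES, BENCHMARK_MAX_IOPS)
for idx, load in loads:
    print(f"interval {idx}: {load:.0%} of benchmarked maximum")
print("intervals at or above 80%:", intervals_at_or_above(loads, 0.8))
```

Values above 100% are possible in this scheme whenever the real workload is not identical to the benchmarked profile, so they are kept as they are rather than clipped.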
What I can see:
- Our SAN is continuously running at about 30% of max load.
- The two peaks (1 to 13 and 51 to 59) shown in the diagram are the processes causing trouble. The first spike hits the 100% mark (I would put the 120% down to benchmark tolerance...), the second one does not even touch the 80% line.
So ... should we upgrade our SAN or not? I know that in the end this decision comes down to money.
But what would you say from a technical point of view?
Thank you very much for reading!