Today I attended a killer session at VMworld 2009 on getting the most performance out of your disk array with VMware. After the session, my head was spinning with all the new information and the realization that optimizing your vSphere storage can be non-trivial. If you attended VMware world, please listen to session TA2467. Understanding VMware storage concepts is critical.
The speaker covered all supported network storage protocols, including Fibre Channel, iSCSI and NFS. The primary root cause of poor performance in a virtualized environment is poor storage performance. VMware vSphere 4.0 has massive improvements in the storage stack for all protocols. However, it’s NOT a simple plug and play matter if you really want to optimize your environment. Both iSCSI and Fibre Channel require low-level knowledge about ESX and your storage array to make the most of your disk array.
Each protocol and each version of ESX have very specific requirements and storage multi-pathing implementations that you MUST understand. Many concerns are array specific, so you get with your storage vendor and read their whitepapers on best practices for VMware. VMware has their own set of whitepapers, which should be required reading prior to any deployment. You can read their Fibre Channel configuration guide here. Also, tripple check that your exact storage configuration is supported in the VMware storage compatibility guide. Many customer problems can be traced back to running in an unsupported configuration.
In vSphere 4.0 VMware added native multi-pathing, but even with this major enhancement you still need to deep dive into your array to understand how it handles LUNs, load balancing, and what you need to do to tweak settings. For example, one critical feature of your array that you must understand is what the controller architecture is. Is it active-active concurrent, active-active non-concurrent, active-passive, etc. If you only ‘know’ your array is active-active, it’s vital to know if it’s concurrent or non-concurrent. Only A/A concurrent is active-active in the eys of vSphere.
How do you tell and what’s the difference? First, ask your disk vendor if you aren’t sure. Secondly, look at how much you spent on the array. 🙂 If you mortgaged your business to buy the array and got an EMC Symmetrix or HDS USP, you have A/A concurrent controllers. Or, if you did your market research and found other storage vendors like 3PAR or Compellent, you also have A/A concurrent controllers but didn’t die of sticker shock from the price tag. If you went the ‘safe’ mid-range route and bought something like the HP EVA, EMC Clariion or any NetApp then you don’t have concurrent controllers and are active/passive in vSphere speak.
If you are lucky enough to have an A/A concurrent controller, all is not well out of the box. The default pathing mode is fixed path which doesn’t make the most of your array. You can change the mode to round robin, but even that simple change isn’t optimal. By default round robin doesn’t alternate single I/Os down each path. Instead it sends a stream of 1000 I/Os at once, then changes the path. This can lead to a non-optimal configuration. If your storage vendor concurs, changing this value to ‘1’ will evenly distribute the load among all paths. But check with your array vendor before changing anything!
According to EMC, even the native round robin feature in vSphere leaves something to be desired. So they wrote PowerPath VE, which completely replaces the vSphere MPIO subsystem. News to me was that PowerPath VE works with many non-EMC arrays as well as EMC arrays. EMC claims that PowerPath VE can increase storage performance 30% to 300% over native MPIO. They have a 45-day trial version you can download and try out, to let you measure how much it may help you out. Can’t hurt to try it out, as long as your array is on their compatibility list.
iSCS has an entirely different set of issues, which I may tackle in another blog. If you want to check out the blog for the presenter of the session, you can see it here. The session was jam packed with information, and this post only covered a tiny piece of his fire hose of information.
Derek, just to be clear, I work for 3PAR. The terminology around active passive has seen a certain amount of vendor inflation the last few years. It’s a shame things are getting so out of control. Active/active refers to the ability to access a single LUN through more than one controller. Sounds easy, but is very difficult to implement and is a very big deal where VMware I/O performance is concerned. FWIW, 3PAR’s does this along with the other vendors you mentioned and I’m pretty sure Compellent’s does not. “Concurrent” refers to an active/passive scenario where two controllers are both… Read more »