This is my first technical session of VMworld 2013, VSVC5280. It covered the new vSphere 5.5 DRS features and explained why DRS may not always “perfectly” balance your cluster. vSphere 5.5 has a lot of new storage features, and DRS has been enhanced to be aware of the VSAN and vFlash technologies. As always during conferences I’m blogging in real time, so I may not have caught all of the details and don’t have time to expound on the new features. Stay tuned for future blog posts on vSphere 5.5 goodness.
Session Outline:
- Advanced resource management concepts
- New features in vSphere 5.5
- Integration with new storage and networking features
- Future directions
Advanced Resource Management Concepts
- DRS treats the cluster as one giant host
- Capacity of this “host” = capacity of the cluster
- Main issue: Fragmentation of resource across hosts
- Primary goal: Keep VMs happy by meeting their resource demands
- Why is meeting VM demand the primary goal? When a VM’s demand is satisfied, the VM and its applications are happy
- Why might demand not be met? Because a host is overloaded
- Three ways to find more cluster capacity: reclaim resources, migrate VMs, or power on a host (if DPM is enabled)
- Demand does NOT equal current utilization
- Why not load balance the DRS cluster as the primary goal? Load balancing is NOT free. Movement has a cost.
- Load balancing is a mechanism used to meet VM demands. If VM resources are being met, don’t move the VM.
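To make the “don’t move happy VMs” point concrete, here is a minimal sketch of deciding whether a host even needs to shed load. This is my own illustration, not the actual DRS algorithm; the function names and MHz figures are made up.

```python
# Illustrative sketch only (not the real DRS logic): consider migrating VMs off
# a host only when the host cannot satisfy aggregate VM demand, because a
# vMotion is never free.

def host_can_meet_demand(host_capacity_mhz, vm_demands_mhz):
    """A host is 'happy' when the sum of its VMs' CPU demand fits in its capacity."""
    return sum(vm_demands_mhz) <= host_capacity_mhz

def should_consider_migration(host_capacity_mhz, vm_demands_mhz):
    """Load balancing is a mechanism, not the goal: only look for moves when demand is unmet."""
    return not host_can_meet_demand(host_capacity_mhz, vm_demands_mhz)

# Example: a 20,000 MHz host with three VMs demanding 9,000, 8,000 and 6,000 MHz
print(should_consider_migration(20000, [9000, 8000, 6000]))  # True -> demand unmet
```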
DRS Load-Balancing: The Balls and Bins Problem
- Problem: Assign n balls to m bins
- Key challenges: Dynamic numbers and sizes of bins/balls
- Constraints on co-location, placement, and others
- VM resource entitlements are the “balls” and host resources are the “bins”
- Dynamic load, dynamic capacity
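Here is a toy version of the balls-and-bins framing, assuming a simple greedy placement with one anti-affinity constraint. It is only meant to show the shape of the problem (dynamic demands, finite bins, constraints, fragmentation), not how DRS actually places VMs.

```python
# Toy balls-and-bins placement: greedily put each VM (ball) on the host (bin)
# with the most free capacity, honoring a simple anti-affinity constraint.

def place_vms(hosts, vms, anti_affinity=()):
    """hosts: {name: capacity}, vms: {name: demand}. Returns {vm: host}."""
    free = dict(hosts)
    placement = {}
    for vm, demand in sorted(vms.items(), key=lambda kv: -kv[1]):
        candidates = [
            h for h in free
            if free[h] >= demand
            and not any(placement.get(other) == h
                        for pair in anti_affinity if vm in pair
                        for other in pair if other != vm)
        ]
        if not candidates:
            raise RuntimeError(f"capacity is fragmented across hosts: cannot place {vm}")
        best = max(candidates, key=lambda h: free[h])
        placement[vm] = best
        free[best] -= demand
    return placement

print(place_vms({"esx1": 10, "esx2": 10},
                {"vm1": 6, "vm2": 6, "vm3": 4},
                anti_affinity=[("vm1", "vm2")]))
```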
Goals of DRS Load-Balancing
- Fairly distribute VM demand
- Enforce constraints
- Recommend moves that improve balance
- Recommend moves with long term benefits
The Myth of Target Balance
- The UI migration threshold slider tells DRS which star threshold is acceptable
- Each threshold maps to an implicit target value for the cluster imbalance metric
- Constraints can make it hard to meet target balance
- If all your VMs are happy, a little imbalance is FINE
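The cluster summary exposes a “current host load standard deviation,” so a rough sketch of the imbalance idea looks like the following. The target value and host loads below are invented for illustration; this is not the exact DRS computation.

```python
# Sketch of the imbalance-metric idea: treat imbalance as the spread of
# normalized host loads and compare it against a target set by the slider.
from statistics import pstdev

def cluster_imbalance(host_loads):
    """host_loads: normalized load (entitlement / capacity) per host."""
    return pstdev(host_loads)

def needs_rebalancing(host_loads, target):
    """A little imbalance below the target is fine if VM demands are being met."""
    return cluster_imbalance(host_loads) > target

print(needs_rebalancing([0.55, 0.60, 0.65], target=0.2))  # False: close enough
```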
vSphere 5.5 New Features
- In vSphere 5.1 there is a new option, LimitVMsPerESXHost – DRS will not admit or migrate more than x VMs to any host
- vSphere 5.5 adds LimitVMsPerESXHostPercent
- Limit VMs per host example: mean = 8 VMs per host, buffer = 50%, so the new limit is 12 (8 + 50% of 8) – see the sketch after this list
- New: Latency-sensitive VMs and DRS. New GUI pulldown option
- Magical “soft affinity” rule to the current host, for workloads that may be sensitive to vMotions
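A quick back-of-the-envelope check of the LimitVMsPerESXHostPercent example above. The helper below is my own arithmetic sketch, not how DRS computes the option.

```python
# Sketch of the per-host VM limit arithmetic: mean VMs per host plus a buffer.
import math

def vm_limit_per_host(total_vms, num_hosts, buffer_percent):
    mean = total_vms / num_hosts                       # average VMs per host
    return math.ceil(mean * (1 + buffer_percent / 100.0))

# Mean of 8 VMs per host with a 50% buffer -> limit of 12 VMs per host
print(vm_limit_per_host(total_vms=32, num_hosts=4, buffer_percent=50))  # 12
```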
CPU Ready Time Overview
- The amount of time a vCPU is ready to run but waiting to be scheduled on a pCPU. It measures CPU contention.
- Many causes for high ready time, many fixes, many red-herrings
- “%RDY is 80, that can NOT be good”: %RDY is cumulative across vCPUs, so divide it by the number of vCPUs (see the quick check after this list)
- Rule of thumb: up to 5% per vCPU is usually alright
- “Host utilization is very low, %RDY is very high”: Host power management reduces pCPU capacity
- Set BIOS option to “OS control” and let ESX decide
- Low VM or RP CPU limit values restrict cycles delivered to VMs. “Set your VMs free” by not configuring MHz limits
- NUMA Scheduling effects: NUMA scheduler can increase %RDY time
- Application performance is often better because NUMA scheduler optimizes memory performance
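The quick check mentioned above: normalize the cumulative %RDY value per vCPU before deciding whether it is a problem. This is plain arithmetic, not anything pulled from esxtop itself.

```python
# %RDY is reported summed across a VM's vCPUs, so normalize per vCPU first.

def ready_per_vcpu(total_rdy_percent, num_vcpus):
    return total_rdy_percent / num_vcpus

def ready_time_ok(total_rdy_percent, num_vcpus, threshold_per_vcpu=5.0):
    """Session rule of thumb: up to ~5% ready time per vCPU is usually alright."""
    return ready_per_vcpu(total_rdy_percent, num_vcpus) <= threshold_per_vcpu

print(ready_per_vcpu(80, 16))   # 5.0 -> a 16-vCPU VM at %RDY=80 is ~5% per vCPU
print(ready_time_ok(80, 16))    # True
print(ready_time_ok(80, 4))     # False: 20% per vCPU is real contention
```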
Better Handling in DRS
- vSphere 5.5 new feature: AggressiveCPUActive=1
- Only use it for very spiky workloads that the 5-minute average may not catch (see the sketch after this list)
- vSphere 5.5: PercentIdleMBInMemDemand – handles memory-bursting protection
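A small illustration of why a 5-minute average can hide a spike. Whether AggressiveCPUActive uses exactly this kind of high-percentile statistic is my assumption; the sample numbers are invented.

```python
# Why a plain 5-minute average can hide a spiky workload: a high-percentile
# (aggressive) view of the recent samples keeps the burst visible.
from statistics import mean

samples_mhz = [500, 600, 550, 3800, 4000]   # five 1-minute CPU-active samples

average = mean(samples_mhz)                 # 1890 MHz: the spike is diluted
aggressive = sorted(samples_mhz)[-2]        # 3800 MHz: second-highest sample keeps it
print(average, aggressive)
```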
New Storage and Networking Features
- vFlash: Initial DRS placement just works
- DRS load balancing treats vFlash VMs as if they have soft affinity to their current host
- VMs will not migrate unless absolutely necessary
- Host maintenance mode may take longer
- vFlash space can get fragmented across the cluster
- vMotions may take longer
VSAN Interop with DRS
- DRS is completely compatible with VSAN
Autoscaling Proxy Switch Ports and DRS
- DRS admission control – proxy switch port test
- Makes sure the host has enough ports on the proxy switch: vNIC ports, uplink ports, vmkernel ports
- In vSphere 5.1, ports per host = 4096
- A host will power on no more than 400 VMs
- New to vSphere 5.5: Autoscaling switch ports
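A hedged sketch of the proxy-switch-port admission test described above. The function and numbers are illustrative only, not the real DRS or vDS data model.

```python
# Illustrative admission-control check: a host must have enough free proxy
# switch ports for every vNIC of the VM being powered on or migrated in.

def can_power_on(host_free_ports, vm_vnic_count):
    return host_free_ports >= vm_vnic_count

# With a fixed pool of 4096 ports per host (vSphere 5.1) shared by vNIC, uplink
# and vmkernel ports, a host tops out around 400 powered-on VMs; vSphere 5.5
# removes the pressure by autoscaling switch ports.
print(can_power_on(host_free_ports=3, vm_vnic_count=4))  # False: power-on denied
```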
Future Directions
- Per-vNIC bandwidth reservations
- pNIC capacity at host will be pooled together
- Static overhead memory – influenced by VM Config parameters, VMware features, ESXi build number
- Overhead memory: This value is used in admission control and DRS load balancing
- Better estimation of these numbers leads to higher consolidation during power-on
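To show why better overhead estimates matter, here is a simple sketch of reservation-plus-overhead admission math. The numbers and the naive division are mine, not the actual DRS computation.

```python
# Sketch: a tighter static-overhead estimate lets admission control fit more
# VMs into the same host memory during power-on.

def vms_admitted(host_free_mem_mb, vm_reservation_mb, overhead_estimate_mb):
    """Each power-on must fit its reservation plus estimated static overhead."""
    per_vm = vm_reservation_mb + overhead_estimate_mb
    return host_free_mem_mb // per_vm

# A tighter overhead estimate (90 MB vs 250 MB) admits more VMs on the same host.
print(vms_admitted(65536, vm_reservation_mb=2048, overhead_estimate_mb=250))  # 28
print(vms_admitted(65536, vm_reservation_mb=2048, overhead_estimate_mb=90))   # 30
```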
Proactive DRS – Possible future features
- Lifecycle: Monitor, predict, remediate, compute, evaluate
- Imagine a vMotion happening before a workload spike
- Predict and avoid spikes
- Make remediation cheaper
- Proactive load-balancing
- Proactive DPM – Power on hosts before capacity is needed
- Use predicted data for placement and evacuation
- vCOPS integration to perform analytics and capacity planning