Session INF5306
DRS is the #1 scheduler in the datacenter today
92% of clusters have DRS enabled. 79% are in fully automated mode. 87% have affinity and anti-affinity rules.
43% of clusters have resource pools enabled and use them
99.8% of clusters use maintenance mode
Bottom line: DRS is popular
DRS collects a large set of stats every 20 seconds for its calculations
- CPU reserved
- Memory reserved
- CPU active, run and peak
- Memory overhead, growth rate
- Active, consumed and idle memory
- Shared memory pages, balloon, swapped, etc.
- VM happiness is the most important metric (if demands/entitlements are always met, then the VM is ‘happy’)
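As a rough illustration of the ‘happiness’ idea above, the sketch below treats a VM as happy when what it was delivered covers its demand, capped by its entitlement. The names (VmSample, happiness, is_happy) and the ratio formula are assumptions made for illustration; the session did not give the actual DRS formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VmSample:
    """One hypothetical stats sample for a VM (same unit for all fields, e.g. MHz)."""
    demand: float       # what the VM wants right now
    entitlement: float  # what shares, reservations and limits entitle it to
    delivered: float    # what the host actually gave it


def happiness(s: VmSample) -> float:
    """Fraction of the VM's entitled demand that was met, in [0, 1]."""
    target = min(s.demand, s.entitlement)
    if target <= 0:
        return 1.0  # an idle VM has nothing unmet
    return min(s.delivered / target, 1.0)


def is_happy(samples: list[VmSample]) -> bool:
    """Happy only if the entitled demand was met in every sample."""
    return all(happiness(s) >= 1.0 for s in samples)
```

A VM that demands less than its entitlement and receives all of it scores 1.0; a VM that is entitled to 1000 MHz but receives 500 MHz scores 0.5.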
Constraints for initial placement and load balancing
- Constraints are a big part of decision making
- HA admission control policies
- Affinity and anti-affinity rules
- # concurrent vMotions
- Time to complete vMotion
- Datastore connectivity
- vCPU to pCPU ratio
- Reservations, limits and share settings
- Agent VMs
- Special VMs (SMP-FT, vFlash, etc.)
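To make the role of constraints concrete, here is a minimal sketch that filters candidate hosts using three of the constraints listed above: datastore connectivity, anti-affinity rules, and the vCPU to pCPU ratio. The Host class, the allowed_hosts function and the max_ratio parameter are hypothetical; real DRS evaluates many more constraints than this.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    pcpus: int                                 # physical cores
    datastores: set                            # names of connected datastores
    vms: dict = field(default_factory=dict)    # VM name -> vCPU count


def allowed_hosts(vm_name, vcpus, datastore, hosts, anti_affinity, max_ratio):
    """Return the names of hosts that violate none of the three constraints."""
    result = []
    for h in hosts:
        # Datastore connectivity: the host must see the VM's datastore.
        if datastore not in h.datastores:
            continue
        # Anti-affinity: the host must not already run a VM we must avoid.
        if any(other in h.vms for other in anti_affinity.get(vm_name, ())):
            continue
        # vCPU to pCPU ratio after placing the VM must stay within the cap.
        if (sum(h.vms.values()) + vcpus) / h.pcpus > max_ratio:
            continue
        result.append(h.name)
    return result
```

Constraints act as a filter: only hosts that pass every check are passed on to load-balancing or placement scoring.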
Cost Benefit and minGoodness
- Cost-benefit analysis – VM happiness is evaluated against the cost of a migration
- Cost considerations: each vMotion consumes about 30% of a CPU core on 1Gb and 100% of a core on 10Gb; the ‘shadow VM’ at the destination host also consumes memory
- Benefit considerations: Positive performance benefit to VMs at the source host, overall workload distribution has to be much better
- Each analysis results in a rating from -2 to +2
- MinGoodness (migration threshold slider) is -2 to +2. User can set this.
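The interaction between the rating and the threshold can be sketched as follows. The session only states that each analysis yields a rating from -2 to +2 and that MinGoodness is a user-set value on the same scale; how benefit and cost are turned into a rating here (truncating the net gain and clamping it) is purely an assumption, as are the function names.

```python
def rate_move(benefit: float, cost: float) -> int:
    """Map net gain to an integer rating in [-2, +2] (hypothetical scale)."""
    net = benefit - cost
    # int() truncates toward zero; the result is then clamped to [-2, +2].
    return max(-2, min(2, int(net)))


def recommended(moves, min_goodness: int):
    """Keep only the moves whose rating reaches the MinGoodness threshold."""
    return [
        name
        for name, benefit, cost in moves
        if rate_move(benefit, cost) >= min_goodness
    ]


# Hypothetical candidate migrations: (VM name, benefit, cost).
moves = [("vm-a", 3.0, 0.5), ("vm-b", 1.2, 1.0), ("vm-c", 0.5, 2.9)]
```

With these made-up numbers, a threshold of 1 keeps only vm-a, a threshold of 0 also admits vm-b, and a threshold of -2 admits every move. Lowering the threshold makes DRS more aggressive in this model.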
Takeaway
- VM happiness is the #1 influence
- Influenced by real time stats, constraints and cost/benefit analysis
- A small imbalance should not be a concern
- Default setting of DRS aggressiveness is best
New Features in vSphere 6.0
- Network-aware DRS – ability to specify bandwidth reservation for important VMs
- Initial placement based on VM bandwidth reservation
- Automatic remediation in response to reservation violations due to pNIC saturation, pNIC failure
- Tight integration with vMotion (built with the vMotion team): DRS makes a unified recommendation for cross-vCenter vMotion
- Runs a combined DRS and SDRS algorithm to generate a tuple (host, DS)
- CPU, memory, and network reservations are considered as part of admission control
- All the constraints are respected as part of the placement
- VM-to-VM affinity and anti-affinity rules are carried over during cross-cluster and cross-vCenter migration
- Initial placement enforces the affinity and anti-affinity constraints
- Improved overhead computation – greatly improves the consolidation during power-on
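The combined placement described above, which yields a (host, DS) tuple subject to CPU, memory and network reservations, can be sketched roughly as below. The dictionary layout, the place function and the tie-break rule are assumptions for illustration and do not reflect the actual DRS/SDRS algorithm.

```python
from itertools import product


def place(vm, hosts, datastores):
    """Pick a (host, datastore) name pair for vm, or None if nothing fits."""
    feasible = []
    for host, ds in product(hosts, datastores):
        # Constraint: the host must be connected to the datastore.
        if ds["name"] not in host["datastores"]:
            continue
        # Admission control: CPU, memory and network reservations must fit.
        if any(vm[k] > host["free"][k] for k in ("cpu_mhz", "mem_mb", "net_mbps")):
            continue
        # The datastore needs room for the VM's disks.
        if vm["disk_gb"] > ds["free_gb"]:
            continue
        feasible.append((host, ds))
    if not feasible:
        return None
    # Hypothetical tie-break: most free host memory, then most free datastore space.
    host, ds = max(feasible, key=lambda p: (p[0]["free"]["mem_mb"], p[1]["free_gb"]))
    return host["name"], ds["name"]
```

Evaluating host and datastore together avoids picking a host first and then discovering that none of its datastores can hold the VM.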
Cluster Scale and Performance Improvements
- Increased cluster capacity to 64 hosts and 8K VMs
- DRS and HA extensively tested at maximum scale for VCSA and Windows
- Up to 66% performance increase in vCenter (power on, DRS calcs, etc.)
- VM power-on latency has been reduced by 25%
- vMotion operation is 60% faster
- Faster host maintenance mode
Extensive Algorithm Usage
- DRS is the lynchpin of the SDDC vision
- vSphere HA
- VUM
- vCloud Director
- vCloud Air
- Fault Tolerance
- ESX Agent Manager
Best Practices
- Tip #1: Full storage connectivity
- Tip #2: Power management settings – Set BIOS to OS control and vSphere to balanced.
- Tip #3: Threshold setting – Default of 3 works great.
- Tip #4: Automation level – Fully automated is the best choice
- Tip #5: Beware of resource pool priority inversion. Make sure that cramming more VMs into a resource pool doesn’t dilute the shares each VM receives.
- Tip #6: Avoid setting CPU-affinity
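The priority inversion in Tip #5 is easiest to see with arithmetic. The sketch below assumes that a pool's shares are split evenly among its VMs and uses made-up pool names, share values and VM counts; shares only matter when hosts are under contention.

```python
def per_vm_share(pool_shares: int, vm_count: int) -> float:
    """Shares each VM effectively gets if the pool splits them evenly."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return pool_shares / vm_count


# Hypothetical pools: name -> (pool shares, number of VMs in the pool).
pools = {"prod": (8000, 40), "test": (2000, 4)}

per_vm = {name: per_vm_share(shares, n) for name, (shares, n) in pools.items()}

# The "high priority" pool ends up with fewer shares per VM than the "low" one.
inverted = per_vm["prod"] < per_vm["test"]
```

Here each prod VM ends up with 200 shares while each test VM gets 500, so under contention the test VMs win even though the prod pool has four times the shares.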
Future Directions
Proactive HA
- Proactive evacuation of VMs based on hardware health metrics
- Partnering with hardware vendors to integrate and certify
- Moderately degraded and severely degraded modes
- VI admin can configure the DRS action for each health state event
- Host maintenance mode and host quarantine mode
- VI admin can filter events
Network DRS v2
- Take pNIC saturation into account
- Tighter integration with NSX
- Ensure mice and elephant flows don’t share the same network path
- Network layout topology – leverage topology for availability and performance optimizations
Proactive DRS
- Tighter integration with the vROps analytics engine
- Periodic and seasonality demands incorporated into decision making
What-if Analysis
- A sandbox tab in UI to run ‘what if’ analysis
- VM availability assessment by simulating host failures
- Cluster overcommitment during maintenance window
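One way to picture the host-failure simulation is the sketch below. It is a deliberately crude model, not the planned feature: it compares the memory of the VMs on a failed host against the total spare memory on the survivors, and ignores fragmentation, CPU and every DRS constraint. All names and numbers are hypothetical.

```python
def simulate_host_failures(hosts, vms):
    """For each host, report whether the survivors could absorb its VMs.

    hosts: host name -> memory capacity (MB)
    vms:   VM name -> (host name, memory in MB)
    """
    # Memory currently used on each host.
    used = {h: 0 for h in hosts}
    for host, mem in vms.values():
        used[host] += mem

    result = {}
    for failed in hosts:
        # Aggregate spare capacity on every host except the failed one.
        spare = sum(hosts[h] - used[h] for h in hosts if h != failed)
        result[failed] = spare >= used[failed]
    return result


# Hypothetical two-host cluster.
hosts = {"h1": 100, "h2": 200}
vms = {"a": ("h1", 70), "b": ("h2", 40)}
outcome = simulate_host_failures(hosts, vms)
```

In this example losing h1 is survivable (h2 has 160 MB spare for 70 MB of VMs), but losing h2 is not (h1 has only 30 MB spare for 40 MB of VMs).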
Auto-scale of VMs
- Horizontal and vertical scaling to maintain end-to-end SLA guarantees
- Spin-up and spin-down VMs based on workload
- Will first be offered as a service in vCloud Air
- Increase CPU and memory resources to meet performance goals
- CPU/memory hot add is an additional option for DB tier
Hybrid DRS
- Make vCloud Air a seamless extension of enterprise datacenter capacity through policy-based scheduling