VMworld 2015: DRS Advancements in vSphere 6.0

Session INF5306

DRS is the #1 scheduler in the datacenter today

92% of clusters have DRS enabled. 79% are in fully automated mode. 87% have affinity and anti-affinity rules.

43% of clusters have resource pools enabled and use them

99.8% of clusters use maintenance mode

Bottom line: DRS is popular

DRS collects a wide range of stats every 20 seconds for its calculations:

CPU reserved
Memory reserved
CPU active, run and peak
Memory overhead and growth rate
Active, consumed and idle memory
Shared memory pages, balloon, swapped, etc.
VM happiness is the most important metric (if a VM's demands/entitlements are always met, the VM is ‘happy’)
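
The "happiness" idea above can be illustrated with a minimal sketch. The session did not give the actual formula, so the ratio below (delivered resources divided by entitlement, capped at 1.0) and the names `happiness_ratio` and `is_happy` are assumptions made purely for illustration.

```python
def happiness_ratio(delivered, entitled):
    """Fraction of the entitlement actually delivered, capped at 1.0.

    Assumption: this ratio is an illustrative stand-in, not the DRS formula.
    """
    if entitled <= 0:
        return 1.0  # nothing entitled, so nothing can be missed
    return min(delivered / entitled, 1.0)


def is_happy(samples):
    """A VM counts as 'happy' only if every sample met its entitlement."""
    return all(happiness_ratio(d, e) >= 1.0 for d, e in samples)


# Hypothetical (delivered MHz, entitled MHz) samples, one per stats interval.
samples = [(1000, 1000), (800, 1000), (1200, 1000)]
print(is_happy(samples))  # False: the second sample fell short
```

A single shortfall is enough to make the VM "unhappy" here, which mirrors the "always met" wording in the notes.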

Constraints for initial placement and load balancing

Cost Benefit and minGoodness

Cost-benefit analysis – VM happiness is evaluated against the cost of a migration
Cost considerations: each vMotion consumes 30% of a CPU core on a 1Gb network and 100% of a core on a 10Gb network, plus the memory consumed by the ‘shadow VM’ at the destination host
Benefit considerations: Positive performance benefit to VMs at the source host, overall workload distribution has to be much better
Each analysis results in a rating from -2 to +2
MinGoodness (the migration threshold slider) also ranges from -2 to +2 and is user-configurable.
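
As a rough sketch of how a rating and a MinGoodness threshold could interact, the snippet below filters candidate migrations by rating. How DRS actually derives a rating from cost and benefit was not covered in these notes, so the rating is taken as an input here. The `Move` type, the `filter_moves` function, and the "keep if rating >= threshold" rule are assumptions. The only values taken from the session are the -2 to +2 range and the per-vMotion CPU cost figures.

```python
from dataclasses import dataclass

# CPU cost per vMotion as quoted in the session (fraction of one core).
VMOTION_CORE_COST = {"1Gb": 0.30, "10Gb": 1.00}


@dataclass
class Move:
    vm: str
    dest_host: str
    rating: int  # cost-benefit rating, -2 (worst) to +2 (best)


def filter_moves(moves, min_goodness):
    """Keep only moves whose rating meets the threshold (assumed rule)."""
    if not -2 <= min_goodness <= 2:
        raise ValueError("min_goodness must be between -2 and +2")
    return [m for m in moves if m.rating >= min_goodness]


# Hypothetical candidate migrations.
moves = [
    Move("vm-a", "host-2", 2),
    Move("vm-b", "host-3", 0),
    Move("vm-c", "host-2", -1),
]
print([m.vm for m in filter_moves(moves, 0)])  # ['vm-a', 'vm-b']
```

In this sketch, raising the threshold leaves fewer, higher-rated moves, and lowering it admits more of them.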

Takeaway

New Features in vSphere 6.0

Network-aware DRS – ability to specify bandwidth reservation for important VMs
Initial placement based on VM bandwidth reservation
Automatic remediation in response to reservation violations due to pNIC saturation, pNIC failure
Tight integration with the vMotion team; DRS makes a unified recommendation for cross-vCenter vMotion
Runs a combined DRS and Storage DRS (SDRS) algorithm to generate a (host, datastore) tuple
CPU, memory, and network reservations are considered as part of admission control
All the constraints are respected as part of the placement
VM-to-VM affinity and anti-affinity rules are carried over during cross-cluster and cross-vCenter migration
Initial placement enforces the affinity and anti-affinity constraints
Improved overhead computation – greatly improves the consolidation during power-on
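
To make the admission-control points above concrete, here is a minimal placement sketch. A host is admitted only if it can satisfy the VM's CPU, memory, and network reservations and holds none of the VMs it must be kept apart from. The data model (`Host`, `VM`), the function names, and the tie-break (most free network bandwidth) are all assumptions, not the DRS algorithm itself. The datastore half of the (host, datastore) tuple is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_cpu_mhz: int
    free_mem_mb: int
    free_net_mbps: int
    vms: set = field(default_factory=set)


@dataclass
class VM:
    name: str
    cpu_res_mhz: int
    mem_res_mb: int
    net_res_mbps: int
    anti_affinity: set = field(default_factory=set)  # VMs to keep apart from


def admits(host, vm):
    """True if all three reservations fit and no anti-affinity rule breaks."""
    return (
        host.free_cpu_mhz >= vm.cpu_res_mhz
        and host.free_mem_mb >= vm.mem_res_mb
        and host.free_net_mbps >= vm.net_res_mbps
        and not (host.vms & vm.anti_affinity)
    )


def place(vm, hosts):
    """Pick an admitting host; tie-break on free bandwidth (arbitrary choice)."""
    candidates = [h for h in hosts if admits(h, vm)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_net_mbps).name


# Hypothetical cluster: host-1 has a saturated pNIC, host-2 runs db-1.
hosts = [
    Host("host-1", 8000, 32768, 100, {"web-1"}),
    Host("host-2", 8000, 32768, 5000, {"db-1"}),
    Host("host-3", 4000, 16384, 2000, set()),
]
db2 = VM("db-2", 2000, 8192, 1000, anti_affinity={"db-1"})
print(place(db2, hosts))  # host-3
```

Here host-1 is rejected for insufficient bandwidth and host-2 for the anti-affinity rule, leaving host-3 as the only valid placement.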

Cluster Scale and Performance Improvements

Extensive Algorithm Usage

Best Practices

Tip #1: Full storage connectivity
Tip #2: Power management settings – Set BIOS to OS control and vSphere to balanced.
Tip #3: Threshold setting – Default of 3 works great.
Tip #4: Automation level – Fully automated is the best choice
Tip #5: Beware of resource pool priority inversion. Make sure that cramming more VMs into a pool won’t dilute each VM’s shares.
Tip #6: Avoid setting CPU-affinity
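
The priority inversion in Tip #5 is easiest to see with arithmetic. The sketch below assumes two sibling resource pools whose VMs all have equal shares inside their pool. The pool names, share values, and VM counts are made up for illustration.

```python
def per_vm_fraction(pool_shares, vm_counts):
    """Fraction of contended cluster resources each VM gets, per pool.

    Assumes sibling pools and equal shares for every VM inside a pool.
    """
    total = sum(pool_shares.values())
    return {
        pool: (shares / total) / vm_counts[pool]
        for pool, shares in pool_shares.items()
        if vm_counts[pool] > 0
    }


# Hypothetical setup: "prod" has 4x the shares of "test" but 10x the VMs.
pool_shares = {"prod": 8000, "test": 2000}
vm_counts = {"prod": 40, "test": 4}

for pool, frac in per_vm_fraction(pool_shares, vm_counts).items():
    print(f"{pool}: {frac:.1%} of contended resources per VM")
```

Under contention, each "prod" VM ends up with about 2% of the cluster while each "test" VM gets about 5%. The higher-priority pool's VMs are therefore worse off individually, which is the inversion the tip warns about.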

Future Directions

Proactive HA

Network DRS v2

Take pNIC saturation into account
Tighter integration with NSX
Ensure mice and elephant flows don’t share the same network path
Network layout topology – leverage topology for availability and performance optimizations

Proactive DRS