VMworld 2015: 5 Functions of SW Defined Availability

Session: INF4535

Duncan Epping, Frank Denneman

Introduction to SDA (Software defined availability): VM, server, storage, data center, networking, management. Business only cares about the application, not the underlying infrastructure.

vSphere HA

Configured through vCenter but not dependent on it
An agent (FDM) is installed on each host to monitor state
HA restarts VMs when failure impacts those VMs
Heartbeats via network and storage to communicate availability
Can use management network or VSAN network if VSAN is enabled
Need spare resources
Admission control – Allows you to reserve resources in case of a host failure
Admission control guarantees that VMs receive their reserved resources after a restart, but it does not guarantee that they perform well after a restart.
Best practices: Select policy that best meets your needs, enable DRS, simulate failures to test performance
The percentage-based policy is by far the most used and is the one Duncan recommends
Duncan went through various failure scenarios (host failure, host isolation, storage failure) and how HA restarts the VMs.
Use VMCP (VM Component Protection, new in 6.0). It helps protect against loss of storage connectivity.
Generic recommendations: disable “host monitoring” during network maintenance; make sure you have a redundant management network; enable portfast; use admission control
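The percentage-based admission control policy mentioned above can be illustrated with a little arithmetic. The sketch below is a simplified model, not HA's actual algorithm: it assumes identical hosts, looks at CPU reservations only, and uses hypothetical function names (`failover_percentage`, `can_power_on`) that are not part of any VMware API.

```python
def failover_percentage(num_hosts: int, host_failures_to_tolerate: int) -> float:
    """Percentage of cluster resources to reserve, assuming identical hosts."""
    if not 0 < host_failures_to_tolerate < num_hosts:
        raise ValueError("failures to tolerate must be between 1 and num_hosts - 1")
    return 100.0 * host_failures_to_tolerate / num_hosts


def can_power_on(total_mhz: float, reserved_mhz: float,
                 new_vm_reservation_mhz: float, reserve_pct: float) -> bool:
    """Allow the power-on only if the unreserved share of the cluster
    stays at or above the configured failover percentage (simplified)."""
    remaining = total_mhz - reserved_mhz - new_vm_reservation_mhz
    free_pct = 100.0 * remaining / total_mhz
    return free_pct >= reserve_pct


# Hypothetical cluster: 4 identical hosts, tolerate 1 host failure.
pct = failover_percentage(4, 1)
small_vm_ok = can_power_on(40000, 28000, 1000, pct)   # leaves 27.5% free
large_vm_ok = can_power_on(40000, 28000, 3000, pct)   # leaves 22.5% free
```

In this model the small VM is admitted and the large one is refused. That matches the point made in the session: admission control protects reservations, not general performance after a restart.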

DRS

DRS provides load balancing and initial placement
DRS is the broker of resources between producers and consumers
The goal of DRS is to provide the resources each VM demands
DRS provides cluster management (maintenance mode, affinity/anti-affinity rules)
DRS keeps VMs happy; it doesn’t try to perfectly balance every host
DRS affinity rules: Control the placement of VMs on hosts within a cluster.
The highest priority for DRS is to resolve any violation of affinity rules.
VM-host groups can be configured with mandatory (must) or preferential (should) affinity or anti-affinity rules
A mandatory (must) rule limits HA, DRS and the user
Why use resource pools? Powerful abstraction for managing a group of VMs. Set business requirements on a resource pool.
Bottom line is resource pools are complex, and VMs may not get the resources you think they should. Only use them when needed.
Keep the number of affinity rules as low as possible, and use preferential (should) rules where you can.
Tweak the aggressiveness slider if the cluster is unbalanced.
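The difference between must and should rules can be sketched in a few lines. This is an illustrative model only, not how DRS is implemented: the `HostRule` type and `candidate_hosts` function are hypothetical, and only VM-to-host affinity is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostRule:
    vm: str
    hosts: frozenset
    mandatory: bool  # True = must rule, False = should rule


def candidate_hosts(vm, available_hosts, rules):
    """Return the hosts a VM may be placed on, best candidates only.

    Must rules shrink the allowed set and are never relaxed.
    Should rules narrow the choice, but are dropped if they leave nothing.
    """
    allowed = set(available_hosts)
    preferred = set(available_hosts)
    for rule in rules:
        if rule.vm != vm:
            continue
        if rule.mandatory:
            allowed &= rule.hosts
        else:
            preferred &= rule.hosts
    preferred &= allowed
    return sorted(preferred if preferred else allowed)


rules = [
    HostRule("db", frozenset({"esx1", "esx2"}), mandatory=True),
    HostRule("db", frozenset({"esx2"}), mandatory=False),
]
normal = candidate_hosts("db", ["esx1", "esx2", "esx3"], rules)   # ['esx2']
esx2_down = candidate_hosts("db", ["esx1", "esx3"], rules)        # ['esx1']
both_down = candidate_hosts("db", ["esx3"], rules)                # []
```

The last case shows why must rules should be used sparingly: when no permitted host is left, the VM has nowhere to go, whether the placement is requested by HA, DRS, or an administrator.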

SDRS and SIOC

Storage I/O Control (SIOC) is not cluster aware; it is focused on storage
Enabled at the datastore level
Detects congestion and monitors average IO latency for a datastore
Latency above a particular threshold indicates congestion
SIOC throttles IOs once congestion is detected
Control IOs issued per host
Based on VM shares, reservations, and limits
SDRS runs every 8 hours to check balance, using the 90th percentile over the previous 16 hours
Capacity threshold per datastore
I/O metric threshold per datastore
Affinity rules are available
SDRS is now aware of storage capabilities through VASA 2.0 (array thin provisioning, dedupe, auto-tiering, snapshot)
SDRS integrated with SRM
Full support for vSphere Replication
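The SIOC behavior described above (do nothing until latency crosses a threshold, then divide I/O by shares) can be modeled roughly as follows. This is a simplified sketch, not SIOC's real mechanism: the function name, the queue-slot count, and the latency values are all hypothetical, and reservations and limits are left out.

```python
def allocate_queue_slots(shares_by_vm, total_slots, observed_latency_ms, threshold_ms):
    """Split datastore queue slots by shares, but only under congestion.

    Returns None when latency is at or below the threshold, meaning
    no throttling is applied in this model.
    """
    if observed_latency_ms <= threshold_ms:
        return None
    total_shares = sum(shares_by_vm.values())
    return {
        vm: total_slots * shares / total_shares
        for vm, shares in shares_by_vm.items()
    }


shares = {"vm-a": 1000, "vm-b": 2000, "vm-c": 1000}

# Latency below the (assumed) threshold: nothing is throttled.
uncongested = allocate_queue_slots(shares, 64, observed_latency_ms=12, threshold_ms=30)

# Latency above the threshold: slots are divided 1:2:1.
congested = allocate_queue_slots(shares, 64, observed_latency_ms=35, threshold_ms=30)
```

Shares only matter during contention in this model, which is why a VM with low shares can still perform well on an uncongested datastore.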

vMotion

Migrate live VM to a new compute resource
vSphere 6.0: cross vCenter vMotion, long-distance vMotion, vMotion to cloud
You may not realize it, but there has been a lot of innovation and many new features here since its introduction in 2003
Long-distance vMotion supports up to 150 ms of round-trip latency. No WAN acceleration needed.
vMotion anywhere: vMotion cross-vCenters, vMotion across hosts without shared storage, easily move VMs across DVS, folders and datacenters.

vSphere Network IO Control

Outbound QoS
Allows you to partition network resources
Uses resource pools to differentiate between traffic types (VM, NFS, vMotion, etc.)
Bandwidth allocation: Shares and reservations. NIOC v3 allows configuration of bandwidth requirements for individual VMs
DRS is aware of network reservations as well.
Bandwidth admission control in HA
Set reservations to guarantee a minimum amount of bandwidth for critical network traffic. Use VM-level reservations sparingly.
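How shares and reservations interact can be shown with a small worked example. The sketch below uses a deliberately simple model (reservations are honored first, then the remaining bandwidth is split by shares) and is not NIOC's actual scheduler; the function name, pool names, and numbers are hypothetical.

```python
def allocate_bandwidth(capacity_mbps, pools):
    """pools maps a traffic type to (shares, reservation_mbps).

    Simplified model: every pool gets its reservation, and whatever
    capacity is left over is divided in proportion to shares.
    """
    total_reserved = sum(reservation for _, reservation in pools.values())
    if total_reserved > capacity_mbps:
        # Stand-in for bandwidth admission control: over-committed reservations.
        raise ValueError("reservations exceed link capacity")
    total_shares = sum(shares for shares, _ in pools.values())
    spare = capacity_mbps - total_reserved
    return {
        name: reservation + spare * shares / total_shares
        for name, (shares, reservation) in pools.items()
    }


# Hypothetical 10 Gbit/s uplink under full contention.
pools = {
    "vm": (100, 2000),
    "vmotion": (50, 0),
    "nfs": (50, 1000),
}
allocation = allocate_bandwidth(10000, pools)
```

Every reservation removes bandwidth from the pool that shares would otherwise distribute flexibly, which is one reason to keep VM-level reservations to a minimum.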