Session: INF4535
Duncan Epping, Frank Denneman
Introduction to SDA (Software-Defined Availability): availability at every layer, i.e. VM, server, storage, data center, networking, and management. The business only cares about the application, not the underlying infrastructure.
vSphere HA
- Configured through vCenter but not dependent on it
- An agent (FDM, Fault Domain Manager) is installed on each host to monitor state
- HA restarts VMs when a failure impacts those VMs
- Hosts exchange heartbeats via the network and via storage (datastore heartbeats) to communicate availability
- HA uses the management network for network heartbeats, or the VSAN network if VSAN is enabled
- Requires spare resources to restart the failed VMs
- Admission control: allows you to reserve resources in case of a host failure
- Admission control guarantees that VMs receive their reserved resources after a restart, but does not guarantee that they perform well after a restart.
- Best practices: select the admission control policy that best meets your needs, enable DRS, and simulate failures to test performance
- The percentage-based policy is by far the most used and is the one Duncan recommends
- Duncan went through various failure scenarios (host failure, host isolation, storage failure) and showed how HA restarts the VMs in each.
- Use VMCP (VM Component Protection, new in 6.0). It helps protect against loss of storage connectivity.
- Generic recommendations: disable “host monitoring”; make sure you have a redundant management network; enable PortFast on the physical switch ports; use admission control
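To make the percentage-based admission control policy concrete, here is a minimal sketch, assuming a simplified model: failover capacity is the share of total cluster CPU and memory not claimed by VM reservations, and a power-on is admitted only if both stay at or above the configured percentages. The names (`Host`, `VM`, `failover_capacity`, `can_power_on`) are hypothetical, and the sketch ignores per-VM memory overhead and any default minimums that HA itself applies.

```python
from dataclasses import dataclass


@dataclass
class Host:
    cpu_mhz: int  # total CPU capacity of the host
    mem_mb: int   # total memory capacity of the host


@dataclass
class VM:
    cpu_res_mhz: int  # CPU reservation
    mem_res_mb: int   # memory reservation


def failover_capacity(hosts, vms):
    """Return (cpu_pct, mem_pct) of cluster capacity left unreserved."""
    total_cpu = sum(h.cpu_mhz for h in hosts)
    total_mem = sum(h.mem_mb for h in hosts)
    used_cpu = sum(v.cpu_res_mhz for v in vms)
    used_mem = sum(v.mem_res_mb for v in vms)
    return ((total_cpu - used_cpu) / total_cpu * 100,
            (total_mem - used_mem) / total_mem * 100)


def can_power_on(hosts, running, new_vm, cpu_pct, mem_pct):
    """Admit new_vm only if the configured failover percentages still hold."""
    cpu_free, mem_free = failover_capacity(hosts, running + [new_vm])
    return cpu_free >= cpu_pct and mem_free >= mem_pct


hosts = [Host(20000, 65536) for _ in range(4)]
running = [VM(4000, 16384) for _ in range(10)]
print(failover_capacity(hosts, running))                   # (50.0, 37.5)
print(can_power_on(hosts, running, VM(4000, 40000), 25, 25))  # False
```

With four equally sized hosts, reserving 25% corresponds roughly to tolerating one host failure. Note that the check only looks at reservations, which is exactly why admission control cannot promise good performance after a restart.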
DRS
- DRS provides load balancing and initial placement
- DRS is the broker of resources between producers and consumers
- DRS's goal is to provide the resources each VM demands
- DRS provides cluster management (maintenance mode, affinity/anti-affinity rules)
- DRS keeps VMs happy; it doesn't aim to perfectly balance each host
- DRS affinity rules: Control the placement of VMs on hosts within a cluster.
- DRS's highest priority is to resolve any violation of affinity rules.
- VM-host groups can be configured with mandatory (must) or preferential (should) affinity or anti-affinity rules
- A mandatory (must) rule constrains HA, DRS, and the user: none of them can place a VM in violation of it
- Why use resource pools? Powerful abstraction for managing a group of VMs. Set business requirements on a resource pool.
- Bottom line: resource pools are complex, and VMs may not get the resources you think they should. Only use them when needed.
- Keep the number of affinity rules as low as possible, and prefer preferential (should) rules.
- Tweak the migration threshold (aggressiveness slider) if the cluster is unbalanced.
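The difference between must and should VM-host rules can be sketched as a placement filter, assuming a simplified model: must rules remove hosts outright, while should rules only move compliant hosts to the front of the candidate list. `Rule` and `candidate_hosts` are hypothetical names; real DRS also weighs host load, and this sketch models only the rule filtering.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    vm_group: set[str]     # VMs the rule applies to
    host_group: set[str]   # hosts the rule refers to
    mandatory: bool        # True = "must" rule, False = "should" rule
    affinity: bool = True  # True = run on host_group, False = stay off it


def candidate_hosts(vm, hosts, rules):
    """Return hosts the VM may be placed on, preferred hosts first."""
    allowed = list(hosts)
    preferred = set(hosts)
    for r in rules:
        if vm not in r.vm_group:
            continue
        ok = (set(hosts) & r.host_group) if r.affinity else (set(hosts) - r.host_group)
        if r.mandatory:
            # Must rule: hosts outside `ok` are never candidates.
            allowed = [h for h in allowed if h in ok]
        else:
            # Should rule: only narrows the preferred set.
            preferred &= ok
    # Preferred hosts first, then the rest of the allowed hosts, by name.
    return sorted(allowed, key=lambda h: (h not in preferred, h))


hosts = ["esx1", "esx2", "esx3", "esx4"]
rules = [
    Rule({"db01"}, {"esx1", "esx2"}, mandatory=True),
    Rule({"db01"}, {"esx2", "esx3"}, mandatory=False),
]
print(candidate_hosts("db01", hosts, rules))  # ['esx2', 'esx1']
```

Every must rule shrinks the pool of hosts that HA and DRS can use, which is why the advice is to keep rules few and preferential.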
SDRS and SIOC
- Storage I/O Control (SIOC) is not cluster-aware; it is focused on storage
- Enabled at the datastore level
- Detects congestion and monitors average IO latency for a datastore
- Latency above a particular threshold indicates congestion
- SIOC throttles IOs once congestion is detected
- Controls the I/Os issued per host by adjusting the host's device queue depth
- Based on VM shares, reservations, and limits
- SDRS runs every 8 hours to check balance, using the 90th percentile of I/O latency over the previous 16 hours
- Capacity threshold per datastore
- I/O metric threshold per datastore
- Affinity rules are available
- SDRS is now aware of storage capabilities through VASA 2.0 (array thin provisioning, dedupe, auto-tiering, snapshot)
- SDRS integrated with SRM
- Full vSphere Replication support
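Two of the mechanisms above can be sketched in a few lines, assuming heavily simplified models. First, SIOC: once datastore latency exceeds the congestion threshold, the datastore's queue slots are split across hosts in proportion to the aggregate shares of the VMs running on each host (the real algorithm adjusts host queue depths iteratively rather than in one step). Second, SDRS: the I/O check compares the 90th percentile of the collected latency samples against the per-datastore threshold, so short spikes are ignored. All names, slot counts, and latency values are hypothetical.

```python
def host_queue_depths(latency_ms, threshold_ms, total_slots, host_shares):
    """Split a datastore's queue slots across hosts by aggregate VM shares.

    Returns None when latency is below the threshold (no throttling).
    """
    if latency_ms <= threshold_ms:
        return None
    total = sum(host_shares.values())
    # Every host keeps at least one slot so it is never starved completely.
    return {h: max(1, total_slots * s // total) for h, s in host_shares.items()}


def percentile_90(samples):
    """Nearest-rank 90th percentile, using integer math to avoid float error."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n)
    return ordered[rank - 1]


def sdrs_io_imbalanced(latency_samples_ms, io_threshold_ms):
    """True if the datastore's 90th-percentile latency exceeds its threshold."""
    return percentile_90(latency_samples_ms) > io_threshold_ms


# A host running VMs with twice the shares gets twice the queue slots.
print(host_queue_depths(35, 30, 64, {"esx1": 2000, "esx2": 1000, "esx3": 1000}))
# 10% of samples spiking does not trigger an SDRS I/O move.
print(sdrs_io_imbalanced([5] * 90 + [40] * 10, 15))  # False
```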
vMotion
- Migrates a running VM to a new compute resource
- vSphere 6.0: cross vCenter vMotion, long-distance vMotion, vMotion to cloud
- You may not realize it, but there has been a lot of innovation and new features here since vMotion's introduction in 2003
- Long-distance vMotion supports up to 150 ms round-trip latency. No WAN acceleration needed.
- vMotion anywhere: vMotion across vCenter Servers, vMotion across hosts without shared storage, and easily moving VMs across distributed switches (DVS), folders, and datacenters.
vSphere Network IO Control
- Outbound QoS
- Allows you to partition network resources
- Uses resource pools to differentiate between traffic types (VM, NFS, vMotion, etc.)
- Bandwidth allocation: Shares and reservations. NIOC v3 allows configuration of bandwidth requirements for individual VMs
- DRS is aware of network reservations as well.
- Bandwidth admission control in HA
- Set reservations to guarantee a minimum amount of bandwidth for critical network traffic. Use VM-level reservations sparingly.
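A minimal sketch of how shares and reservations can interact on one uplink, assuming an illustrative model rather than the exact NIOC scheduler: every traffic type is saturating the link, reservations are carved out first, and the remaining bandwidth is split in proportion to shares. The traffic-type names, share values, and reservations are hypothetical; the up-front check that reservations fit the link stands in for the bandwidth admission control mentioned above.

```python
def allocate_bandwidth(link_gbps, pools):
    """Allocate link bandwidth under full contention.

    pools maps traffic type -> (shares, reservation_gbps).
    """
    reserved = sum(r for _, r in pools.values())
    if reserved > link_gbps:
        # Admission control: reservations that cannot be honored are rejected.
        raise ValueError("reservations exceed link capacity")
    spare = link_gbps - reserved
    total_shares = sum(s for s, _ in pools.values())
    # Each pool gets its reservation plus a shares-weighted slice of the rest.
    return {name: r + spare * s / total_shares for name, (s, r) in pools.items()}


pools = {"vm": (100, 1.0), "vmotion": (50, 0.0), "nfs": (50, 1.0)}
print(allocate_bandwidth(10, pools))  # {'vm': 5.0, 'vmotion': 2.0, 'nfs': 3.0}
```

Every gigabit reserved is removed from the pool that shares can redistribute, which is one reason to keep VM-level reservations to a minimum.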