VMworld 2012: vSphere HA and Datastore Access Outages INF-BCO2807

This session was extremely technical and covered the inner workings of HA. For more in-depth detail, I strongly suggest getting the VMware vSphere 5.1 Clustering Deepdive book.

HA protects against three classes of failure: host and VM failures; host network isolation and datastore PDL; and guest OS hangs and application crashes.
Datastore accessibility outages occur infrequently but have a large cost
vSphere 5.0 introduced FDM, or Fault Domain Manager, which completely replaces the 4.x HA agent and software.
HA uses datastores for two purposes: as a communication channel between FDMs, and as persistent storage for configuration information.
Heartbeat datastores – two are chosen for each host; they enable the master to detect VM power states.
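To make the role of heartbeat datastores more concrete, here is a rough Python sketch of how a master could combine network and datastore heartbeats to classify a slave host. It is a simplified model of my own, not FDM's actual logic: the names HostState, Observation and classify_host are hypothetical, and the real agent performs further checks (such as pinging the host's management address) that are left out.

```python
from dataclasses import dataclass
from enum import Enum


class HostState(Enum):
    """Hypothetical states a master might assign to a slave host."""
    LIVE = "live"
    ISOLATED_OR_PARTITIONED = "isolated_or_partitioned"
    FAILED = "failed"


@dataclass
class Observation:
    """What the master currently sees for one slave (assumed inputs)."""
    network_heartbeat: bool    # heartbeat received over the management network
    datastore_heartbeat: bool  # heartbeat still updated on a heartbeat datastore


def classify_host(obs: Observation) -> HostState:
    """Classify a host from the two heartbeat channels (simplified)."""
    if obs.network_heartbeat:
        # Network heartbeats are arriving, so the host is reachable.
        return HostState.LIVE
    if obs.datastore_heartbeat:
        # No network heartbeat, but the host is still writing to storage:
        # it is running, just cut off from the master's network.
        return HostState.ISOLATED_OR_PARTITIONED
    # Neither channel shows activity: treat the host as failed.
    return HostState.FAILED


if __name__ == "__main__":
    print(classify_host(Observation(network_heartbeat=False,
                                    datastore_heartbeat=True)))
```

The point of the second channel is the middle case: without datastore heartbeats, the master could not tell a host that has lost its network from one that has actually died.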
Best practice: Use the “leave powered on” host isolation response option.
In 5.0 U1, when a datastore enters Permanent Device Loss (PDL), guest I/O to it will trigger the VM to be killed, and HA will restart the VM on a host that can access the datastore.
Futures for HA