VMware offers a number of features to protect your virtualized environment. VMware cluster technology prevents downtime for workloads that run in VMs. VMware Fault Tolerance protects applications from underlying hardware failure without causing downtime. Perhaps the most important of these tools and features is vSphere High Availability, which reduces application downtime by restarting VMs after a failure in a clustered environment.
You need at least two ESXi hosts managed by vCenter Server to set up vSphere HA. You also need a form of shared storage because you can't protect VMs that run on a host's local storage. If your host has hardware problems and becomes unresponsive or goes offline, both its local storage and the VMs stored there will be inaccessible.
A single vSphere cluster can contain up to 64 hosts, and you can manage several clusters within your data center with a single vCenter Server.
vSphere HA provides a number of benefits. It restarts VMs on other hosts within a cluster to protect against server failure. It continuously monitors VMs and resets any VM that fails. In the event of a data store accessibility failure, it restarts affected VMs on other hosts that still have access to their data stores. Finally, it restarts VMs if their host becomes isolated on the management or VMware vSAN network. vSphere HA provides this protection even if the network is partitioned.
To get the best possible configuration, build redundancy into your network design by using at least two network interface cards (NICs). Configure hosts so that vSphere HA does not use VMkernel NICs that share subnets with VMkernel NICs used for other purposes. Make sure that the VMkernel NICs that vSphere HA and other features use exist on different subnets, or use virtual LANs to separate them.
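The subnet rule above can be checked before you touch the cluster. The following is a minimal sketch, assuming you have already collected each VMkernel NIC's address in CIDR form; the function name, the NIC names and the addresses are hypothetical, and the check uses only Python's standard ipaddress module rather than any VMware API.

```python
import ipaddress


def find_shared_subnets(ha_nics, other_nics):
    """Return (ha_nic, other_nic, subnet) for each HA NIC that shares a subnet
    with a VMkernel NIC used for another purpose.

    Both arguments map a NIC name to its address in CIDR form,
    for example {"vmk0": "10.0.10.11/24"} (hypothetical values).
    """
    conflicts = []
    for ha_name, ha_cidr in ha_nics.items():
        ha_net = ipaddress.ip_interface(ha_cidr).network
        for other_name, other_cidr in other_nics.items():
            other_net = ipaddress.ip_interface(other_cidr).network
            # overlaps() is True when the two networks share any address.
            if ha_net.overlaps(other_net):
                conflicts.append((ha_name, other_name, str(ha_net)))
    return conflicts


# Hypothetical inventory: vmk0 carries vSphere HA traffic, the others do not.
ha_nics = {"vmk0": "10.0.10.11/24"}
other_nics = {"vmk1": "10.0.10.21/24", "vmk2": "10.0.20.11/24"}

conflicts = find_shared_subnets(ha_nics, other_nics)
for ha_name, other_name, subnet in conflicts:
    print(f"{ha_name} and {other_name} share subnet {subnet}")
```

An empty result means the NICs in the inventory you supplied sit on distinct subnets. It says nothing about VLAN separation, which you would have to verify on the switches.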
You should also set up redundant network isolation addresses. A host pings its isolation address when it stops receiving heartbeats from the other hosts in the cluster. If the host is able to ping its isolation address, that means it isn't network isolated and that the other hosts in the cluster have either failed or are network-partitioned. If the host is unable to ping its isolation address, it is likely isolated from the network, and no failover action is taken.
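The isolation check reads as a small decision procedure. The sketch below models it in plain Python; it is an illustration, not VMware's implementation. The names are hypothetical, and it assumes that the ping only matters once heartbeats from other hosts have stopped and that one reachable address out of several redundant ones is enough to rule out isolation.

```python
from enum import Enum


class HostState(Enum):
    CONNECTED = "receiving heartbeats from other hosts"
    NOT_ISOLATED = "not isolated; other hosts failed or network partitioned"
    ISOLATED = "isolated from the network"


def classify_host(receives_heartbeats, isolation_ping_results):
    """Classify a host from its own point of view.

    receives_heartbeats: True if the host still hears from other cluster hosts.
    isolation_ping_results: one boolean per configured isolation address,
    True when that address answered a ping.
    """
    if receives_heartbeats:
        # Assumption: the isolation address is consulted only after
        # heartbeats from the rest of the cluster stop.
        return HostState.CONNECTED
    if any(isolation_ping_results):
        # At least one isolation address answered, so the host has a
        # working network path and the problem lies elsewhere.
        return HostState.NOT_ISOLATED
    return HostState.ISOLATED


# Two redundant isolation addresses; only the second one answers.
print(classify_host(False, [False, True]).name)
```

The second argument shows why a redundant address helps: with a single address, one unreachable gateway is enough for a healthy host to conclude that it is isolated.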
You can connect a team of two NICs to separate physical switches to improve the reliability of a management network. Servers that connect through two NICs, and through separate switches, have two independent paths on which they send and receive heartbeats. This makes the cluster more resilient.
Data store heartbeats can function as a second monitoring channel for vSphere HA. If the management network fails, vSphere HA can use a shared data store to verify whether a host is still running, which avoids false restarts of its VMs. The default number of heartbeat data stores is two; the maximum valid value is five. You can override the default value with an advanced attribute: das.heartbeatDsPerHost.
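If you script cluster settings, it helps to validate the heartbeat data store count before applying it. This is a minimal sketch: the function and constant names are hypothetical, the attribute name is written with the capitalization used in VMware's documentation, and treating the default of two as the lower bound is an assumption, because the text above states only the default and the maximum. The function returns a plain dictionary and does not call any VMware API.

```python
DEFAULT_HEARTBEAT_DATASTORES = 2  # default stated in the article
MAX_HEARTBEAT_DATASTORES = 5      # maximum valid value stated in the article


def heartbeat_datastore_option(count=DEFAULT_HEARTBEAT_DATASTORES):
    """Build the advanced-option entry for the number of heartbeat data stores.

    Assumption: values below the default of two are rejected.
    """
    if not DEFAULT_HEARTBEAT_DATASTORES <= count <= MAX_HEARTBEAT_DATASTORES:
        raise ValueError(
            f"heartbeat data store count must be between "
            f"{DEFAULT_HEARTBEAT_DATASTORES} and {MAX_HEARTBEAT_DATASTORES}, "
            f"got {count}"
        )
    # Advanced options are key/value strings, so the count is stringified.
    return {"das.heartbeatDsPerHost": str(count)}


print(heartbeat_datastore_option(3))
```

You would then enter the resulting key and value as a vSphere HA advanced option for the cluster, or pass them to whatever automation tool you use.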
If you run vSphere 6.5 or later and your hardware supports it, use Proactive HA whenever possible. Proactive HA does what its name suggests: It monitors the hardware health of a host and works with the Distributed Resource Scheduler to evacuate VMs from that host before a problem occurs. Proactive HA works with hardware from OEMs, such as Hewlett Packard Enterprise, Dell and Cisco. Those vendors have their own hardware monitoring systems and offer vSphere plug-ins that support this functionality.