What are some best practices for vSphere HA clusters?

With the right configuration, VMware vSphere HA can reduce application downtime, protect against server failure and restart failed VMs.

VMware offers a number of features to protect your virtualized environment. VMware cluster technology prevents downtime for workloads that run in VMs. VMware Fault Tolerance protects applications from underlying hardware failure without causing downtime. Perhaps the most important of these tools and features is vSphere High Availability (HA), which reduces application downtime by restarting VMs after a failure in a clustered environment.

You need at least two ESXi hosts managed by vCenter Server to set up vSphere HA. You also need a form of shared storage because you can't protect VMs that run on a host's local storage. If your host has hardware problems and becomes unresponsive or goes offline, both its local storage and the VMs stored there will be inaccessible.

If you store your VMs in a shared location accessible to all hosts within your cluster, the system can trigger a VM restart on one of the remaining hosts in the cluster.
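The two prerequisites above can be sketched as a simple check. This is an illustrative Python model, not a VMware API: the function name, the host and data store names, and the dictionary layout are all hypothetical. It flags VMs whose data store is not visible to every host in the cluster, which are the VMs vSphere HA could not restart elsewhere.

```python
def unprotected_vms(cluster_hosts, datastore_access, vm_datastores):
    """Return the VMs that vSphere HA could not restart on another host.

    cluster_hosts:     list of host names in the cluster
    datastore_access:  dict mapping data store name -> set of hosts that can reach it
    vm_datastores:     dict mapping VM name -> data store that holds the VM
    All names and structures here are hypothetical, for illustration only.
    """
    if len(cluster_hosts) < 2:
        raise ValueError("vSphere HA needs at least two ESXi hosts")

    hosts = set(cluster_hosts)
    # A VM counts as protected only if every host in the cluster can reach its data store.
    return sorted(
        vm
        for vm, datastore in vm_datastores.items()
        if not hosts <= datastore_access.get(datastore, set())
    )


hosts = ["esx1", "esx2"]
access = {
    "shared-ds": {"esx1", "esx2"},  # shared storage, visible to both hosts
    "local-esx1": {"esx1"},         # local disk, visible to one host only
}
vms = {"web01": "shared-ds", "db01": "local-esx1"}

at_risk = unprotected_vms(hosts, access, vms)
print(at_risk)
```

In this example, only the VM on the host-local data store is reported, because no other host could reach its files after a failure.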

A single vSphere cluster can contain up to 64 hosts, and you can manage several clusters within your data center with a single vCenter Server.

VSphere HA provides a number of benefits. It restarts VMs on other hosts within a cluster to protect against server failure. It continuously monitors VMs and, in the event of a failure, resets failed VMs. In the event of a data store accessibility failure, it restarts affected VMs on other hosts that still have access to their data stores. Finally, it restarts VMs if their host becomes isolated on the management or VMware vSAN network. VSphere HA provides this protection even if the network is partitioned.

To get the best possible configuration, build redundancy into your network design by using at least two network interface cards (NICs). Configure hosts so that vSphere HA does not use VMkernel NICs that share subnets with VMkernel NICs used for other purposes. Make sure that the VMkernel NICs that vSphere HA uses and those that other features use sit on different subnets, or use virtual LANs for separation.
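One way to audit the subnet rule is to compare the networks of the VMkernel NICs. The sketch below uses only Python's standard `ipaddress` module; the NIC names, addresses and the `purpose` labels are made-up inputs, and in practice you would have to export this information from your own hosts.

```python
import ipaddress


def ha_subnet_conflicts(vmknics):
    """Return (ha_nic, other_nic) pairs whose subnets overlap.

    vmknics is a list of (name, cidr, purpose) tuples. The "ha" purpose label
    is a hypothetical tag for NICs that carry vSphere HA traffic.
    """
    ha_nics, other_nics = [], []
    for name, cidr, purpose in vmknics:
        network = ipaddress.ip_interface(cidr).network
        (ha_nics if purpose == "ha" else other_nics).append((name, network))

    return [
        (ha_name, other_name)
        for ha_name, ha_net in ha_nics
        for other_name, other_net in other_nics
        if ha_net.overlaps(other_net)
    ]


nics = [
    ("vmk0", "10.0.10.11/24", "ha"),
    ("vmk1", "10.0.20.11/24", "vmotion"),
    ("vmk2", "10.0.10.12/24", "storage"),  # same subnet as vmk0
]

conflicts = ha_subnet_conflicts(nics)
print(conflicts)
```

An empty result means no NIC used for HA shares a subnet with a NIC used for another purpose. The check does not cover VLAN separation, which you would need to verify separately.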

You should also set up a redundant network isolation address. A host pings this address only after it stops receiving heartbeats from the other hosts in the cluster. If the host can ping its isolation address, it isn't network-isolated, so the other hosts in the cluster have either failed or are network-partitioned. If the host can't ping its isolation address, it is likely isolated from the network and takes no failover action.
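The value of a redundant isolation address shows up in a small model of the check. This is a simplified sketch, assuming a host treats itself as isolated only when none of its isolation addresses respond; the function name and the addresses are hypothetical, and the ping results are passed in rather than measured.

```python
def host_is_isolated(ping_results):
    """Decide whether a host should consider itself network-isolated.

    ping_results maps each isolation address to True (reply received) or
    False (no reply). The host counts as isolated only if every address
    fails, which is the simplifying assumption of this sketch.
    """
    if not ping_results:
        raise ValueError("at least one isolation address is required")
    return not any(ping_results.values())


# With one address, a single unreachable gateway makes the host look isolated.
single = host_is_isolated({"10.0.10.1": False})

# With a second, redundant address that still replies, the host is not isolated.
redundant = host_is_isolated({"10.0.10.1": False, "10.0.10.2": True})

print(single, redundant)
```

With a second address, the failure of one pinged device no longer leads the host to the wrong conclusion about its own network state.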

You can connect a team of two NICs to separate physical switches to improve the reliability of the management network. Servers that connect through two NICs, and through separate switches, have two independent paths on which to send and receive heartbeats. This makes the cluster more resilient.

Data store heartbeats can function as a second monitoring channel for vSphere HA. Data store heartbeating avoids false restarts of VMs in the event of a management network failure because the system can use a shared data store to verify whether the host is still reachable. The default number of heartbeat data stores is two; the maximum valid value is five. You can override the default value with the advanced option das.heartbeatDsPerHost.
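The following sketch models how the two heartbeat channels combine and how the heartbeat data store setting might be validated before you apply it. It is plain Python, not a vSphere API: the function names and state labels are hypothetical. The default of two and the maximum of five come from the text above; the lower bound of two is an assumption.

```python
DEFAULT_HEARTBEAT_DATASTORES = 2  # default stated in the article
MAX_HEARTBEAT_DATASTORES = 5      # maximum stated in the article


def classify_host(network_heartbeat, datastore_heartbeat):
    """Classify a host from its two heartbeat channels (labels are hypothetical)."""
    if network_heartbeat:
        return "running"
    if datastore_heartbeat:
        # Management network is down but the host still updates the shared
        # data store, so restarting its VMs elsewhere would be a false restart.
        return "alive-but-unreachable"
    return "failed"


def heartbeat_option(count=DEFAULT_HEARTBEAT_DATASTORES):
    """Build the advanced option as a key/value pair after a range check."""
    # Lower bound of 2 is an assumption; the article states only the default and maximum.
    if not DEFAULT_HEARTBEAT_DATASTORES <= count <= MAX_HEARTBEAT_DATASTORES:
        raise ValueError("heartbeat data store count must be between 2 and 5")
    return {"das.heartbeatDsPerHost": str(count)}


states = [
    classify_host(True, True),
    classify_host(False, True),
    classify_host(False, False),
]
option = heartbeat_option(3)
print(states, option)
```

Only the last case, in which both channels are silent, would justify restarting the host's VMs on other hosts.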

If you run vSphere 6.5 or later and your hardware supports it, use Proactive HA. Proactive HA does what its name suggests: It monitors the hardware health of a host and works with the Distributed Resource Scheduler to evacuate VMs from the host before a problem causes an outage. Proactive HA works with hardware from OEM vendors, such as Hewlett Packard Enterprise, Dell and Cisco. Those vendors have their own hardware monitoring systems and offer vSphere plug-ins that support this functionality.
