VMware High Availability and Fault Tolerance FAQ

VMware High Availability and Fault Tolerance limit downtime, which is a major benefit for virtualized infrastructure. But VMware HA and FT aren't turnkey features. This FAQ explains each technology.

VMware High Availability and Fault Tolerance are must-haves for IT shops that need to run as close to 100% uptime as possible.

First introduced in VMware Infrastructure 3, VMware High Availability (HA) provides failover protection against hardware and software malfunctions. If HA detects a failure, it automatically restarts a virtual machine (VM) without the need for manual intervention.

HA was a major step for highly available technologies, but some users require continuous uptime for virtual machines. With the release of vSphere 4, VMware introduced the Fault Tolerance (FT) utility, which provides uninterrupted availability by eliminating the need for VMs to restart.

Together, HA and FT supply the feature sets and capabilities for virtual environments to run at nearly 100% uptime. But implementing and maintaining an HA- and FT-enabled infrastructure is challenging. For users who are unsure about the risk-reward proposition of these high-availability tools, the answers to these frequently asked questions should provide some guidance.

How does VMware High Availability work?

VMware High Availability is a vSphere component that's configured in the vSphere Client. It eliminates the need for dedicated standby equipment by performing the following tasks:

  • monitoring physical servers and VMs;
  • detecting server failures; and
  • restarting VMs from offline hosts on other hosts in the cluster.
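The three tasks above can be sketched as a toy simulation. This is only an illustration of the monitor, detect and restart idea, not VMware's implementation: the `Host` class, the `alive` flag (standing in for a heartbeat) and the least-loaded placement rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for a physical server in an HA cluster."""
    name: str
    alive: bool = True  # stands in for "heartbeat still being received"
    vms: list = field(default_factory=list)


def restart_vms_from_failed_hosts(hosts):
    """Move VMs off failed hosts onto the least-loaded surviving host.

    Returns a dict mapping each restarted VM to its new host name.
    The placement rule is an assumption chosen for simplicity.
    """
    survivors = [h for h in hosts if h.alive]
    if not survivors:
        raise RuntimeError("no surviving hosts to restart VMs on")

    placements = {}
    for failed in (h for h in hosts if not h.alive):
        for vm in failed.vms:
            target = min(survivors, key=lambda h: len(h.vms))
            target.vms.append(vm)
            placements[vm] = target.name
        failed.vms = []  # the failed host no longer runs anything
    return placements


hosts = [
    Host("esx-a", vms=["web01", "db01"]),
    Host("esx-b", vms=["app01"]),
    Host("esx-c"),
]
hosts[0].alive = False  # simulate a server failure
placements = restart_vms_from_failed_hosts(hosts)
print(placements)
```

Note that the VMs in this sketch are restarted, not moved while running, which mirrors the point that HA reduces downtime rather than eliminating it.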

The vSphere 4.1 update upgraded High Availability. The new, 64-bit vCenter increased HA's theoretical maximums to 320 VMs per host and 3,000 VMs per cluster. HA also sports a health-status menu, where users can view alarms and alerts in a centralized location.
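As a quick planning aid, the two maximums quoted above can be checked against a proposed cluster layout. The function name and the input format (a dict of host name to VM count) are hypothetical; only the two limits come from the text.

```python
# Theoretical HA maximums quoted for vSphere 4.1.
MAX_VMS_PER_HOST = 320
MAX_VMS_PER_CLUSTER = 3000


def ha_maximum_violations(vms_per_host):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for host, count in vms_per_host.items():
        if count > MAX_VMS_PER_HOST:
            problems.append(f"{host}: {count} VMs exceeds {MAX_VMS_PER_HOST} per host")
    total = sum(vms_per_host.values())
    if total > MAX_VMS_PER_CLUSTER:
        problems.append(f"cluster: {total} VMs exceeds {MAX_VMS_PER_CLUSTER} per cluster")
    return problems


layout = {"esx-a": 300, "esx-b": 325}
violations = ha_maximum_violations(layout)
print(violations)
```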

How does VMware Fault Tolerance work?

VMware Fault Tolerance is based on VMware Workstation's record/replay technology. FT copies a functional VM to another ESX host, then transfers CPU and virtual device inputs from the primary VM (record) to the secondary VM (replay) through a network interface card. This process keeps both VMs synchronized so that, during a failure, the secondary VM can take over. Until the primary VM malfunctions, the hypervisor suppresses the secondary VM's output, so only the primary is visible to the rest of the network.
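The record/replay idea can be illustrated with a toy model. This is a conceptual sketch only: the `ToyVM` class, its arithmetic "state" and the in-memory `log` list (standing in for the network link between hosts) are assumptions for the example and bear no relation to how the hypervisor actually captures inputs.

```python
class ToyVM:
    """Toy deterministic machine: state depends only on the inputs applied."""

    def __init__(self, name, emit_output):
        self.name = name
        self.emit_output = emit_output  # False means output is suppressed
        self.state = 0
        self.outputs = []

    def apply(self, event):
        # Any deterministic update works; same inputs give the same state.
        self.state = self.state * 31 + event
        if self.emit_output:
            self.outputs.append(self.state)


primary = ToyVM("primary", emit_output=True)
secondary = ToyVM("secondary", emit_output=False)

log = []  # stands in for the logging traffic sent over the NIC
for event in [3, 1, 4]:
    primary.apply(event)
    log.append(event)  # record on the primary

for event in log:
    secondary.apply(event)  # replay on the secondary

in_sync = primary.state == secondary.state

# Simulated failover: the primary is gone, so the secondary's output is
# no longer suppressed and it handles the next input itself.
secondary.emit_output = True
secondary.apply(1)

print(in_sync, primary.outputs, secondary.outputs)
```

Because both machines apply the same inputs in the same order, the secondary reaches the same state without ever having produced visible output, which is why it can take over without a restart.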

What are the benefits and drawbacks to VMware High Availability?

HA provides business continuity and disaster recovery protection. With HA enabled, virtualization administrators have one fewer task to oversee directly. The technology runs 24/7, so it can restart failed, mission-critical servers and virtual machines during off-peak hours, without the assistance of IT staff.

VMware HA minimizes downtime but cannot prevent it completely: VMs remain offline while they restart. Also, HA is not available for every vSphere licensing tier, and vCenter is required.

Why should I use VMware Fault Tolerance?

FT provides continuous VM availability through its record/replay functionality. Unlike other high-availability technologies, FT is operating system-agnostic and doesn't require licenses for each server.

This utility is easy to deploy in vCenter: Right-click on a VM, and select Fault Tolerance. FT creates the secondary virtual machine, and then you can begin the synchronization process.

When should I use VMware Fault Tolerance?

It's probably not feasible to protect every VM with FT because of host limitations and hardware compatibility issues. But there are ideal use cases for VMware Fault Tolerance:

  • VMs protected by HA. Generally, these VMs are the most critical. If you have enough computing and storage resources, adding FT means that users don't have to wait for VMs to restart when a server fails.
  • On-demand coverage. Certain applications and VMs become mission critical during particular times of the month or year (e.g., payroll and accounting VMs). Because FT is simple to initiate, you can provide failover protection during high-volume periods and deactivate it at other times to conserve resources.
  • Servers with a single point of failure. High-availability options for certain application servers can be costly and complex. FT provides a simple alternative.
  • Expensive clustering. Sometimes it's hard to justify clustering solutions for branch offices or medium-sized databases. In these scenarios, FT is a cost-effective option.

What are the requirements for VMware Fault Tolerance?

FT has specific and restrictive hardware requirements. It doesn't run on every vSphere-equipped server because of special CPU requirements, for example. FT calls for Intel Xeon 31xx or more recent processors, or AMD Opteron 13xx or more recent processors. It also supports only single-CPU VMs, not multiprocessor VMs, and it doesn't support vSphere's hot-add RAM or hot-plug CPU features.

Because of FT's particular hardware requirements, VMware published the SiteSurvey utility, which checks an infrastructure's compatibility with FT. SiteSurvey connects to vCenter Server and generates a host compatibility report. Clicking the report links provides detailed information and charts with FT's requirements.
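The VM-level restrictions described in this section (a single virtual CPU, no hot-add RAM, no hot-plug CPU) can be expressed as a simple pre-check. This sketch is not how SiteSurvey works and does not query vCenter; the `VMConfig` fields and the `ft_blockers` function are hypothetical names, and host CPU compatibility is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class VMConfig:
    """Hypothetical summary of the VM settings relevant to FT."""
    name: str
    vcpus: int
    memory_hot_add: bool = False
    cpu_hot_plug: bool = False


def ft_blockers(vm):
    """Return the reasons a VM cannot be protected by FT (empty if none)."""
    blockers = []
    if vm.vcpus != 1:
        blockers.append("more than one virtual CPU")
    if vm.memory_hot_add:
        blockers.append("hot-add RAM is enabled")
    if vm.cpu_hot_plug:
        blockers.append("hot-plug CPU is enabled")
    return blockers


payroll = VMConfig("payroll01", vcpus=1)
database = VMConfig("db01", vcpus=4, memory_hot_add=True)

for vm in (payroll, database):
    reasons = ft_blockers(vm)
    status = "eligible" if not reasons else "blocked: " + ", ".join(reasons)
    print(f"{vm.name}: {status}")
```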
