
VMware Clusters/Availability: What’s (not so) new in vSphere5 – Part 5

This fifth part covers the changes to VMware Clustering components and their complementary or dependent features. VMware HA has changed the most in this release.

In the fifth part of this series, I look at the changes to the VMware Clustering components and their complementary or dependent features. By that I mean vMotion, VMware DRS, Resource Pools, and VMware HA. It's perhaps VMware HA that has changed the most in this release – in fact, it's been completely re-written around a new Master/Slave system, together with improved conditions for working out whether a host has really failed or whether it's in the middle of some type of "split brain" situation. With that said, let's start with the stuff that's still very much the same in vSphere5 as it was in vSphere4.

vCenter Availability

vCenter Availability is still delivered on Windows using the vCenter Heartbeat Service (an OEM'd version of NeverFail that protects both vCenter and the backend Microsoft SQL database). For the vCenter Virtual Appliance (VCVA), availability is provided by the fact that it can reside on a VMware HA cluster or be protected with VMware Fault Tolerance.

Resource Pools

Resource Pools are pretty much the same as they have always been – the difference is in the way they are stored. Previously, even though resource pools were configured on a VMware DRS cluster, they were actually stored on the ESX host; vCenter together with VMware DRS merely made them easier to set up and configure. vSphere5 relocates these resource pools so they are stored as part of vCenter in the central database, and then applied down to the ESX host – in a way that is somewhat similar to Distributed vSwitches. The change was introduced alongside PXE booting for ESX hosts (so-called "stateless computing"). In this model, the ESX host can be diskless and therefore has no way to store the resource pool information directly (except in memory). These "Auto-Deploy" versions of ESX get their resource pool configuration from the Auto-Deploy server/service if vCenter is unavailable.


vMotion

vMotion forms the bedrock of the VMware DRS feature, and it has received a bit of an uplift in vSphere5 as well. It now supports a multi-NIC configuration of up to 4 x 10Gbps NICs or 16 x 1Gbps NICs. Additionally, customers with Enterprise Plus licensing get a "version" of vMotion that tolerates a 10ms network latency value – this is primarily for customers who want to build environments that support so-called "Long Distance vMotion", with the capacity to carry out vMotion events from one site to another.

VMware Distributed Resource Scheduler (DRS)

Although not a new feature – it may be new to you – vSphere4.1 introduced "DRS Groups". This allows for groups within what is essentially already a group (if you think about it, DRS is already a grouping of ESX hosts' CPU/memory resources). You can create groups of VMs and groups of ESX hosts within the DRS Cluster – so it's possible, for example, to manage the placement of VMs across multiple blade chassis or across sites. For example, you could create groups in such a way that, for a web application, all odd-numbered web servers (web01, 03, 05 and so on) prefer to reside on a DRS Host Group called "Chassis1" and all the even-numbered web servers within the same application prefer to reside on a DRS Host Group called "Chassis2". This would distribute the VMs between the two chassis that made up a single DRS cluster, and help protect you from a chassis outage. (Of course, it would be VMware HA's job to restart the VMs from a failed chassis on another – based on your rules for admission control.) Another example is when VMware DRS and HA have been "stretched" across two sites: you could create DRS Host Groups to represent each site, "SiteA" and "SiteB", and use DRS VM Groups to associate VMs with a particular site.
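The odd/even placement rule above can be sketched in a few lines. The VM and host-group names come from the article's example; the function itself is purely illustrative and is not a real DRS API.

```python
# Toy sketch of the odd/even web-server placement described above.
# "Chassis1"/"Chassis2" and the webNN naming are the article's example;
# nothing here calls a real vSphere API.

def preferred_host_group(vm_name):
    """Map a web-server VM to its preferred DRS Host Group by its number."""
    number = int(vm_name.replace("web", ""))
    return "Chassis1" if number % 2 == 1 else "Chassis2"

placement = {vm: preferred_host_group(vm)
             for vm in ["web01", "web02", "web03", "web04"]}
print(placement)
```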

A more prosaic example comes from having to manage draconian and outmoded licensing rules (AKA Oracle!) where you are licensed by the number of CPU sockets physically present. In this case you could create a DRS Host Group that contained just two ESX hosts, and a DRS VM Group that contained only the Oracle VMs – finally, you could peg the DRS VM Group to the DRS Host Group. Although the VMs would still be able to move from one host to the other, they would not be able to access the other server resources in the DRS Cluster.

To some degree you could see DRS Groups as an extension of the affinity/anti-affinity settings that can be applied between VMs, although they are a little more subtle than that – DRS Groups allow these affinity settings to be soft or hard. In other words, they can be applied but violated if necessary – say, when it's more important to have the VMs up than to strictly meet the rules around them, for example in the case of a VMware HA event. With plain VM affinity, it was possible to have conflicts between the rules of DRS, HA and maintenance mode.

For example, on a two-node cluster where VM1 and VM2 have been set to be "kept apart", the ESX host would never enter maintenance mode successfully, because maintenance mode wasn't allowed to break the DRS rule. At times it felt like the left hand didn't know what the right hand was up to.
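The two-node deadlock can be illustrated with a minimal sketch: with a hard "keep apart" rule there is nowhere to evacuate a VM without co-locating the pair, so maintenance mode stalls, while a soft rule lets the violation through. This models the behaviour described in the text, not any actual DRS code.

```python
# Minimal model of the scenario above: VM1 and VM2 carry an anti-affinity
# ("keep apart") rule, and one host wants to enter maintenance mode.

def can_enter_maintenance(num_hosts, rule_is_hard):
    """Can one host be evacuated without co-locating the kept-apart pair?"""
    remaining = num_hosts - 1
    if rule_is_hard:
        # Hard rule: each of the two VMs needs its own surviving host.
        return remaining >= 2
    # Soft rule: the violation is tolerated, so any surviving host will do.
    return remaining >= 1

print(can_enter_maintenance(2, rule_is_hard=True))   # False - stuck forever
print(can_enter_maintenance(2, rule_is_hard=False))  # True  - rule bent
```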

Enhanced vMotion Compatibility (EVC)

EVC has been improved with support for new processors, and an improved method that will allow VMware to add new CPUs more seamlessly in future. Management of EVC has been enhanced so it can be enabled even when hosts are running VMs or are in a disconnected state. Previously, all the hosts had to be the same CPU class, or all the VMs powered off, to enable EVC after the DRS cluster had been created.

Fault Tolerance

FT is now properly compatible with DRS, and the Primary and Secondary VMs that make up an FT-protected system are no longer marked as "disabled" in DRS. From a scalability perspective, the limits remain the same – four Primaries and four Secondaries per ESX host. Remember to disable power-management features in the ESX host BIOS that might vary the clock speed of the CPUs, as this can cause "vLockstep" to become unreliable.

VMware High-Availability (HA)

There have been a number of changes to HA from vSphere 4.0 to 5.0:

vSphere 4.1 introduced the "Enable Host Monitoring" option, which is used to temporarily turn off the checks HA makes to see if a host is available – checks that could be accidentally triggered by network maintenance. In the past, VMware recommended that you have two heartbeat networks (one on the host management network, plus a backup) and that these should be able to ping a network device such as a router. If you didn't think about your network maintenance windows properly, you could find HA reacting to a router being unavailable as if an ESX host had died – essentially a false positive.

vSphere 4.1 also added two "Admission Control" options. The first allows you to reserve a percentage of resources from the cluster for failovers. Many people regard this as a more intuitive way of allocating resources than the spinner that controls the number of host failures you tolerate. That spinner uses a "slot size" value to work out the spare capacity in the cluster; HA would often get the slot size wrong, and customers found themselves having to use the "Advanced Options" to adjust it. The second admission control option added in vSphere4.1 is "Specify failover hosts". This has always been a feature, except you previously had to use the "Advanced Options" to configure it. It allows you to reserve an entire ESX host to wait for a failover event to happen – essentially letting HA work in a more classical "Active/Passive" model, where the active nodes take production VMs and the passive node merely waits in reserve as a resource to be used if a failure happens.
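The difference between the two models can be shown with some back-of-envelope arithmetic. This is a sketch of the ideas only – the numbers are invented, and HA's real slot calculation also factors in per-VM reservations and advanced options.

```python
# Illustrative sketch of HA admission-control arithmetic: the slot-size
# model behind "host failures tolerated", and the percentage model.

def slots_per_host(host_mhz, host_mb, slot_mhz, slot_mb):
    """A host holds as many slots as its scarcer resource allows."""
    return min(host_mhz // slot_mhz, host_mb // slot_mb)

def spare_after_reservation(total_mhz, reserved_pct, used_mhz):
    """Capacity left once a percentage of the cluster is held for failover."""
    usable = total_mhz * (100 - reserved_pct) // 100
    return usable - used_mhz

# A 10 GHz / 32 GB host with a 500 MHz / 1 GB slot is CPU-bound at 20 slots:
print(slots_per_host(10_000, 32_768, 500, 1_024))   # 20
# 40 GHz cluster, 25% reserved, 20 GHz in use -> 10 GHz spare:
print(spare_after_reservation(40_000, 25, 20_000))  # 10000
```

The example also shows why a single oversized VM reservation hurt the slot model: one big slot size shrinks the slot count for every host in the cluster.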

vSphere 4.1 also added "Application Monitoring" to HA's capabilities, and to a small degree opened up HA to 3rd-party extensions.

By default, this application monitoring checks the status of the VMware Tools heartbeat service – and, based on a series of settings and tolerances, will automatically restart the VM if HA thinks it has become unresponsive. A small number of vendors have opted into the APIs surrounding this to add a more "service-aware" approach, such as Symantec's ApplicationHA technology.
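The heartbeat-and-tolerances idea boils down to a simple decision, sketched below. The threshold names and values are illustrative, not HA's actual defaults.

```python
# Sketch of the VM/application-monitoring decision: if the in-guest
# heartbeat has been silent longer than the failure interval, and the
# reset budget for the current window isn't exhausted, restart the VM.

def should_reset_vm(seconds_since_heartbeat, failure_interval,
                    resets_used, max_resets_per_window):
    """Decide whether an unresponsive VM should be restarted."""
    missed = seconds_since_heartbeat > failure_interval
    budget_left = resets_used < max_resets_per_window
    return missed and budget_left

print(should_reset_vm(45, 30, 1, 3))  # True  - heartbeat missed, budget left
print(should_reset_vm(45, 30, 3, 3))  # False - reset budget exhausted
```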

In vSphere5, although the look and feel of the HA settings remains very similar to previous releases, the underlying architecture of HA has changed significantly. Previously based on a port of Legato's AAM product, VMware HA in vSphere5 has been completely re-written around what's called the "Fault Domain Manager" (FDM), with a new system where each HA-enabled cluster has a single Master surrounded by Slaves. If the Master fails, one of the Slaves is elected to be the new Master. So gone is the old system of Primaries and Secondaries – and the set limit on the number of Primaries.

The Master role includes these responsibilities:

  • Ensures VMs are protected
  • Maintains the state of the VMs and the Slaves
  • Restarts VMs if a Slave fails
  • Maintains a list of current Slaves and Protected VMs
  • Informs Slaves of changes in configuration within the Cluster
  • Sends heartbeat packets to Slaves – so it knows which Slaves are still alive, and the Slaves know which ESX host is the current Master

The Slave role includes these responsibilities:
  • Monitors the VMs it currently owns
  • Resets VMs running on it if they become unresponsive and VM Monitoring is enabled
  • Updates the Master on VM power state changes
  • Monitors the status of the Master; if the Master fails, it can trigger and take part in an election for a new Master
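The election step can be sketched as follows. This reflects the rule as it is commonly described: the candidate that can see the most datastores wins, with ties broken by the lexically greatest host ID. Treat the tie-break detail as an assumption – the real election protocol is internal to FDM.

```python
# Illustrative model of the FDM master election among surviving hosts.
# Host IDs and datastore counts are made-up example data.

def elect_master(candidates):
    """candidates: list of (host_id, connected_datastore_count) tuples."""
    # Prefer more datastores; break ties on the lexically greatest host ID.
    return max(candidates, key=lambda c: (c[1], c[0]))[0]

candidates = [("host-10", 4), ("host-22", 6), ("host-9", 6)]
print(elect_master(candidates))  # "host-9" ties on 6 and wins lexically
```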

For this reason the vSphere Client UI has been updated to show which ESX host is the Master.

Additionally, there's a per-VM status update that reports whether a VM is correctly protected by HA.

The vSphere5 version of HA also includes additional ways of confirming whether an ESX host is down. Previously, HA relied purely on network heartbeats across the LAN to work out whether an ESX host had crashed. That's why it was so important to have redundant heartbeat networks to make sure this information was valid; otherwise false positives could occur, caused by host "isolation" or what's sometimes referred to as "split brain". That method still exists, but it can now be backed up by an additional check using datastores – a feature VMware calls "Datastore Heartbeating".

If you think about it, for vMotion, DRS and HA to work, all of the hosts need to be connected to the SAME network and also the SAME storage. So by checking both the network and connectivity to the storage, you can more reliably answer the question, "Has the server died, or is it merely orphaned from the network?". A server that has no communication with either the Ethernet network or the storage network might as well be dead to the world in my book – so it can be classed as failed with a higher degree of certainty than network checks alone would allow. HA only consults these heartbeat datastores if the management network and your backup heartbeat network have failed.
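The decision above can be captured in a few lines: the network heartbeat is checked first, and the datastore heartbeat acts as the tie-breaker when the network goes quiet. Purely illustrative logic, not FDM's actual implementation.

```python
# Sketch of combining network and datastore heartbeats to distinguish a
# dead host from one that is merely cut off from the LAN.

def classify_host(network_heartbeat_ok, datastore_heartbeat_ok):
    if network_heartbeat_ok:
        return "alive"
    # No network heartbeat: the datastore heartbeat is the tie-breaker.
    if datastore_heartbeat_ok:
        return "isolated"  # still running, so don't restart its VMs yet
    return "failed"        # silent on network AND storage - restart VMs

print(classify_host(False, True))   # isolated
print(classify_host(False, False))  # failed
```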

When VMware HA is enabled, it will automatically select two datastores to be part of the heartbeating process, or you can override those settings and manually specify datastores. For example, in my case I often mount and unmount datastores in rapid succession, but there are three datastores my hosts are ALWAYS connected to – the software, template and "infrastructure" datastores.

Note: Even if you select more than two datastores, HA will only use two of the datastores selected. As a consequence of this new architecture, you'll find the "Status" options now show the Hosts, VMs and Heartbeat Datastores.


As you can see, that's a lot of changes in VMware HA – and as a consequence there are new files, new locations and new logs to get to know.
