
How to keep virtual machine snapshots in check

Snapshots can help undo a bad patch or faulty configuration change, but they need to be kept under control before they become a source of trouble.

Virtual machine snapshots are one of the most useful parts of VMware or indeed any virtualization platform. When used correctly, they are ideal for taking a moment-in-time copy that can be rolled back to if an upgrade or other change fails. Snapshots can also help prevent the need for a long -- and slow -- restoration process if something does go wrong.

While there are certain instances where snapshots are useful, they don't remove the need for a backup and should never be a substitute for a full and proper one. When snapshots are used incorrectly or for the wrong purpose, they can cause problems and even downtime for multiple virtual machines (VMs).

The anatomy of the virtual machine snapshot process

To understand why snapshots are useful, it is important to understand what actually happens when we create and use snapshots.

A VM snapshot is a point-in-time (PIT) copy of the VM's disks and, if selected, its memory. When the snapshot is taken, the hypervisor creates a delta -- meaning difference -- file that holds every change made to the VM and its file system after that point. The original disk is essentially frozen. Any writes are put into the delta file and recorded, so that later reads of the modified blocks go to the delta disk rather than the original.
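As a minimal sketch of taking such a snapshot from PowerCLI, the following assumes PowerCLI is installed and a Connect-VIServer session is already open. The VM name app01 and the snapshot name are hypothetical. The -Memory switch controls whether the VM's memory state is captured along with its disks.

```powershell
# Assumes VMware PowerCLI is installed and Connect-VIServer has already been run.
# 'app01' is a hypothetical VM name; substitute your own.
$vm = Get-VM -Name 'app01'

# Take a disk-only snapshot before a change. Pass -Memory without ":$false"
# to capture the running memory state as well.
New-Snapshot -VM $vm -Name 'pre-patch' -Description 'Before monthly patching' -Memory:$false

# Confirm that the snapshot exists.
Get-Snapshot -VM $vm | Select-Object VM, Name, Created
```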

The growing appetite of a delta file

It is important to note that nothing restricts the amount of storage delta files can consume on a data store. Figure 1 shows an extreme example of snapshot growth. This machine has a snapshot more than one year old, and the snapshot eats up 20 GB of provisioned space. The VM's performance was not good, to say the least.

Figure 1: Snapshot space. A virtual machine snapshot that occupies 20 GB of provisioned space on the disk means poor performance for the VM.

With every change to the VM's file system, the delta disk grows. If the snapshot is not removed, the delta disk will keep growing until it consumes all the space on the data store. This situation results in two potential issues. First, VMs that are not powered on will not boot, because the host will not have the disk space required for the VM's swap file. Even worse, the VMs that are currently powered on will start to fail and issue warnings about running out of disk space. If a VM attempts to write to a full data store, there may be data loss in some circumstances.

VMs with thin-provisioned disks will halt when the data store fills, whereas thick provision eager zeroed VMs have all their disk space allocated up front and can thus continue running. Never let VM snapshots grow too big. VMware recommends keeping a single snapshot for no more than 72 hours to prevent it from consuming too much storage or degrading performance too far.
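One way to enforce the 72-hour guideline is to flag any snapshot older than that. The sketch below defines a small pipeline filter; the function name Select-OldSnapshot is hypothetical, and it relies only on the Created property that Get-Snapshot output carries.

```powershell
# Emits only the snapshots older than the given age (72 hours by default).
# Works on any object with a Created property, such as Get-Snapshot output.
function Select-OldSnapshot {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Snapshot,
        [int]$MaxAgeHours = 72
    )
    begin   { $cutoff = (Get-Date).AddHours(-$MaxAgeHours) }
    process { if ($Snapshot.Created -lt $cutoff) { $Snapshot } }
}

# Usage against a connected vCenter (requires PowerCLI and Connect-VIServer):
#   Get-VM | Get-Snapshot | Select-OldSnapshot |
#       Sort-Object Created | Select-Object VM, Name, Created, SizeGB
```

Keeping the age check in its own function means the threshold can be tightened per environment, for example with -MaxAgeHours 24, without touching the rest of the pipeline.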

Avoid multiple snapshots if possible

Multiple snapshots on the same machine should be avoided in most circumstances; a system will get slower each time a snapshot is created. Each snapshot creates a further delta disk, which adds more work when several delta disks have to be read in succession. This affects VM responsiveness and generates additional disk I/O.

If you want to keep the last snapshot in a series of snapshots, you can consolidate them into one snapshot. This will help with speed and general manageability. To do this, select the VM in question, right-click it and choose Consolidate. You can tell whether a machine needs consolidation by going to the VMs and Templates view and adding the Needs Consolidation column.
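The same check can be sketched in PowerCLI, assuming an open Connect-VIServer session. The ConsolidationNeeded flag and the ConsolidateVMDisks_Task() method come from the vSphere API objects that PowerCLI exposes through ExtensionData; the VM name app01 is hypothetical.

```powershell
# Assumes an existing Connect-VIServer session.
# List VMs whose disks vCenter has flagged as needing consolidation.
Get-VM |
    Where-Object { $_.ExtensionData.Runtime.ConsolidationNeeded } |
    Select-Object Name, PowerState

# Start consolidation on one of them ('app01' is a hypothetical VM name).
# The call returns a task reference and runs asynchronously.
(Get-VM -Name 'app01').ExtensionData.ConsolidateVMDisks_Task()
```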

Potential issues with snapshots

Snapshots introduce some issues if you are trying to move VMs by any method other than vMotion or Storage vMotion. While you can copy the files across with a normal copy, doing so with any VM that has a snapshot will result in corrupted files.

There are some types of VMs that cannot be snapshotted. These include VMs that have shared SCSI buses, such as clustered servers. Independent disks cannot be snapshotted either. If you are using physical raw device mapping, you cannot snapshot these disks because the underlying disk is managed by the VM. If the VMware host doesn't manage the device, then it can't create a successful snapshot of it.

There are issues you need to be aware of if you are running ESXi 4 or earlier. First, you can't use Storage vMotion to migrate a VM that has snapshots. The easy fix is to consolidate the snapshots, but this means you lose the ability to restore those PIT snapshots. This limitation was removed in vSphere 5. Another frequently seen issue is a host becoming temporarily unresponsive while consolidating a large snapshot. The VMs on the host will still run; the condition is temporary.

Tools for administrators

VMware Tools, although not essential when using snapshots, is strongly recommended. VMware Tools allows the operating system to quiesce -- or temporarily quiet -- disk activity so that a snapshot is more easily accomplished, rather than having the VMware host fight to get a PIT copy.

There are a couple of ways to check how big your snapshots are getting. The easiest is to use RVTools and its snapshot size feature.

The other way to do it is to use PowerCLI and the command:

Get-VM | Get-Snapshot

This will list all your snapshots across the vCenter in question.
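To see which snapshots are the largest, the output can be sorted by size. This is a sketch that assumes an open Connect-VIServer session and a PowerCLI release whose snapshot objects expose the SizeGB property; older releases may only offer SizeMB.

```powershell
# Assumes an existing Connect-VIServer session.
# List every snapshot, largest first, with its size rounded to two decimals.
Get-VM | Get-Snapshot |
    Sort-Object SizeGB -Descending |
    Select-Object VM, Name, Created,
        @{ Name = 'SizeGB'; Expression = { [math]::Round($_.SizeGB, 2) } } |
    Format-Table -AutoSize
```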
