Taking snapshots of your virtual machines (VMs) is a useful way to preserve and restore VM configurations, but proper management is needed to avoid performance problems. In this tip, we'll explore advanced snapshot management topics. (For a review of snapshot basics and how VMware snapshots work, see my previous tip.)
Disk space and deleting multiple snapshots
It's important to plan ahead and allow ample disk space on your VMware virtual machine file system (VMFS) volumes for snapshot files. A good rule of thumb is to allow for disk space of at least 20% of the virtual machine's total disk size. But this amount can vary depending on the type of server, how long you keep the snapshots, and whether you plan to use multiple snapshots. If you plan to include the memory state with your snapshots, you'll also need to allow for extra disk space equal to the amount of RAM assigned to the VM.
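The rule of thumb above can be turned into a quick calculation. The following Python sketch is illustrative only: the function name and its parameters are hypothetical, and the 20% figure is the starting point described above rather than a guarantee for any particular workload.

```python
def snapshot_space_gb(disk_gb, ram_gb=0.0, include_memory=False, fraction=0.20):
    """Estimate the datastore space (in GB) to set aside for a VM's snapshots.

    fraction is the rule-of-thumb share of the VM's total disk size (20% by
    default); raise it for busy servers, long-lived or multiple snapshots.
    """
    if disk_gb < 0 or ram_gb < 0:
        raise ValueError("sizes must be non-negative")
    space = disk_gb * fraction
    if include_memory:
        # A snapshot that includes the memory state needs extra space
        # equal to the RAM assigned to the VM.
        space += ram_gb
    return space


# Hypothetical VM: 100 GB of virtual disk, 4 GB of RAM, memory state included.
print(snapshot_space_gb(100, ram_gb=4, include_memory=True))
```

For the hypothetical VM in the example, the estimate is 20 GB for the delta files plus 4 GB for the memory state.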
A VM with only one snapshot requires no extra disk space when deleting, or committing, it. (The term committing is used because the changes saved in the snapshot's delta files are now committed to the original virtual machine disk file, or VMDK.) There is also an extra helper delta file that is created when you delete snapshots. It contains any changes that are made to the VM's disk while the snapshot is being deleted. The size of the helper delta file varies, based on how long the snapshot takes to delete. But it's generally small, because most snapshots are deleted in less than an hour.
Depending on your vSphere version, a VM with multiple snapshots may need extra disk space when you delete them, because of the way they are merged into the original disk file.
The process for deleting multiple snapshots has changed across vSphere versions. In older vSphere 4.0 versions and VMware Infrastructure 3 (VI3), if a VM has three active snapshots and you delete them, the following process occurs: Snapshot 3 is copied to Snapshot 2, which is then copied to Snapshot 1. Next, Snapshot 1 is copied to the original disk file, and the helper snapshot is copied to the original disk file, as outlined below.
This process requires extra disk space because each snapshot grows as the previous snapshot is added to it. If there isn't sufficient free disk space on the data store, the snapshots cannot be committed.
In later vSphere 4.0 versions and vSphere 4.1, each snapshot is merged directly into the original disk, instead of merging with the previous snapshot. Figure 2 shows what happens when a VM has three active snapshots and you delete them.
Because each snapshot is merged directly into the original disk, one at a time, no extra disk space is needed, except for the helper file.
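To get a feel for why the older chained merge needs extra room, here is a rough worst-case model in Python. It is a simplified sketch, not a description of VMware's actual implementation: it assumes that no blocks overlap between snapshots, that no delta file is removed until the whole chain has been committed, and that the original disk is preallocated and does not grow. The function name is hypothetical, and the helper delta file is ignored.

```python
def chained_merge_extra_gb(snapshot_gb):
    """Worst-case extra space when each snapshot is copied into the one before it.

    snapshot_gb lists the delta sizes from oldest (Snapshot 1) to newest.
    Assumption: no overlapping blocks, and nothing is freed until the end.
    """
    extra = 0.0
    carried = 0.0  # data accumulated so far from the newer snapshots
    # Walk from the newest snapshot back to Snapshot 2; each one is copied
    # into its predecessor, which grows by everything carried so far.
    for size in reversed(snapshot_gb[1:]):
        carried += size
        extra += carried
    return extra


# Three hypothetical snapshots of 5 GB each.
print(chained_merge_extra_gb([5, 5, 5]))
```

Under these assumptions, three 5 GB snapshots need up to 15 GB of additional space with the chained merge: Snapshot 2 grows by 5 GB, then Snapshot 1 grows by 10 GB. With the direct merge described above, no delta file grows, so the same model would give zero beyond the helper file.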
If you are using an older version of vSphere or VI3, an alternate method of deleting multiple snapshots that requires less additional disk space is to delete the snapshots one by one, starting with the snapshots farthest down the snapshot tree. This way, each snapshot is merged into the previous snapshot and then deleted before the next one is processed, so only one snapshot grows at a time. Although a little more tedious, this method requires far less extra disk space.
Important: Don't run a Windows disk defragmentation while the VM has a snapshot running. Defragment operations change many disk blocks and can cause very rapid growth of snapshot files.
How long does it take to delete a snapshot?
When deleting snapshots through the vSphere Client, the task status bar can be misleading. Generally, the task status jumps to 95% complete fairly quickly, but you'll notice it will stay at 95% without changing until the entire commit process is completed. In addition, vCenter Server has a default 15-minute timeout for all tasks (which can be increased). As a result, vCenter Server may report that the operation has timed out even though your files are still committing.
One method for finding out when a task completes is to look at the VM's directory using the Datastore Browser in the vSphere Client. When the delta files disappear you know that the snapshot deletion has completed. There is also a command-line method for ESX and ESXi that you can use to monitor the status of snapshot deletions.
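The Datastore Browser check can also be scripted. The Python sketch below is a minimal example that assumes the VM's directory is reachable as an ordinary file system path from wherever Python 3 runs, and that snapshot delta files follow the usual `*-delta.vmdk` naming. The function names and the example path are hypothetical, and this is not the command-line method mentioned above.

```python
import glob
import os
import time


def find_delta_files(vm_dir):
    """Return the snapshot delta files currently present in a VM's directory."""
    return sorted(glob.glob(os.path.join(vm_dir, "*-delta.vmdk")))


def wait_for_commit(vm_dir, poll_seconds=30, timeout_seconds=None):
    """Poll until no delta files remain.

    Returns True once the delta files are gone, or False if timeout_seconds
    elapses first. With timeout_seconds=None it waits indefinitely.
    """
    start = time.monotonic()
    while find_delta_files(vm_dir):
        if timeout_seconds is not None and time.monotonic() - start >= timeout_seconds:
            return False
        time.sleep(poll_seconds)
    return True


# Example call (hypothetical path):
# wait_for_commit("/vmfs/volumes/datastore1/web01", poll_seconds=60)
```

Because the helper delta file matches the same pattern, the function reports completion only after it has been committed as well.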
Snapshots that have been active for a very long time (thereby becoming extremely large) can take a very long time to commit when deleted. The amount of time the snapshot takes to commit varies depending on the VM's activity level; it will commit faster if the VM is powered off. The amount of activity your host's disk subsystem is handling also affects the time the snapshot takes to commit.
A 100 GB snapshot can take hours to merge into the original disk, which can affect VM and host performance. For this reason you should limit the length of time you keep snapshots and delete them as soon as you no longer need them.
Snapshots and metadata locks affect host performance
Snapshots have a negative impact on the performance of your host and virtual machines in several ways. When you first create a snapshot, your VM activity will pause briefly; if you ping a VM while creating a snapshot, you will notice a few timeouts. Also, creating a snapshot causes metadata updates, which can cause SCSI reservation conflicts that briefly lock your LUN. As a result, the LUN will be available exclusively to a single host for a brief period of time.
If you've created a snapshot of a VM and then run the VM, the snapshot is active. While a snapshot is active, the performance of the VM is degraded because the host writes to delta files differently and less efficiently than it does to standard VMDK files. Also, each time the delta file grows by a 16 MB increment (discussed in part one of this series), it causes another metadata lock, which can affect your VMs and hosts. How big an impact this has on performance depends on how busy your VM and hosts are. In part three of this series, I'll go into greater depth about troubleshooting VMware snapshots to avoid such performance problems.
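A little arithmetic shows how quickly those growth steps add up. The Python sketch below simply divides a delta file's size by the 16 MB increment mentioned above. The function name is hypothetical, and the result counts growth steps only; it says nothing about how long each metadata lock lasts.

```python
import math

INCREMENT_MB = 16  # growth step for delta files, as described above


def growth_steps(delta_size_mb):
    """Number of 16 MB growth steps needed for a delta file to reach a given size."""
    if delta_size_mb < 0:
        raise ValueError("size must be non-negative")
    return math.ceil(delta_size_mb / INCREMENT_MB)


print(growth_steps(1024))        # 1 GB delta file   -> 64
print(growth_steps(100 * 1024))  # 100 GB delta file -> 6400
```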
Finally, deleting/committing a snapshot also creates a metadata lock. In addition, the snapshot you are deleting can greatly reduce the performance of its VM while the delta files are being committed; this will be more noticeable if the VM is very busy. To avoid this problem, it's better to delete large or numerous snapshots during off-peak hours when the host server is less busy.
Never expand a disk file with a snapshot running
You should never expand a virtual disk while snapshots are active. You can expand disks using the vmkfstools -X command or the vSphere Client. In VI3, if you expand a disk using the VI Client, it reports that the task completed successfully, but it doesn't actually expand the disk file. And if you expand a virtual disk in VI3 with vmkfstools while a snapshot is active, the VM will no longer start, and you will receive an error: Cannot open the disk ".vmdk" or one of the snapshot disks it depends on. Reason: The parent virtual disk has been modified since the child was created. Fortunately, there is a way to recover from this scenario.
In vSphere, it is not possible to expand a VM's virtual disk while a snapshot is running. If you try the vmkfstools command, you receive an error: Failed to extend the disk. Failed to lock the file. With the vSphere Client, if you edit a VM's settings when a snapshot is running, and then select one of its virtual disks, the option to resize the disk is grayed out. But once the snapshot is deleted, you can resize the virtual disk.
If a VM has a Raw Device Mapping (RDM) disk attached, the disk (logical unit number) size is managed by the physical storage system and not by vSphere. As a result, you can increase the logical unit number size without involving vSphere, and it is possible to increase the size of an RDM disk while snapshots are active. But this action can corrupt the RDM disk, so always ensure that you delete snapshots before increasing the size of an RDM disk.
Excluding virtual disks from using snapshots
If you have a VM with more than one disk and you wish to exclude a disk from snapshots, you must edit the VM's settings and change the disk mode to Independent (make sure you select Persistent). The Independent setting lets you control how each disk functions independently; it makes no difference to the disk file or its structure. Once a disk is Independent, it will not be included in any snapshots.
Additionally, you will not be able to include memory snapshots on a VM that has independent disks. This is done to protect the independent disk in case you revert to a previous snapshot whose memory state includes a running application that was writing to the independent disk. Because the independent disk is not reverted when the other disks are, the application could potentially corrupt the data on it.
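If you want to check which disks are set this way without opening each VM's settings, the disk mode is typically recorded in the VM's .vmx configuration file as an entry such as scsi0:1.mode = "independent-persistent". The Python sketch below scans the text of a .vmx file for such entries. It is a minimal example: the function name and the sample configuration are hypothetical, and it assumes the key format shown here.

```python
import re

# Matches lines such as: scsi0:1.mode = "independent-persistent"
MODE_LINE = re.compile(
    r'^\s*((?:scsi|ide)\d+:\d+)\.mode\s*=\s*"([^"]*)"\s*$', re.IGNORECASE
)


def independent_disks(vmx_text):
    """Map device ID to disk mode for every disk set to an independent mode."""
    result = {}
    for line in vmx_text.splitlines():
        match = MODE_LINE.match(line)
        if match and match.group(2).lower().startswith("independent"):
            result[match.group(1)] = match.group(2)
    return result


# Hypothetical .vmx excerpt: the second disk is excluded from snapshots.
sample = '''
scsi0:0.fileName = "web01.vmdk"
scsi0:1.fileName = "web01_1.vmdk"
scsi0:1.mode = "independent-persistent"
'''
print(independent_disks(sample))
```

For the sample above, only scsi0:1 is reported, so snapshots of this VM would cover the first disk but not the second.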
For VMs that have RDM disks, if the RDM was configured in physical compatibility mode, it will not be included in any VM snapshots. But if the RDM was configured in virtual compatibility mode, it will be included in snapshots.
Using snapshots to back up your VMs while they are running
Snapshots provide a great way to back up the raw VMDK files while the VM is powered on. While a snapshot is active, all write operations to the original disk file are stopped, so it is safe to copy it to another storage volume. This is how backup applications, such as Veeam Backup & Replication and Quest vRanger, operate at the virtualization layer: They snapshot the VM, back up the disk file and then remove the snapshot when completed.
There are also some free user scripts and utilities which provide this functionality. These programs allow you to copy your VMDK files to local storage or to a network share to provide another recovery method for your important VMs.
The third and final part of this series discusses how to troubleshoot VMware snapshots.