
Avoiding storage array snapshot pitfalls in a VMware environment

Storage array snapshots created from a VMware platform can be useless if you aren't aware of a few pitfalls that interfere with array snapshot consistency.

When used in conjunction with a VMware infrastructure, storage array-based snapshots are touted for their ability to create point-in-time pictures of virtual machines (VMs) for business continuity, disaster recovery and backups. While this can be true, it's important to understand how virtualization affects storage array snapshot use. Incorrect usage can render storage array snapshots unreliable and effectively unusable.

Before we proceed, remember that the snapshots to which we refer are not VMware virtual machine snapshots, but rather the snapshots provided by the storage array. Because these snapshots are not, by default, integrated in any way with VMware ESX Server, we have to perform a few extra steps to ensure consistently reliable and usable storage array snapshots.

VM-level file system consistency is the key to consistent snapshots. It's important for users to understand that there are multiple levels of operations occurring simultaneously. When a VM issues a write to disk, that write has to pass through the virtualization layer before it gets to the actual storage array. It's necessary to ensure that VM file system buffers are flushed and that host-level I/O buffers are flushed as well.
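To make the guest side of that layering concrete, here is a minimal Python sketch of pushing a write through the buffers inside a VM. The helper names `write_durably` and `flush_guest_buffers` are ours, chosen for illustration. Note that this only covers the guest OS; the host-level I/O buffers described above are outside the guest's control and are not addressed by this code.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data and push it through the guest's buffering layers."""
    with open(path, "wb") as f:
        f.write(data)          # lands in the process's user-space buffer
        f.flush()              # hands the bytes to the guest OS cache
        os.fsync(f.fileno())   # asks the guest OS to commit them to its disk


def flush_guest_buffers() -> None:
    """Flush all dirty guest file system buffers, where supported."""
    if hasattr(os, "sync"):    # os.sync() exists only on Unix-like systems
        os.sync()


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.dat")
    write_durably(target, b"important data")
    flush_guest_buffers()
```

Even after these calls return, the data has only reached the guest's virtual disk. Whether it has reached the storage array still depends on the layers underneath the VM.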

Using storage array snapshots with VMware
From a VMware-specific perspective (although other virtualization solutions suffer from this problem as well, to various degrees), there are a few ways to help ensure that storage array snapshots are consistent and usable.

The first way is to avoid the use of hot snapshots. There are three types of snapshots that we can create in a VMware environment:

  • Cold snapshots: Cold snapshots require the most downtime but provide the greatest guarantee of VM-level file system consistency. In a cold snapshot, you'll shut down the VM, take a snapshot, and then start the VM up again. All this can be scripted, but it still means downtime and you'll need to take that into consideration in your planning.

  • Warm snapshots: Warm snapshots require less downtime but will require a VM-level file system check (chkdsk or fsck) when the VM is recovered from the snapshot. In the majority of cases, NTFS or EXT3 file system journaling will prevent any data corruption. With a warm snapshot, the VM is paused or suspended while the snapshot is taken and resumed after the snapshot is complete. Scripts are generally used in these instances, and they typically also invoke the Sync driver, which flushes VM file system buffers to disk to help with file system consistency.

  • Hot snapshots: Hot snapshots require no downtime but run the greatest risk of inconsistent data. In a hot snapshot, the VM is live when the snapshot is taken. The guest OS has no warning to flush the I/O buffers, nor is the ESX host given time to commit writes to the storage array. This results in file system inconsistency in the guest OS, which will then force a file system check upon reboot. While the file system may recover, applications running in that guest OS such as databases and email systems may not recover, and data loss may result if the storage array snapshot is restored.

Problems with VM-level file system inconsistency can be greatly reduced by avoiding the use of hot snapshots. As described above, however, the use of cold and warm snapshots introduces varying degrees of downtime and service interruption. This may be unacceptable to the organization.
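The three snapshot types differ mainly in what happens to the VM before and after the array snapshot is taken. The following Python sketch captures that ordering under stated assumptions: every action name (`shut_down_vm`, `suspend_vm`, `take_array_snapshot` and so on) is a hypothetical placeholder that you would wire to your own ESX and storage array tooling, and flushing guest buffers before suspending the VM is our assumed ordering for the warm case, not a documented procedure.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Mode(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


# (steps before the array snapshot, steps after it) for each mode.
# All step names are hypothetical placeholders for site-specific scripts.
PLANS: Dict[Mode, Tuple[List[str], List[str]]] = {
    Mode.COLD: (["shut_down_vm"], ["start_vm"]),
    Mode.WARM: (["flush_guest_buffers", "suspend_vm"], ["resume_vm"]),
    Mode.HOT: ([], []),  # no preparation: greatest risk of inconsistency
}


def run_snapshot(mode: Mode, actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the steps for one snapshot mode and return the completed step names."""
    before, after = PLANS[mode]
    completed: List[str] = []

    def call(name: str) -> None:
        actions[name]()
        completed.append(name)

    for name in before:
        call(name)
    try:
        call("take_array_snapshot")
    finally:
        # Bring the VM back even if the array snapshot failed.
        for name in after:
            call(name)
    return completed
```

The `try`/`finally` is the design choice worth noting: a failed array snapshot should not leave the VM shut down or suspended, which would turn planned downtime into an outage.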

Combine VMware VM snapshots with storage array snapshots
Another way to help ensure that storage array snapshots are consistent is to combine storage array snapshots with VMware virtual machine snapshots. Initiated through and managed with VirtualCenter, these VM snapshots involve the use of a differencing disk to which all changes to the VM's file system are written. The base VMDK is quiesced and unlocked. As a result, storage array snapshots taken in conjunction with a VMware snapshot behave very much like the warm snapshots described above. The VM will initiate a file system check when it is recovered from the array snapshot, but taking the snapshot involves no service interruption or downtime.
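A minimal Python sketch of that sequence follows. The three callables are hypothetical hooks for whatever VirtualCenter and array tooling you use; no real VMware or vendor API is shown. Removing the VM snapshot once the array snapshot completes is our assumption about sensible cleanup, so check it against your SAN vendor's guidance.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def vm_snapshot(create: Callable[[], str],
                remove: Callable[[str], None]) -> Iterator[str]:
    """Hold a VM snapshot open for the duration of the with-block."""
    snapshot_id = create()   # changes now go to the differencing disk
    try:
        yield snapshot_id
    finally:
        remove(snapshot_id)  # assumed cleanup step; runs even on failure


def snapshot_with_vm_snapshot(create_vm_snapshot: Callable[[], str],
                              take_array_snapshot: Callable[[], str],
                              remove_vm_snapshot: Callable[[str], None]) -> str:
    """Take an array snapshot while a VM snapshot is in place."""
    with vm_snapshot(create_vm_snapshot, remove_vm_snapshot):
        # The base VMDK is quiesced here, and the VM keeps running.
        return take_array_snapshot()
```

Using a context manager keeps the VM snapshot from lingering if the array snapshot raises an error partway through.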

Finally, the use of Raw Device Mappings (RDMs) may be another way to help with VM-level file system consistency. RDMs provide a way to bring raw LUNs into a virtualized environment. VMware recommends the use of RDMs when using "layered SAN applications" or in environments where SAN snapshots are used. Note that some SAN vendors only support SAN snapshots when using RDMs.

Using the information in this article in conjunction with the published best practices from your SAN vendor can help you ensure that the storage array-based snapshots of your VMs are actually usable in the event they are needed.

ABOUT THE AUTHOR: Scott Lowe is a senior engineer for ePlus Technology, Inc. He has a broad range of experience, specializing in enterprise technologies such as storage area networks, server virtualization, directory services, and interoperability. Previously he was President and CTO of Mercurion Systems, an IT consulting firm, and CTO of iO Systems.
