

How to avoid a zombie VM infestation

Prevent your environment from turning into the virtualization version of "The Walking Dead" by removing VMs after finishing a project.

Since creating a virtual machine takes just a few minutes, it's easy to produce multiple VMs to test changes or try new products. That's one of the big benefits of virtualization.

Each VM uses a small amount of the infrastructure's resources. But if we leave all these test and trial VMs in place, they will eventually eat up all of those available resources. To avoid this situation, it is important to only keep VMs providing value to the organization. Left forgotten, a zombie VM will roam the virtualization platform, taking a bite out of available CPU cycles and swallowing RAM.

Get a grip on those VMs

With most virtualization platforms, one of the key benefits is how easily VMs can be created. New VMs can be spun up quickly to test ideas and upgrades, or simply to respond on the fly to changing business needs.

In this "Wild West" environment, many VMs get provisioned with little control and few records. Then, some months later, the virtualization platform groans under the load of many more VMs than were anticipated. Resources become scarce and the help desk overflows with application performance tickets. The problem is not the planned production VMs, but rather all of the unplanned ones.

Rather than focusing on removing zombie VMs, it is far better to prevent them from appearing in the first place.

At the core of the matter is proper lifecycle management of VMs: an administrator should set up a plan for each VM, from birth to the grave. Most non-production VMs should not have a long lifespan and should only live as long as they are needed.

Make a record

To avoid a zombie VM situation, we need to capture a VM's purpose and lifespan when it is created. At a minimum, we should capture the project requiring the VM. Recording the names of the VM owner and the application owner, the VM's expected lifespan and its allowed outage window also makes an operations team's life easier.

There are many places to store this information; a common choice is the Configuration Management Database (CMDB). Another simple way to record the information is to use the notes field on the VM. The notes may not be very structured or controlled, but they are right there on the VM. No matter how it's done, capturing the VM information needs to be part of the provisioning process. Trying to discover the VM owner weeks or months later is much harder than taking a little time up front to enter the information.
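As a rough illustration of what such a record might look like, the sketch below models the details listed above and renders them as plain "key: value" lines that could be pasted into a VM's notes. The `VMRecord` class, its field names, the `parse_notes` helper and the sample values are all hypothetical, chosen for this example only; a real CMDB or provisioning tool will have its own schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class VMRecord:
    """Lifecycle details captured at provisioning time (hypothetical schema)."""
    project: str
    vm_owner: str
    app_owner: str
    created: date
    lifespan_days: int
    outage_window: str

    @property
    def expires(self) -> date:
        # The date after which the VM should be reviewed or removed.
        return self.created + timedelta(days=self.lifespan_days)

    def to_notes(self) -> str:
        """Render the record as 'key: value' lines for a free-text notes field."""
        return "\n".join([
            f"project: {self.project}",
            f"vm_owner: {self.vm_owner}",
            f"app_owner: {self.app_owner}",
            f"created: {self.created.isoformat()}",
            f"lifespan_days: {self.lifespan_days}",
            f"outage_window: {self.outage_window}",
        ])


def parse_notes(notes: str) -> VMRecord:
    """Rebuild a record from notes text written by VMRecord.to_notes()."""
    fields = {}
    for line in notes.splitlines():
        # Split on the first colon only, so values such as "02:00-04:00" survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return VMRecord(
        project=fields["project"],
        vm_owner=fields["vm_owner"],
        app_owner=fields["app_owner"],
        created=date.fromisoformat(fields["created"]),
        lifespan_days=int(fields["lifespan_days"]),
        outage_window=fields["outage_window"],
    )


# Example values are invented for illustration.
record = VMRecord(
    project="crm-upgrade-test",
    vm_owner="a.admin",
    app_owner="b.owner",
    created=date(2024, 1, 15),
    lifespan_days=90,
    outage_window="Sun 02:00-04:00",
)
print(record.to_notes())
print("expires:", record.expires)
```

Keeping the format this simple means the notes stay readable by a person while still being easy for a script to parse later, which matters when someone has to work out months afterward who owns a VM.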

Production VMs tend to be long-lived. They may remain through upgrades of their operating systems, the hypervisor and the underlying physical resources. This is the immortality of VMs in action. Typically, they are owned by a business unit and have good change management.

Business units and support teams care very much about these VMs. They have little chance of becoming zombie VMs unless they are superseded and left in place. But administrators should still gather information that makes the operations team's life easier: the team can spend a lot of time looking after these VMs, often for years, and a little up-front effort to gather information will pay dividends. In short, these production VMs are the ones we must protect from turning into zombies.

Set up space for zombies

Another approach is to dedicate a separate vSphere cluster to non-production VMs. This keeps the production and zombie worlds apart. The "zombie land" could comprise either a cluster or a resource pool. If you choose to use a resource pool, make sure to set CPU and RAM limits for it. Creating VMs on the "zombie land" cluster requires less strict change management, but any VM still there after a set time -- for example, three months -- will be deleted. If you know the cluster or resource pool doesn't contain any production VMs, then deleting the potential zombies becomes much easier. Either way, it is crucial to actually destroy the old VMs in the "zombie land" cluster so the team understands not to put production VMs there.
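The age-based cleanup rule can be sketched in a few lines. The example below is only an illustration under stated assumptions: the inventory is a hand-written dictionary of VM names and creation dates rather than anything pulled from a real platform, "three months" is approximated as 90 days, and `find_expired` is a hypothetical helper that reports candidates without deleting anything.

```python
from datetime import date, timedelta

# Assumption: the "three months" limit is approximated as 90 days.
MAX_AGE = timedelta(days=90)


def find_expired(inventory, today, max_age=MAX_AGE):
    """Return the names of VMs older than max_age, oldest first.

    inventory maps a VM name to the date the VM was created.
    """
    expired = [
        (created, name)
        for name, created in inventory.items()
        if today - created > max_age
    ]
    # Sorting the (created, name) pairs puts the oldest VM first.
    return [name for created, name in sorted(expired)]


# Invented sample data standing in for the "zombie land" cluster.
zombie_land = {
    "test-a": date(2024, 1, 1),
    "test-b": date(2024, 5, 1),
    "test-c": date(2023, 11, 1),
}

for name in find_expired(zombie_land, today=date(2024, 6, 1)):
    print("candidate for deletion:", name)
```

Reporting candidates first and deleting in a separate, reviewed step is a deliberately cautious choice here; it gives owners a last chance to object before a VM is destroyed.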

You need to know whether each VM on your virtualization platform is returning value for the resources it consumes. Make it easy for the operations team to identify valueless zombie VMs and you will get the best return on your investment in the virtualization platform. Fail to capture this lifecycle information and you risk being overrun with zombie VMs.
