Virtualizing vCenter Server is a contentious topic, with sound arguments both for and against it. Ultimately, the two sides disagree over which approach does more to reduce vCenter Server’s downtime, especially during an infrastructure outage.
vCenter Server is the central management hub for VMware infrastructures. It manages and monitors every host and virtual machine (VM). Numerous vSphere features, including Fault Tolerance and Distributed Resource Scheduler (DRS), require vCenter Server, although some features, such as VMware High Availability (HA), can work while vCenter is offline.
When vCenter Server is down, the VMs and hosts will continue to run, but you lose critical visibility and certain features, such as vMotion. As a result, keeping vCenter Server online is a top priority for VMware administrators.
With the release of vSphere 5, VMware will offer vCenter Server as a prebuilt virtual appliance. But that doesn’t change the fundamental arguments in this face-off.
Virtualizing vCenter Server: The way to go
Virtualizing vCenter Server: Not worth the hassle
Virtualizing vCenter Server delivers all of the advantages of virtualization, such as high-availability and snapshot protection. At the end of the day, it’s just software and has no idea whether it’s running in a physical or virtual machine.
Initially, VMware recommended against virtualizing vCenter Server because of the performance limitations of its hypervisors. After all, ESX 2 limited VMs to just two virtual CPUs and 3 GB of RAM, which didn’t compare to the power of physical hardware at the time. Since vSphere 4, however, VMware has recommended virtualizing vCenter Server. And with vSphere 5, IT shops can deploy a vCenter virtual appliance.
Even back in 2003, I thought VMware’s recommendation was odd. If you really believe in virtualization, you must have the conviction to virtualize all VMware technologies, including the management layer. How can you convince application owners to virtualize tier-one services if you don’t have the guts to virtualize your own stuff?
Philosophical issues aside, there are also strong, technical explanations for virtualizing vCenter Server:
- You don’t need to buy a dedicated physical machine for vCenter Server and its other components, such as the back-end database.
- If you size vCenter incorrectly or your infrastructure grows, you’re a mouse click away from upgrading vCenter’s virtual hardware.
- You can use vSphere’s snapshot feature to capture a quick backup of vCenter Server and the back-end database. If something goes horribly wrong during an upgrade, you can just revert to the snapshot and restore vCenter to a working state.
- Because vCenter is encapsulated in a VM, you can use vSphere’s hot clone feature, a replication method that occurs with no downtime. You can also use virtualization backup tools to save a vCenter Server VM, making the backup process simpler and more consistent.
- By placing the vCenter Server VM in a Distributed Resource Scheduler/High Availability cluster, you can protect vCenter Server from performance bottlenecks and host outages.
In this configuration, vCenter Server has the protection of all the high-availability technologies, such as VMware HA and DRS Groups. Even if vCenter Server goes down, VMware HA still works. Therefore, you may not need the expensive vCenter Heartbeat to ensure that vCenter Server is up and running. (That said, VMware HA doesn’t completely protect vCenter Server from a service outage; it only restarts the vCenter Server VM after a host failure.)
In a virtual infrastructure, vCenter Server is generally very reliable, but the back-end database remains a single point of failure. Then again, the database is just as vulnerable on physical hardware. In fact, the complexities of building a physical machine, with all of the driver and firmware requirements, may make a physical server less reliable.
Still, my detractors will point to problems from personal experience. It pains me to say this, but those arguments say more about how they plan and manage a virtual infrastructure than about virtualizing vCenter Server itself.
They will say that running a management system on the platform it manages is an inherently bad idea, and I totally agree. That is why most folks who virtualize vCenter Server place it in a cluster that’s managed by another instance of vCenter Server. For example, they’ll set up two clusters, each with a separate instance of vCenter Server. The vCenter Server in Cluster A will manage the hosts in Cluster B, and the vCenter Server in Cluster B will manage the hosts in Cluster A.
Detractors will object that this method is expensive. But in an enterprise infrastructure with lots of management systems, it makes sense to protect the management layer.
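The cross-management layout can be checked on paper before anything is deployed. The following is a minimal sketch in Python, not a VMware tool: the `VCenterPlacement` record and the instance names `vc-a` and `vc-b` are hypothetical, and the check simply flags any vCenter instance that manages the cluster it runs in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VCenterPlacement:
    """Hypothetical record describing one vCenter Server instance."""
    name: str             # label for the vCenter instance (made up here)
    runs_in: str          # cluster that hosts the vCenter Server VM
    manages: frozenset    # clusters whose hosts this instance manages


def self_managed(placements):
    """Return the names of instances that manage the cluster they run in."""
    return [p.name for p in placements if p.runs_in in p.manages]


# Cross-managed layout: A's vCenter manages B, and B's vCenter manages A.
cross = [
    VCenterPlacement("vc-a", "Cluster A", frozenset({"Cluster B"})),
    VCenterPlacement("vc-b", "Cluster B", frozenset({"Cluster A"})),
]

# Single-cluster layout: the vCenter VM runs on the hosts it manages.
single = [
    VCenterPlacement("vc-1", "Cluster A", frozenset({"Cluster A"})),
]

print(self_managed(cross))   # []
print(self_managed(single))  # ['vc-1']
```

An empty result means no instance depends on the cluster it is responsible for, which is the property the two-cluster design is meant to guarantee.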
Detractors also will tell horror stories of not finding vCenter Server after VMware HA restarts the VM on a random host. But there are plenty of ways around this problem. If you opt for a dedicated, two-node management cluster, locating the vCenter Server VM isn’t difficult.
Alternatively, if this configuration is too expensive, you can run vCenter Server on an existing VMware cluster and use DRS Groups to assign the vCenter VM to a smaller number of hosts. After a host failure, VMware HA will still honor the DRS Group rules, even if DRS and vCenter Server are offline.
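To see why pinning the vCenter VM to a few hosts makes it easier to find, consider this minimal Python sketch. It is only a model of the placement rule described above, not VMware HA’s actual algorithm, and the host names and the `restart_candidates` helper are hypothetical.

```python
def restart_candidates(all_hosts, failed_hosts, allowed_hosts):
    """Hosts where the vCenter VM could come back up after a failure.

    Models a DRS Group rule: the VM may only run on `allowed_hosts`,
    and a failed host cannot accept a restart.
    """
    failed = set(failed_hosts)
    allowed = set(allowed_hosts)
    return [h for h in all_hosts if h in allowed and h not in failed]


# Hypothetical eight-host cluster with the vCenter VM pinned to two hosts.
cluster = [f"esx{n:02d}" for n in range(1, 9)]
vcenter_group = ["esx01", "esx02"]

# esx01 fails: the only place left to look is esx02.
print(restart_candidates(cluster, ["esx01"], vcenter_group))  # ['esx02']

# Without a group rule, any of the seven surviving hosts is possible.
print(len(restart_candidates(cluster, ["esx01"], cluster)))   # 7
```

The smaller the group, the fewer hosts you have to check after an outage, at the cost of fewer restart targets if several hosts in the group fail together.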
Virtualizing vCenter Server: Not worth the hassle
I prefer running vCenter Server on a physical machine to keep things simple. After all, virtualizing vCenter Server creates unnecessary complications.
Remember, vCenter Server plays a critical role in monitoring hosts and keeping important virtualization features online. When vCenter Server is down, you’re essentially blind, with little information that can help you solve infrastructure problems.
In the event of an infrastructure problem, you need to know where vCenter Server is located to assess the situation. If you virtualize vCenter Server, DRS and HA can seriously complicate your search. DRS continually moves VMs around a cluster to load balance host workloads, and HA restarts VMs on other hosts after a host failure. As such, it’s easy to lose track of vCenter Server’s physical location. If the vCenter Server VM goes down for whatever reason, you will need to connect to each host until you find it, which can be a time-consuming process, especially if you have many hosts.
Another problem occurs during a service disruption, such as a storage area network failure. This event can affect every host in an infrastructure, as well as take down a virtualized vCenter Server VM. But smaller outages, such as a problem with the vCenter Server VM’s host or storage device, can also knock vCenter Server offline.
There are also other caveats when virtualizing vCenter Server. For instance, you cannot cold-migrate a vCenter Server VM. There are situations where you cannot perform a vMotion or live migration, such as when a destination host’s hardware is incompatible with the VM. In these instances, you can move a VM through a cold migration, in which a VM is powered off before it’s moved to another host or data store. But this method requires vCenter Server, and vCenter Server is unavailable while its own VM is powered off, so it can’t cold-migrate itself.
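The host-by-host search can be sketched as a simple loop. This is a minimal Python illustration under stated assumptions: `list_vms` is a hypothetical callable standing in for whatever direct host connection you use to list a host’s VMs, and the inventory below is invented test data, not output from a real environment.

```python
def locate_vm(hosts, list_vms, vm_name):
    """Check hosts one at a time until `vm_name` is found.

    `list_vms(host)` is a hypothetical callable that returns the VM names
    registered on a host, or raises ConnectionError if the host is down.
    Returns (host_or_None, number_of_hosts_checked).
    """
    checked = 0
    for host in hosts:
        checked += 1
        try:
            names = list_vms(host)
        except ConnectionError:
            continue  # host unreachable, try the next one
        if vm_name in names:
            return host, checked
    return None, checked


# Invented inventory used only to exercise the loop.
FAKE_INVENTORY = {
    "esx01": ["web01"],
    "esx03": ["vcenter01", "db01"],
    "esx04": ["app01"],
}


def fake_list_vms(host):
    """Stand-in for a real host query; esx02 is treated as offline."""
    if host not in FAKE_INVENTORY:
        raise ConnectionError(f"{host} is unreachable")
    return FAKE_INVENTORY[host]


hosts = ["esx01", "esx02", "esx03", "esx04"]
print(locate_vm(hosts, fake_list_vms, "vcenter01"))  # ('esx03', 3)
```

In the worst case the loop touches every host, so the time to find vCenter Server grows with the size of the cluster.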
Also, if you use vShield Zones/App, vCenter Server will lose network connectivity after migrating to an unprotected host. When you enable vShield Zones/App, vCenter Server inserts and removes network filters from VMs. If a VM moves to a host that isn’t protected by the Zones/App firewall, the network filter must be removed before the VM can regain network connectivity. But the vCenter Server VM can’t remove the filter from itself, so it will lose network connectivity if it migrates to an unprotected host.
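The filter dependency can be written down as a small rule. The following Python sketch only models the behavior described above and does not call any vShield or vSphere interface; the function names, VM names and host names are hypothetical.

```python
def keeps_connectivity(vm_name, dest_protected, vcenter_vm_name):
    """Model whether a filtered VM has network access after a migration.

    On a protected host the filter keeps working. On an unprotected host
    vCenter Server must remove the filter, which it cannot do for its own VM.
    """
    if dest_protected:
        return True
    return vm_name != vcenter_vm_name


def hosts_to_avoid(host_protection):
    """Hosts the vCenter VM should not land on: those without the firewall."""
    return sorted(h for h, protected in host_protection.items() if not protected)


# Hypothetical cluster where one host lacks the Zones/App firewall.
protection = {"esx01": True, "esx02": True, "esx03": False}

print(keeps_connectivity("web01", False, "vcenter01"))      # True
print(keeps_connectivity("vcenter01", False, "vcenter01"))  # False
print(hosts_to_avoid(protection))                           # ['esx03']
```

A list like the one `hosts_to_avoid` returns is the set of hosts you would need to keep the vCenter Server VM away from.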
By placing vCenter Server on a physical machine, I can avoid these common problems. I always know where vCenter Server resides, so I don’t need to play “Where’s Waldo?” to find it. When vCenter Server is on a physical host, it also doesn’t depend on the virtual infrastructure. In other words, I don’t need an ESX host or shared storage device running before I start vCenter Server.
You could argue that a virtualized vCenter Server will automatically restart on another host with VMware HA. But most failures are caused by the hypervisor (e.g., a Purple Screen of Death error), not the underlying server hardware. Additionally, a typical physical server has redundant power supplies and RAID configurations, so it’s rare for hardware failures to disable a host.