
Creating a disaster recovery plan for VMware View virtual desktops

Will your VMware View virtual desktops function properly if you lose your main data center in a disaster? Here's how to design a DR plan for a View architecture that works. Also: Will VMware SRM manage View desktops in the future?

Recently, I took a trip to VMware's corporate headquarters in the UK. Ostensibly, I was there to quiz their system engineers about VMware View 4.5 beta, which I'm currently writing a book about. I also knew that one of the system engineers for VMware Site Recovery Manager would be there, so the other item on my two-day agenda was investigating possible disaster recovery (DR) strategies for virtual desktops. As you might know, I've written two books about VMware Site Recovery Manager (SRM) as well, and the two products are very close to my heart, so I was keen to learn more about their possible interaction.

I had another concern as my customers embark on the virtual desktop journey: What contingencies could be put in place to protect virtual desktops from a potentially catastrophic loss of a data center?

As you know, centralization is a key part of a virtual desktop rollout, and it creates dependencies and risks. It's ironic that virtual desktops often form part of an organization's DR strategy. In the event of a disaster, customers tell me, they would send their users home to connect to their virtual desktops remotely. But what if the business has lost the very data center that would allow that to happen?

Should you replicate virtual desktops?

It seems clear to all concerned that replicating virtual desktops from the protected site (the production location) to the recovery site (the DR location) is a waste of both bandwidth and disk space. Most virtual desktop solutions like VMware View have matured to such a degree that it is possible to create new virtual machines from templates or "linked clones" very rapidly. With that said, some care needs to be taken -- the input/output operations per second (IOPS) generated by deploying and running 500 virtual desktops are significantly different from the IOPS generated by 1,000 desktops. The last thing you would want is serious I/O issues whilst the rest of the IT department scurries around making sure the business survives the disaster. It's a process that needs to be considered carefully, as it could impact your Recovery Time Objective.
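To make the scaling concern concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a storage array with a fixed IOPS budget and a fixed peak IOPS cost per booting desktop; both figures, and the function names, are hypothetical and chosen only for illustration. Real numbers would have to come from your own storage measurements.

```python
def desktops_per_wave(array_iops: int, boot_iops_per_desktop: int) -> int:
    """How many desktops can boot at once within the array's IOPS budget."""
    if boot_iops_per_desktop <= 0:
        raise ValueError("boot_iops_per_desktop must be positive")
    return array_iops // boot_iops_per_desktop


def boot_waves(desktops: int, array_iops: int, boot_iops_per_desktop: int) -> int:
    """Number of sequential boot waves needed to bring every desktop up."""
    per_wave = desktops_per_wave(array_iops, boot_iops_per_desktop)
    if per_wave == 0:
        raise ValueError("array cannot sustain even one booting desktop")
    return -(-desktops // per_wave)  # ceiling division


# Hypothetical figures, for illustration only (not vendor data).
ARRAY_IOPS = 10_000   # assumed IOPS the recovery-site array can sustain
BOOT_IOPS = 50        # assumed peak IOPS per desktop while it boots

for count in (500, 1000):
    print(count, "desktops:", boot_waves(count, ARRAY_IOPS, BOOT_IOPS), "boot waves")
```

With these assumed figures, 200 desktops can boot at a time, so 500 desktops need three waves and 1,000 need five. Multiplying the wave count by your measured boot time gives a first estimate of how desktop recreation eats into the Recovery Time Objective.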

Once you are aware of this particular rubicon, the next step is to consider the infrastructure you need in order to recreate virtual desktops in the event of a disaster. To create new virtual desktops, you will need access to either the templates or, for VMware View's "linked clones," the parent virtual machines (VMs) on which a desktop pool is based.

If you store your templates/parent VMs on replicated storage, then VMware SRM can facilitate the "failover" of these ancillary files to the recovery site. In this respect, VMware SRM does add value to the DR process; it means that despite your template/parent VM being configured for VLAN10 in the protected site, when the DR plan is triggered it will be part of your SRM "Recovery Plan" to patch it to a new network at the recovery site, say VLAN20. Without these ancillary files, your virtual desktop recovery plan would fail at the first hurdle, as you would lack the source files that create the virtual desktop the user needs to connect to.
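The network re-patching idea can be sketched as a simple lookup table. The Python below is not SRM's API; it is a minimal illustration of the concept, using the VLAN10-to-VLAN20 example from the text. The mapping name and the `remap_networks` helper are hypothetical.

```python
# Hypothetical protected-site -> recovery-site network mapping.
PROTECTED_TO_RECOVERY = {
    "VLAN10": "VLAN20",
}


def remap_networks(vm_networks, mapping):
    """Return the recovery-site networks for a VM's protected-site networks.

    Raises KeyError if a network has no mapping, so that a gap in the plan
    is found during a test run rather than during a real disaster.
    """
    remapped = []
    for network in vm_networks:
        if network not in mapping:
            raise KeyError(f"no recovery-site mapping for {network}")
        remapped.append(mapping[network])
    return remapped


# A parent VM attached to VLAN10 at the protected site.
print(remap_networks(["VLAN10"], PROTECTED_TO_RECOVERY))
```

Failing loudly on an unmapped network is a deliberate choice here: a template that comes up on the wrong network at the recovery site is harder to diagnose than a plan that refuses to run.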

Dedicated vs. floating virtual desktops

If you look at virtual desktops from a DR perspective, it may also affect your choice of the type of virtual desktop your end-users will use, and the experience they have. Most virtual desktop systems allow the administrator to offer the user either a "dedicated" desktop (one that they always return to and can customize) or a "floating" desktop (they never return to the same desktop, and the desktop may even be destroyed or reset after use).

Once we accept that replicating the files that make up a virtual desktop is not an appropriate strategy, then the idea of a dedicated desktop becomes difficult to justify. It merely adds a layer of complexity to the recovery plan; you'd have to include per-user settings and data. Knowing this, I recommend that my customers construct the end-user environment in such a way that no per-user settings or data are ever saved on the virtual desktop. The more vanilla and non-user-specific the virtual desktop the better -- but with one caveat.

There may be power users who do need a more customizable virtual desktop to which they return time and time again. Given the impact on the business if these individuals are unable to work, we might consider handling their virtual desktops differently by replicating them individually to a DR location. In most cases this shouldn't be necessary, as the data should be held on systems external to the virtual desktop, and it's the data that really counts in the world of disaster recovery -- that, and being able to access it efficiently.

The next issue concerns the infrastructure servers required to make virtual desktops function. In the world of VMware View, this includes two main servers -- the Connection Server and the Security Server. The Connection Server acts as the broker, finding the user's virtual desktop, whereas the Security Server assists in the traversal of the company's firewall.

At first glance it might seem that the easiest thing to do is to replicate these roles to the recovery site, but this is problematic for various reasons. Each Connection Server is intimately embedded into the existing vCenter and vSphere environment. Hard paths exist to objects in vCenter in the protected site that are unique to it. Simply powering on the Connection Server at the DR location isn't going to achieve anything in itself. All you will have is a Connection Server pointing to an orphaned vCenter (because it was lost during the disaster), and a collection of virtual desktop pools that are also broken.

Taking this into account, it seems more sensible to "mirror" the configuration of the protected site View servers by having a duplicate set of Connection Servers ready to run in the event of a disaster. The big downside of this is that every administrative action that is carried out in the protected site would also have to be carried out in the recovery site -- talk about an administrative burden; this would double the administration tasks involved. VMware SRM doesn't add much to automating the recovery of a VMware View environment at the moment, except for recovering your all-important templates and parent VMs.
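If you do mirror the configuration by hand, one way to keep the two sites honest is to compare them regularly. The Python sketch below assumes you have already exported each site's desktop pool settings into a plain dictionary keyed by pool name; how you produce that export is outside this sketch, and the pool names and setting keys are hypothetical. Site-specific values, such as vCenter paths, would need to be left out of the comparison, since they legitimately differ between sites.

```python
def config_drift(protected: dict, recovery: dict) -> dict:
    """Compare pool settings at the two sites and report where they differ."""
    shared = set(protected) & set(recovery)
    return {
        # Pools defined at the protected site but never created at the recovery site.
        "missing": sorted(set(protected) - set(recovery)),
        # Pools that exist only at the recovery site.
        "extra": sorted(set(recovery) - set(protected)),
        # Pools present at both sites whose settings no longer match.
        "changed": sorted(name for name in shared if protected[name] != recovery[name]),
    }


# Hypothetical exports of site-independent pool settings.
protected_site = {
    "sales-floating": {"type": "floating", "size": 200},
    "finance-floating": {"type": "floating", "size": 50},
}
recovery_site = {
    "sales-floating": {"type": "floating", "size": 150},
}

print(config_drift(protected_site, recovery_site))
```

In this example the report flags "finance-floating" as missing at the recovery site and "sales-floating" as changed, which is exactly the kind of drift that creeps in when every administrative action has to be carried out twice.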

Will VMware View and SRM integrate in the future?

Where could VMware View and VMware SRM go in the next couple of years to address these challenges? At the heart of VMware View sits Microsoft Active Directory Application Mode (ADAM, recently renamed Active Directory Lightweight Directory Services, or AD LDS). AD LDS/ADAM replicates every administrative task carried out on one Connection Server to a Connection Server "replica." Going forward, I think it is likely that we will see VMware build on this system to allow for a VMware SRM-like DR plan for VMware View.

It would work something like this: The VMware View servers would be in the same group, but split across two sites -- the protected and recovery site. A new View role would be created specifically for receiving updates from the View Servers in the protected site. In short, there would be a type of hot-standby View Server that would only be visible to the end-users when you ran your VMware SRM Recovery Plans.

In the meantime, if you have a vSphere 4/View 4 environment and you're concerned about disaster recovery for your virtual desktops, there are a number of articles by various storage vendors that detail disaster recovery blueprints. By far the most intriguing is a blueprint called "EMC Business Continuity for VMware View," written by EMC in August of 2009.

My predictions are mere speculation, but this seems to be the most logical direction to take given the issue of DR and virtual desktops. I don't personally see this level of automation reaching VMware customers within the time frame of vSphere 4 and View 4; we'd probably have to wait for vSphere 5 and View 5, but the wait may not be as long as some people, myself included, had previously imagined.


Mike Laverick (VCP) has been involved with the VMware community since 2003. Laverick is a VMware forum moderator and member of the London VMware User Group Steering Committee. Laverick is the owner and author of the virtualization website and blog RTFM Education, where he publishes free guides and utilities aimed at VMware ESX/VirtualCenter users, and has recently joined as an Editor at Large. In 2009, Laverick received the VMware vExpert award and helped found the Irish and Scottish VMware user groups. Laverick has had books published on VMware Virtual Infrastructure 3, VMware vSphere4 and VMware Site Recovery Manager.

