In today's uncertain economic climate, an IT administrator may not have the disaster recovery budget he hopes for. But that doesn't mean he has to give up on using virtualization and other technologies to streamline an outdated DR plan. In this tip, I explain how to use a laptop, VMware Workstation, VMware Player and external storage to cut a six-plus-hour disaster recovery drill to less than an hour.
Raise your hand if you love disasters
Realistically, who wants to think about going back to work after a tornado or hurricane has turned your office into a pile of post-apocalyptic rubble? Like it or not, though, disaster recovery (DR) and business continuity planning are necessary parts of business -- or at least, they should be.
Disastrous events such as 9/11 and Hurricane Katrina have proven the old adage that those who fail to plan, plan to fail. If you didn't have a strong disaster recovery plan in place during these kinds of disasters, you likely saw why having one should be nonnegotiable. That said, virtualization technology can improve the speed and ease with which we restore business operations.
In an ideal world, we would all have redundant data centers with fiber-optic networking that supports our continuous data protection environment across redundant storage area networks (SANs), and perhaps even have VMware Site Recovery Manager. Of course, it would also be nice to have a 42-inch monitor for browsing SearchVMware.com and YouTube -- er -- proactively managing our IT environments, but such is probably not the case, at least not in my world.
Today, many IT administrators have to do much more with much less. This requires that we try different approaches to common tasks. Even if our current disaster recovery plan is tested and proven, it should be re-evaluated in light of today's available technologies, especially if it's slow and laborious. This includes learning how to better use existing VMware tools for disaster recovery.
It's important to distinguish between a real disaster recovery scenario and the disaster simulations that we may run periodically during the year with varying degrees of preparation. In a real disaster, IT administrators commonly work around the clock, drink venti mochas for energy and inhale PowerBars for nourishment until everything is up and running. In a disaster recovery test, however, we can be more relaxed, though certain criteria, such as recovery time objectives, may be similar.
Disaster Recovery 101, or maybe 0.01
Until recently, a typical disaster recovery plan called for rebuilding certain critical servers from scratch using the hardware that had been contracted for this purpose and software that was stored at the off-site office recovery facility. Following the server build, an IT admin would restore data from media stored off-site.
This seemingly straightforward approach has its challenges. Over the years, I have been involved in nearly a dozen disaster recovery simulations, and the following is a list of problems I have encountered.
- Using server hardware that is not included on VMware's Hardware Compatibility List (HCL) for the software I needed to install.
- Tape drives that cannot read the provided backup tapes. (Ever notice how all those shoebox-looking digital linear tape units look awfully similar but don't work the same way?)
- Missing drivers. It is amazing how much time can be wasted searching for one array controller driver.
- Bad media. Media that had been tested and verified prior to being stored off-site can become unreadable.
- Insufficient or nonexistent recovery instructions for building unfamiliar systems. (Building and configuring a system that I had no documentation for while people look over my shoulder is not an experience I care to repeat. Make that a triple venti, barista.)
- Mononucleosis. (Yes, that was my own problem, but it affected my ability to follow detailed server build instructions during the recovery exercise, and it could happen to you.)
Reducing costs and time spent
So then, what are some ways to avoid running the gauntlet during DR tests while also minimizing capital expenditure? The question matters all the more because a real disaster would not only be far more stressful but would also involve other unknown complications.
Here is where VMware's virtualization tools make our lives easier. VMware Site Recovery Manager (SRM) is an ideal solution to many disaster recovery issues (though it doesn't have a cure for mono). Its ability to execute disaster plans in real time and not undermine the production environment during the testing process solves many of my current problems in planning for worst-case scenarios. Unfortunately, however, this product is beyond my current budget. So my IT team and I had to devise a less elegant but cheaper solution.
We have employed the following plan for our previous two disaster recovery tests. While it may lack the slickness of SRM and doesn't offer real-time data protection, it greatly improves the speed with which I can bring servers online.
As you have no doubt observed, storage has gotten incredibly cheap over the past couple of years. The first time I saw a server that could store a terabyte of data was circa 1994, and it was used for storing data from the Space Shuttle (or so I was told). A couple of weeks ago, I purchased a 1 TB external USB hard drive for about $120. This cheap storage is the means by which I have improved our disaster recovery plan.
The basics of our improved DR plan
Here is where the beauty of cheap storage and inexpensive laptops comes into play. Using VMware Workstation 6 on a laptop, we created various virtual machines (VMs) that offered the OS and application functionality that we needed in any disaster recovery situation. These VMs could then be stored directly on the laptop's local storage or copied to an external USB hard drive. Remember that these server VMs do not contain real-time data; they are used to save time in loading an operating system, configuring security, installing applications and so on.
Done from scratch during a drill, this initial build process would often take six hours or more, depending on the number of servers we required. The external USB drive also contains the installation files for the free VMware Player application in case we need to run VMs from a different host than the laptop.
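Since media that was verified before going off-site can still turn out to be unreadable, it may be worth recording checksums when the VMs are copied to the external drive and rechecking them before a drill. The following is only a minimal sketch of that idea in Python, not the procedure we actually used; the manifest file name, the function names and the folder layout are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical manifest name; any file name would do.
MANIFEST_NAME = "SHA256SUMS.txt"


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large virtual disks are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_vm_with_manifest(vm_dir: Path, dest_root: Path) -> Path:
    """Copy one VM folder to dest_root and write a checksum for every file.

    Checksums are taken from the source files, so a later verify_vm() call
    on the copy also catches errors introduced during the copy itself.
    The destination folder must not exist yet.
    """
    target = dest_root / vm_dir.name
    shutil.copytree(vm_dir, target)
    lines = []
    for src in sorted(p for p in vm_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(vm_dir)
        lines.append(f"{sha256_of(src)}  {rel.as_posix()}")
    manifest = target / MANIFEST_NAME
    manifest.write_text("\n".join(lines) + "\n")
    return manifest


def verify_vm(target: Path) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    bad = []
    for line in (target / MANIFEST_NAME).read_text().splitlines():
        expected, rel = line.split("  ", 1)
        candidate = target / rel
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(rel)
    return bad
```

Running verify_vm() against each VM folder on the drive shortly before a test gives early warning of a bad copy, while there is still time to refresh it from the laptop.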
Bringing servers online became a simple matter during the DR test. Working with our telecom unit, we designed an on-site IP addressing scheme that does not conflict with the production systems running back at the office. Next, we powered on the laptop with VMware Workstation, or installed VMware Player on an appropriate machine, and plugged in the external USB drive containing the server VMs.
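The check that a DR addressing scheme stays clear of production can be done on paper, but it is also easy to script. Here is a rough sketch using Python's standard ipaddress module. Every subnet and server name below is made up for illustration, and skipping the first usable address for a gateway is an assumption, not something our telecom unit prescribed.

```python
import ipaddress


def find_conflicts(dr_subnet: str, production_subnets: list[str]) -> list[str]:
    """Return the production subnets that overlap the proposed DR subnet."""
    dr_net = ipaddress.ip_network(dr_subnet)
    return [
        subnet
        for subnet in production_subnets
        if dr_net.overlaps(ipaddress.ip_network(subnet))
    ]


def assign_addresses(dr_subnet: str, server_names: list[str]) -> dict[str, str]:
    """Hand out consecutive host addresses from the DR subnet, in list order."""
    hosts = iter(ipaddress.ip_network(dr_subnet).hosts())
    next(hosts)  # assumption: the first usable address is kept for the gateway
    return {name: str(next(hosts)) for name in server_names}


# Hypothetical production ranges and DR candidates.
production = ["10.10.0.0/16", "192.168.0.0/16"]
print(find_conflicts("192.168.250.0/24", production))  # ['192.168.0.0/16']
print(find_conflicts("172.31.250.0/24", production))   # []
print(assign_addresses("172.31.250.0/24", ["dc01", "file01"]))
# {'dc01': '172.31.250.2', 'file01': '172.31.250.3'}
```

An empty result from find_conflicts() means the candidate DR range is safe to use alongside the listed production ranges.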
Bringing up base servers was simply a matter of starting up the VMs from either local storage or a USB drive and assigning IP addresses and appropriate names. Finally, we began restoring data and testing functionality to meet recovery objectives. Using this process, the time required to bring basic servers online for the DR test was reduced from six-plus hours to less than an hour.
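For anyone who would rather script the startup step than click through each VM, the sketch below finds the .vmx configuration files on local storage or the USB drive and builds one start command per VM. It assumes the vmrun command-line utility is installed with your copy of Workstation or Player and accepts the host type shown; check the documentation for the release you run. The function names are hypothetical, and the script only prints the commands rather than executing them.

```python
from pathlib import Path


def find_vm_configs(*roots: Path) -> list[Path]:
    """Collect every .vmx file under the given folders, skipping missing ones."""
    configs = []
    for root in roots:
        if root.is_dir():
            configs.extend(sorted(root.rglob("*.vmx")))
    return configs


def start_commands(configs: list[Path], host_type: str = "ws") -> list[list[str]]:
    """Build one vmrun start command per VM.

    host_type is passed to vmrun's -T option: "ws" for Workstation,
    "player" for Player (assumption: your vmrun build supports it).
    """
    return [
        ["vmrun", "-T", host_type, "start", str(config), "nogui"]
        for config in configs
    ]


if __name__ == "__main__":
    # Hypothetical mount point for the external USB drive.
    for command in start_commands(find_vm_configs(Path("/mnt/dr-usb"))):
        print(" ".join(command))
```

Printing the commands first makes it easy to review the startup order before anything is powered on; each list could later be handed to subprocess.run() once the output looks right.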
Again, it is not the most elegant system, but using a home-made disaster recovery strategy with VMware Workstation can be done for very little money, usually using components you have lying around. It sure beats trying to figure out why the array controller doesn't work on the hardware you have staring back at you with a sneering look on its little display as time continues to ebb away. Further, it buys you time to get other employees up and running with their desktop applications while allowing you to time-stage additional infrastructure in case of a real disaster.
Don't knock it till you try it
For some, this approach may seem unvarnished compared with the sophisticated redundant disaster recovery systems currently available. But I view it not as an end in itself but as a stepping stone for moving forward. Using virtual servers for our disaster recovery plan has demonstrated the advantages of using new tools to question and redesign legacy processes. That paradigm shift alone will serve us well in how we approach business continuity in the future.
Is this the final chapter in my improvements to our DR testing? Absolutely not; right now I am investigating SAN-to-SAN replication as well as data deduplication and some of VMware's other disaster recovery tools. The goal is ultimately to have our DR plan integrated so seamlessly into our environment that keeping up with it is nearly automatic.
DISCLAIMER: The information, views, and opinions expressed in this article are solely those of the author and do not represent the research, views or opinions of NYCE Payments Network LLC. NYCE Payments Network LLC is not responsible and cannot be held accountable for any information, views, or opinions expressed in this article. The author takes full responsibility for the information, views, and opinions presented.
ABOUT THE AUTHOR: Mak King (VCP, MCP) is a senior business systems administrator and systems analyst at NYCE Payments Network. He has worked in IT for 14 years.