VMware vCenter Site Recovery Manager (SRM) can be a useful tool in a disaster recovery (DR) plan for a virtual environment. It automates failover between data centers and disaster recovery sites and can test a failover plan without disrupting the production environment. For mission-critical environments, that can be more than just a nice-to-have: It helps keep applications available and eases the testing of recovery plans.
But if you try to deploy SRM out of the box without first considering a few key points, you may run into myriad problems. In fact, installing VMware SRM is usually the final stage of an SRM implementation, and you should deploy it only after you understand and address various issues. This tip outlines three key considerations: virtual machine (VM) placement, application dependencies and comprehensive disaster recovery planning.
With VMware SRM, it's not enough for VMs to reside on a storage area network (SAN). For a successful SRM deployment, where VMs are placed on the SAN is also important.
Why is VM placement important? First, VM placement drives SAN replication. VMware SRM relies on the presence of a supported SAN replication technology. VMware SRM doesn't manage or manipulate this technology; it just requires that it be present, properly configured and operating. Most SAN replication technologies replicate at the logical unit number (LUN) level, meaning that entire LUNs are replicated or not replicated. As a result, organizations must ensure VMs that require protection via VMware SRM are placed on a replicated LUN (otherwise SRM offers no protection). Some organizations may have begun to address VM placement when they first installed and configured SAN replication, but if they haven't, they will need to do so prior to a VMware SRM installation. Fortunately, you can use VMware's Storage VMotion technology to help migrate VMs among data stores without downtime.
Second, VM placement is important because VMware SRM operates by moving an entire LUN (or data store) at a time. VMs that should not be moved together during an SRM failover should not reside in the same data store. VMs can reside in the same data store only when it is acceptable for all VMs in that data store to fail over at the same time in a disaster recovery event. Again, Storage VMotion becomes useful for moving VMs into appropriate data stores without incurring downtime.
To address this consideration, organizations should document VMs' locations on a SAN. Once this documentation is complete, some VM migrations become obvious, such as the need to move a VM to a replicated LUN for protection via VMware SRM. Other necessary migrations may not emerge until later in the SRM implementation process. Having such documentation simplifies the migrations that emerge.
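Once VM locations are documented, the two placement rules above can be checked mechanically. The following is a minimal sketch, not an SRM feature: it assumes a hand-maintained inventory, and the VM names, data store names, failover group labels and the audit_placement function are all hypothetical. It flags VMs that need protection but sit on a nonreplicated data store, and data stores that mix VMs from different failover groups.

```python
from collections import defaultdict

# Hypothetical inventory: VM name -> (data store, failover group).
# A group of "none" means the VM does not need SRM protection.
vm_inventory = {
    "web01": ("ds_replicated_a", "tier1"),
    "db01": ("ds_replicated_a", "tier1"),
    "file01": ("ds_local_b", "tier1"),
    "test01": ("ds_replicated_a", "none"),
}

# Hypothetical set of data stores that sit on replicated LUNs.
replicated_datastores = {"ds_replicated_a"}


def audit_placement(inventory, replicated):
    """Return (VMs needing a replicated LUN, data stores mixing failover groups)."""
    # Rule 1: a VM that needs protection must live on a replicated data store.
    unprotected = sorted(
        vm for vm, (ds, group) in inventory.items()
        if group != "none" and ds not in replicated
    )
    # Rule 2: everything in one data store fails over together,
    # so a data store should hold only one failover group.
    groups_by_ds = defaultdict(set)
    for vm, (ds, group) in inventory.items():
        groups_by_ds[ds].add(group)
    mixed = sorted(ds for ds, groups in groups_by_ds.items() if len(groups) > 1)
    return unprotected, mixed


unprotected, mixed = audit_placement(vm_inventory, replicated_datastores)
print("Needs a replicated LUN:", unprotected)
print("Data stores mixing failover groups:", mixed)
```

In this sample data, file01 would need to move to a replicated LUN, and test01 shares a data store with protected VMs, so it would fail over with them unless it is migrated elsewhere.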
Application dependencies must be completely understood and documented. VMware SRM may be able to change the IP address of a protected VM, but it cannot address application dependencies. IT shops that seek to implement VMware SRM without understanding application interdependencies are doomed to fail.
Without a clear understanding of application dependencies, some VMs may be protected by VMware SRM while others that host services required by the protected VMs remain unprotected. So, for example, when a DR event occurs, protected VMs fail over to the designated failover site, but applications fail to run correctly because of missing dependencies. Alternatively, VMs may start up in the wrong order, with dependent applications attempting to start before underlying services required from other VMs are available. In both cases, knowing how applications interact with one another allows IT shops to craft the VMware SRM deployment appropriately to fix the dependencies.
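The first failure mode, protected VMs that depend on unprotected services, can be caught from a documented dependency list. The sketch below is illustrative only: the dependency data, the set of protected VMs and the missing_dependencies function are hypothetical, and nothing here queries SRM itself. It walks each protected VM's direct and indirect dependencies and reports any that are not protected.

```python
# Hypothetical dependency map: each VM or service -> the things it needs.
depends_on = {
    "app01": {"db01", "dns01"},
    "db01": {"dns01"},
    "dns01": set(),
}

# Hypothetical set of VMs currently placed under SRM protection.
protected = {"app01", "db01"}


def missing_dependencies(depends_on, protected):
    """Map each protected VM to its unprotected direct or indirect dependencies."""
    result = {}
    for vm in sorted(protected):
        seen, stack = set(), [vm]
        # Depth-first walk over everything this VM relies on.
        while stack:
            for dep in depends_on.get(stack.pop(), ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        gaps = sorted(seen - protected)
        if gaps:
            result[vm] = gaps
    return result


gaps = missing_dependencies(depends_on, protected)
for vm, deps in gaps.items():
    print(vm, "depends on unprotected:", deps)
```

Here both app01 and db01 rely on dns01, which is not protected, so either dns01 needs protection or an equivalent service must already exist at the failover site.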
Some application dependencies are obvious. An organization, for example, wouldn't typically fail over application or middleware servers without also failing over the underlying database server. But more subtle dependencies are easily overlooked, so don't forget to consider nonvirtualized dependencies as well.
To address this consideration, organizations should comprehensively map application dependencies and interactions. With this dependency map in hand, organizations may discover that additional VM migrations are necessary to satisfy the first consideration. The organization may need to make changes to the SAN replication configuration. But at the least, the organization will be ready to create a disaster recovery plan with applications that launch in the correct order once VMware SRM is installed and configured.
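A dependency map also yields a startup order directly. As a rough sketch, assuming Python 3.9 or later (for the standard library's graphlib module) and a hypothetical three-tier application, a topological sort places each VM after everything it depends on. The VM names and the startup_order function are illustrative, and the result still has to be entered into the recovery plan by hand.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each VM -> the VMs that must be running first.
depends_on = {
    "web01": {"app01"},
    "app01": {"db01"},
    "db01": set(),
}


def startup_order(depends_on):
    """Return VMs ordered so that each starts after all of its dependencies."""
    # TopologicalSorter treats each dict value as the set of predecessors,
    # so dependencies come out before the VMs that need them.
    return list(TopologicalSorter(depends_on).static_order())


order = startup_order(depends_on)
print("Start VMs in this order:", order)
```

If the map contains a circular dependency, graphlib raises CycleError, which is itself a useful signal that the documented dependencies need another look.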
Comprehensive disaster recovery planning
While it may seem obvious, remember that VMware SRM handles only the virtualized portions of a data center. So you still need a solid disaster recovery plan for the rest of a physical data center. While VMware SRM provides functionality for integration with nonvirtualized resources -- the ability to launch a script to manipulate a piece of network equipment, for example -- VMware SRM should still be recognized for what it is: one part of a larger DR strategy. Organizations must still define DR events, such as what constitutes a qualifying failover event, and organizations must still define the various roles concerning who will handle what in the event of a disaster. VMware SRM doesn't replace those roles. Rather, VMware SRM needs organizations to have these definitions in place so that the technology can be molded to fit the DR strategy. An organization that seeks to use the technology as the strategy may find itself struggling to meet the success criteria of the project.
On the other hand, an organization that performs due diligence to address all these considerations will likely find that its implementation of VMware SRM runs well. When the time comes to install VMware SRM, virtualized IT shops will find that this is often the least complicated and least involved task in the project, as it should be.
ABOUT THE AUTHOR: Scott Lowe has worked in the technology field since 1994, and has since held the roles of an instructor, technical trainer, server/network administrator, systems engineer, IT manager, and CTO. For the last few years, Scott has worked as a senior systems engineer with a reseller, providing technology solutions to enterprise customers. Scott also runs a virtualization-centric weblog at blog.scottlowe.org.
This was first published in February 2009