Disaster recovery and backup are often the last things to be addressed when setting up a new VMware infrastructure. They shouldn't be. There is no excuse for a business of any reasonable size not to have both in place. How much would six hours of downtime cost your company? Almost certainly a lot more than a reasonable DR setup.
Today, with a modern virtualization stack, setting up, running and using disaster recovery (DR) sites is straightforward and inexpensive. Failover and failback of entire sites can be done with a few clicks of the mouse. On the surface, failover has become a trivial operation with products such as VMware's Site Recovery Manager (SRM) and offerings from Zerto and Veeam.
Backup and DR are different beasts, but both are vitally important. Backup means ensuring that you have secure, consistent off-site copies of all critical data. These backups should be verified frequently with test restores to confirm that the data is intact and can actually be recovered.
Disaster recovery, on the other hand, is a well-thought-out, frequently tested and always available failover option. In a virtualized environment, DR is much easier to use than it once was and significantly less expensive. There is a wide variety of DR options, but they fall into two general types: hardware-based replication and software-based replication.
Hardware-based DR products, such as SRM coupled with hardware-replicated LUNs, replicate entire LUNs rather than individual virtual machines (VMs). This was once the only way to do failover in a VMware environment. Configuring SRM takes significant effort and requires replicated LUNs between the production and DR sites. Supporting this technology is complex and expensive. The high cost comes from needing multiple licenses to cover both the SAN and the VMware environment. Supporting the infrastructure also requires a very experienced administrator who understands not only VMware but also the complexities of SAN management and raw device mapping replication.
The other issue with SRM-type products is that they lack fine-grained control over what is failed over. Because the LUN is replicated in its entirety, unwanted VMs may be replicated, and wanted VMs may be missed if they are not on the correct LUN or if the entire cluster is not replicated.
Compare and contrast this with software-based replication from the likes of Veeam and Zerto. To be frank, these offerings are much easier to use. They allow configuration on a much more granular basis, meaning you can pick and choose a single VM or a group of VMs to fail over rather than an entire LUN's worth. Unlike hardware-based replication, software-based replication requires no complex storage array management and no replication licenses.
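The difference in granularity can be shown with a small Python sketch. It is only an illustration: the VM names, LUN names and both helper functions are hypothetical, and neither function reflects how SRM, Veeam or Zerto are actually configured.

```python
# Hypothetical inventory: VM name -> LUN the VM is stored on.
inventory = {
    "erp-db": "lun-01",
    "erp-app": "lun-02",
    "test-box": "lun-01",
    "intranet": "lun-02",
}

# The VMs the business actually wants protected (made-up example).
wanted = {"erp-db", "erp-app"}


def protected_by_lun(inventory, replicated_luns):
    """LUN-level replication protects every VM stored on the chosen LUNs."""
    return {vm for vm, lun in inventory.items() if lun in replicated_luns}


def protected_by_vm(inventory, selected_vms):
    """VM-level replication protects exactly the selected VMs that exist."""
    return {vm for vm in selected_vms if vm in inventory}


by_lun = protected_by_lun(inventory, {"lun-01"})
unwanted = by_lun - wanted  # replicated although nobody asked for it
missed = wanted - by_lun    # needed, but stored on a LUN that is not replicated
by_vm = protected_by_vm(inventory, wanted)

print(sorted(unwanted))  # ['test-box']
print(sorted(missed))    # ['erp-app']
print(sorted(by_vm))     # ['erp-app', 'erp-db']
```

Replicating lun-01 alone drags the test machine along and leaves the application server behind, while selecting by VM protects exactly the two machines that matter.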
Software-based DR is quite simple in its design and implementation. Most offerings follow the familiar pattern of VM management tools: an agent on each host and a virtual server that controls the replication flow between the two sites. The management server fronts a GUI that is used to manage what is replicated and when, along with other critical information. Although Veeam and Zerto differ in some respects, both are built around this virtualized DR design.
Key factors to consider
Before anyone goes out and makes a purchase, the IT department needs to make a few decisions. Two of the main factors in disaster recovery are the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). The RTO is how long you can go before the system needs to be running on the DR side. The RPO refers to how up to date the failed-over information needs to be. If every piece of data is needed, as at a financial company, an RPO of 15 minutes is usually the norm. This requires a lot of resources to ensure consistency and correct replication. The more current the information needs to be, the more resources, and hence the more money, it takes to meet the RPO.
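To make the two objectives concrete, here is a minimal Python sketch of how they might be checked against measurements. The class, the function and all of the figures are hypothetical examples and do not come from any vendor's product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrObjectives:
    """Business-defined targets (hypothetical structure for illustration)."""
    rpo: timedelta  # maximum tolerable age of the data at the DR site
    rto: timedelta  # maximum tolerable time until service runs at the DR site


def meets_objectives(objectives, replication_lag, failover_duration):
    """Return (rpo_ok, rto_ok) for a measured lag and a measured test failover."""
    rpo_ok = replication_lag <= objectives.rpo
    rto_ok = failover_duration <= objectives.rto
    return rpo_ok, rto_ok


# Made-up targets: a 10-minute RPO and a 1-hour RTO.
targets = DrObjectives(rpo=timedelta(minutes=10), rto=timedelta(hours=1))

result = meets_objectives(
    targets,
    replication_lag=timedelta(minutes=20),    # DR copy is 20 minutes behind
    failover_duration=timedelta(minutes=45),  # last test failover took 45 minutes
)
print(result)  # (False, True): the RPO is missed, the RTO is met
```

The two checks are independent. A site can come up quickly with stale data, or hold current data and still take too long to bring online.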
But RPO and RTO are business decisions more than IT decisions. They do, however, provide a starting point from which a DR program can be designed and built out. Other issues may also push you toward a software-based offering: your current storage product may not do replication, or the overhead or cost of storage-level replication may prove prohibitive. This is where software-based replication can come into its own.
Zerto's and Veeam's software-based offerings can function in any type of environment and require no expensive storage licenses. Software-based replication setups are provider-agnostic and suit all levels of DR strategy. Trying either product is straightforward, and the additional physical and virtual infrastructure is quite inexpensive. A three-node cluster could quite easily be made DR-capable for about $4,000, excluding hardware. This also means no expensive proprietary hardware or SAN-side replication licensing.
Software-based DR products are a safe bet, especially if your team is not storage trained, because a storage mistake made during a failover could make a bad situation worse. Some of the world's biggest companies, including many in the Fortune 500, are moving away from expensive SAN-based systems toward a virtual DR setup and saving a small fortune in the process.
How to ensure reliability
One thing to watch for is potential interdependency between VMs. A classic example is a three-tier hosted application. To invoke DR effectively, you would need all of its VMs to be in a consistent state. This is on top of other dependencies, such as the authentication and management infrastructure the application requires.
The problem of consistency is dealt with by a technology known as consistency groups. Consistency groups, as the name implies, group together VMs that need to be kept together to avoid potential corruption. Replication of the group either fully completes or fully fails, thereby ensuring that the group of guests is always in a consistent state.
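The all-or-nothing behavior can be sketched in a few lines of Python. This is a toy model under stated assumptions: the ConsistencyGroup class, the checkpoint numbers and the replicate_vm callback are all hypothetical, and real products implement this inside their replication engines rather than in user code.

```python
class ConsistencyGroup:
    """Toy model: a group of VMs whose replicas advance together or not at all."""

    def __init__(self, vm_names):
        # Last checkpoint committed at the DR site, per VM.
        self.committed = {name: 0 for name in vm_names}

    def replicate(self, checkpoint, replicate_vm):
        """Try to move every VM to `checkpoint`; commit only if all succeed."""
        staged = {}
        for name in self.committed:
            if not replicate_vm(name, checkpoint):
                return False  # one failure discards the staged work for the group
            staged[name] = checkpoint
        self.committed.update(staged)
        return True


# Hypothetical three-tier application.
group = ConsistencyGroup(["web01", "app01", "db01"])

ok = group.replicate(1, lambda name, cp: True)                # every VM succeeds
failed = group.replicate(2, lambda name, cp: name != "db01")  # database VM fails

print(ok, failed, group.committed)
# True False {'web01': 1, 'app01': 1, 'db01': 1}
```

Because the second attempt fails for one VM, the whole group stays at checkpoint 1, and the web and application tiers never run ahead of the database.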