An off-premises disaster recovery (DR) site that you pay for only when you need it makes a lot of business sense. With nested virtualization, DR to the cloud can use standard DR products and keep all of the testing and process automation that those products provide.
To enable cloud DR, the provider will need to add a few items to its service catalog and learn how to make these pieces work together. Tenants will need to understand the operational differences that come with using nested virtualization for DR and select a provider that offers the right guarantees.
How to get past infrastructure hurdles
One of the challenges is that DR products like VMware's Site Recovery Manager (SRM) expect that one company owns the virtualization platform at both the production site and the DR site. If the DR site is a cloud provider, this is not the case; the physical ESXi servers and vCenter belong to the cloud provider and the tenant just gets some VMs.
Cloud providers tightly guard their platform management infrastructure. Tenants never get access and sometimes aren't even told what the platform is or which tools the provider uses to manage the environment. Usually, if a tenant wants to run its own vSphere environment in a cloud provider's data center, the tenant must either co-locate servers it owns at the cloud provider or lease dedicated physical servers. This is a large commitment for a resource that will rarely, if ever, be used. It is cheaper to use VMs, including running ESXi in a VM, and to burst up in capacity with physical ESXi servers only when DR events or tests occur.
Make sure your provider offers nested ESXi instances
To enable disaster recovery as a service (DRaaS) using SRM, the cloud provider needs a few features in its service catalog. The first is nested ESXi instances, offered alongside the Windows and Linux instances it already provides to tenants. By now, I hope that nested virtualization is a well-known quantity: the ESXi hypervisor can run as a VM on a physical ESXi server, and that virtualized ESXi server can then run its own VMs, which sit inside two layers of virtualization.
The key benefit with this option is that the physical ESXi server belongs to the cloud provider, while the virtual ESXi server can belong to the tenant. Alongside the virtualized ESXi server, the tenant also runs vCenter Server in a VM in the cloud along with an SRM server. These tenant management servers are used to manage the virtualized ESXi server. This means the tenant manages the whole vSphere and SRM environment without dedicated hardware inside the cloud provider's data center.
Get the keys to IP storage
The next piece is that the cloud provider must provide IP storage to the nested ESXi servers. For DR to work well, the replicated VMs must reside on shared storage.
Tenants use vSphere Replication to copy VMs from the tenant's own production data center to the virtual ESXi servers at the cloud provider. The nested ESXi server's sole role is to be the destination for this replication; the duplicated VMs are stored on the cloud provider's IP storage.
The IP storage may also be provided by a virtualized storage appliance in VMs, making the IP storage just another instance type in the service catalog. If the tenant used the same virtual storage appliance at its production data center, then SRM could use array-based replication.
Use physical hosts to speed up DR
The final piece is that the cloud provider must have some spare physical virtualization hosts that a tenant can use to test its SRM recovery plans or to fail over when a disaster occurs. The problem with using the nested ESXi server as a platform for production VMs is that nested VMs cannot take advantage of all of the hardware-offload features that are available to the physical ESXi server and its VMs.
Nested VMs run slower than you may want, so when you fail over, you will want to use a physical ESXi host. The provider must maintain a pool of ESXi hosts that are used for DRaaS; these will be dedicated for brief periods to specific tenants who are undertaking DR activities. For planned tests, the physical hosts would be booked in advance, but in a real disaster, there would be no time for booking.
The provider has to balance the number of hosts in the pool against the total number of production ESXi hosts its tenants run: the pool needs enough capacity to cover DR events, but not so much idle hardware that the service is priced out of reach.
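As a rough illustration of that balance, here is a minimal Python sketch of one way a provider might size the pool. Everything in it is an assumption made for illustration: the tenant names and host counts are invented, the function names are hypothetical, and the sizing rule (cover the largest tenants failing over at the same time, with one physical DR host per production host) is just one possible policy, not a recommendation from any vendor.

```python
def pool_hosts_needed(production_hosts, max_concurrent_failovers):
    """Physical hosts needed if the N largest tenants fail over at once.

    Assumes one physical DR host per production host (an illustrative rule).
    """
    largest_first = sorted(production_hosts.values(), reverse=True)
    return sum(largest_first[:max_concurrent_failovers])


def protection_ratio(production_hosts, pool_hosts):
    """Protected production hosts per physical host kept in the DR pool."""
    return sum(production_hosts.values()) / pool_hosts


# Invented example figures: production ESXi hosts per tenant.
tenants = {"tenant-a": 8, "tenant-b": 4, "tenant-c": 4, "tenant-d": 2}

pool = pool_hosts_needed(tenants, max_concurrent_failovers=2)
ratio = protection_ratio(tenants, pool)
print(f"pool size: {pool} hosts, ratio: {ratio:.2f} production hosts per pool host")
```

A higher ratio means cheaper DRaaS per tenant but a greater chance that the pool runs short if several tenants declare a disaster at once, which is the trade-off described above.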
Put together the final pieces of the DRaaS puzzle
To activate the DR plan either for testing or a real failover, there's an additional process to make the physical ESXi servers available:
- The tenant contacts the provider asking for capacity.
- The provider assigns some physical hosts to the tenant.
- The tenant adds the physical ESXi hosts to its DR vCenter and finalizes the configuration.
- The tenant enacts the recovery plan in SRM.
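The four steps above can be sketched as a small Python model. This is only an illustration of the hand-offs between tenant and provider: `HostPool`, `activate_dr` and the host names are hypothetical, and the vCenter and SRM steps are recorded as log entries rather than real API calls, because those are carried out with the tenant's own tools. The `release` method is also an assumption, reflecting that hosts are dedicated to a tenant only for brief periods.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HostPool:
    """Provider-side pool of spare physical ESXi hosts (hypothetical model)."""

    free: list[str]
    assigned: dict[str, list[str]] = field(default_factory=dict)

    def assign(self, tenant: str, count: int) -> list[str]:
        """Dedicate `count` free hosts to a tenant, or fail if the pool is short."""
        if count > len(self.free):
            raise RuntimeError(f"only {len(self.free)} hosts free, {count} requested")
        hosts, self.free = self.free[:count], self.free[count:]
        self.assigned.setdefault(tenant, []).extend(hosts)
        return hosts

    def release(self, tenant: str) -> None:
        """Return a tenant's hosts to the pool once the DR test or event ends."""
        self.free.extend(self.assigned.pop(tenant, []))


def activate_dr(pool: HostPool, tenant: str, hosts_needed: int) -> list[str]:
    """Walk through the four activation steps and return a log of what happened."""
    log = [f"{tenant} asks the provider for {hosts_needed} physical hosts"]  # step 1
    hosts = pool.assign(tenant, hosts_needed)                                # step 2
    log.append(f"provider assigns {', '.join(hosts)} to {tenant}")
    # Steps 3 and 4 are placeholders for work done in vCenter and SRM.
    log.append(f"{tenant} adds {', '.join(hosts)} to its DR vCenter")        # step 3
    log.append(f"{tenant} runs its recovery plan in SRM")                    # step 4
    return log


pool = HostPool(free=["esx-01", "esx-02", "esx-03"])
for entry in activate_dr(pool, "tenant-a", hosts_needed=2):
    print(entry)
pool.release("tenant-a")
```

If the pool cannot satisfy a request, `assign` raises an error, which corresponds to the situation a provider's capacity guarantees are meant to prevent.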