Fellow vExpert Gabrie Van Zanten (of gabriesvirtualworld.com) asked me a while ago for my view on the current state of VMware Snapshots and SRM. Gabrie was concerned by the support statement in the official SRM administration guide:
So, in a nutshell: snapshots are supported with array-based replication (ABR), but not with vSphere Replication (VR). The big limitation is that for the snapshot to be usable, the CPU specification at the Protected Site and Recovery Site must be the same, and must meet the requirements surrounding vMotion.
Now you might think this is a curious limitation, given that VMs are powered on at the Recovery Site, and one of the key things to remember is that a VM can be cold migrated from one server to another (AMD to Intel, for instance) without any CPU limitation… I believe the reason for the limitation is that when you take a snapshot, you don’t just take a delta of the virtual disk, you also take a snapshot of the memory state, and this state includes the CPU instructions… So if you revert to the snapshot, the VM needs to be able to keep sending CPU instructions to a known CPU.
I would be tempted to dub this sort of support as “qualified support”. The feature is supported within the constraints of certain criteria.
Being the curious type, I tested VR with snapshots. You don’t get any error messages… But when the VM is recovered, there is no snapshot data there. However, given the difference between how VR and ABR handle snapshots, I wouldn’t be surprised to see VMware close this gap at some stage.
Next I tried this with ABR and, as the admin guide indicates, the snapshot is there. For the hell of it I tried to revert the snapshot to a previous state. That worked like a charm. So I’m confident that VMware Snapshots and ABR work together nicely.
Of course, this situation is not without consequences for virtual machine backups. I had a chat with a couple of players in the industry to get a feel for what would happen. It sounds like the backup system in the Protected Site “owns” the VMware Snapshots it creates. Unless that system is recovered to the DR location, you could have orphaned snapshots at the Recovery Site. That’s if you were very unlucky: the event that triggered the use of the SRM Recovery Plan would have to occur during the backup window, and the replication cycle would have to have included these “backup triggered” snapshots. It sounds like the best strategy is to test your restore strategy as if you’d had a DR event and didn’t have SRM. After all, backup/restore could be regarded by some as a legit approach to a DR event, assuming the restore time meets your RTO. I would also consult with your backup vendor and quiz them about their recommendations for using their backup technologies in a DR situation. I think what you will want to avoid is backing up the newly created “recovery VMs” all over again, as that would take some time to complete. Finally, you could use a PowerCLI script to find these orphaned snapshots, and then commit them. Heck, that could even be included in your recovery plan…
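
To give a feel for the selection logic such a clean-up script would need, here is a minimal sketch in Python rather than PowerCLI. It is an illustration only, not a tested recovery step. The `SnapshotRecord` type, the `find_suspect_snapshots` function and the `backup-temp` name marker are all made up for the example; your backup product will have its own naming convention, so check with the vendor. In a real PowerCLI script the snapshot list would come from `Get-VM | Get-Snapshot`, and you would review the matches before committing anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class SnapshotRecord:
    """Minimal stand-in for one VM snapshot (hypothetical shape, not a VMware API type)."""
    vm_name: str
    snapshot_name: str
    created: datetime


def find_suspect_snapshots(
    snapshots: Iterable[SnapshotRecord],
    name_markers: Iterable[str],
    failover_time: datetime,
    window: timedelta,
) -> List[SnapshotRecord]:
    """Return snapshots whose name contains one of the backup markers and
    that were created within `window` before the failover."""
    markers = [m.lower() for m in name_markers]
    earliest = failover_time - window
    suspects = []
    for snap in snapshots:
        # Case-insensitive substring match against the assumed naming markers.
        name_matches = any(m in snap.snapshot_name.lower() for m in markers)
        # Only snapshots taken shortly before the DR event are likely orphans.
        in_window = earliest <= snap.created <= failover_time
        if name_matches and in_window:
            suspects.append(snap)
    return suspects


if __name__ == "__main__":
    # Arbitrary sample data, purely for illustration.
    failover = datetime(2013, 5, 1, 2, 30)
    inventory = [
        SnapshotRecord("web01", "backup-temp-snapshot", datetime(2013, 5, 1, 2, 10)),
        SnapshotRecord("web01", "before-patching", datetime(2013, 4, 20, 9, 0)),
        SnapshotRecord("db01", "BACKUP-TEMP-SNAPSHOT", datetime(2013, 4, 30, 2, 10)),
    ]
    for snap in find_suspect_snapshots(inventory, ["backup-temp"], failover, timedelta(hours=6)):
        print(f"{snap.vm_name}: {snap.snapshot_name} ({snap.created:%Y-%m-%d %H:%M})")
```

The sketch deliberately requires both a name match and a creation time close to the failover, so that a manual snapshot an admin took on purpose is not flagged just because it happens to be old. Only the reporting half is shown; committing the snapshots is best left as a separate, reviewed step.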