A while ago I had a Skype chat with Greg Mulholland – he's based in Melbourne, Australia where he is a senior infrastructure consultant specializing in VMware and Storage (NetApp primarily). Occasionally, I am approached on Twitter, LinkedIn and e-mail to answer questions and queries – and so long as I have the time I will sometimes jump on Skype to discuss them personally.
Greg's question centered around swap files, both VMkernel swap and guest operating system swap. In my book on SRM 5.0 I do mention how replicating this redundant data may not be such a fantabulous idea. Whilst relocating the VMkernel swap file is pretty much a done deal, I'm not convinced that relocating the guest OS swap file to non-replicated storage is altogether a good idea.
In my discussions with folks on Twitter it became clear there is other stuff that you might want to relocate. For example, many databases such as Microsoft SQL and Oracle have a "TempDB" that takes up space – that folks would rather not include in their replication cycle. So what started off as a narrow view on swap files could be applied to other "transient data" that you might move to a different virtual disk in the guest OS, and not include in the replication process. Fred van Donk has a rather interesting explanation of workarounds for this sort of issue. His blogpost is here – SQL, replication, and VMware SRM. How to replicate SQL and keep you network team happy. Essentially, Fred's workaround is to have a datastore at the Recovery Site with the SAME name as on the Protected Site – which is pre-populated with VMDKs that match the non-replicated disks of the VM. When the VM is recovered it just mounts the VMDK at boot-up, not realizing that it's not the VMDK at the Protected Site, but the one at the Recovery Site.
VMKernel Swap Files & SRM…
Anyway, it didn't take long for Greg and me to agree that the per-VM VMkernel swap file could be relocated to non-replicated shared storage, thus saving potentially a lot of bandwidth. By default SRM destroys the original VMkernel swap file, and recreates it anyway. Remember there's no real saving in disk space. Fundamentally, when a VM is powered on at the Recovery Site it will create a swap file of some description (unless you have set the memory reservation equal to the limit…)
In case you don't know, the VMkernel swap file resides by default in the same location as the VM's .VMX file.
Note: Here's a VM with 4GB of RAM as the limit and no reservation, resulting in a 4GB swap file.
This default behavior can be over-ridden at the VMware Cluster level, and set to a different place. It's a two-step procedure. You need to turn on the option on the cluster first…
Note: The warning about degraded vMotion performance really applies if you locate the swap file on non-shared storage such as DAS.
After doing that you then specify the datastore on a per ESX host basis. Personally, I find that kind of odd. You'd think you'd be able to do it all from the cluster. You get there a bit round the houses by selecting the ESX host >> Configuration >> Virtual Machine Swapfile Location and Edit…
The size of this VMkernel swap file is governed by the relationship between two values: the "limit" and the "reservation".
Fundamentally, the VMkernel must "guarantee" memory either in physical RAM or on disk – or a combination of both. Remember, ESX isn't a dumb hypervisor; it knows to just deliver the physical RAM "on-demand", and only use the swap file as a "last resort" when there is no physical memory left – it's a feature called "memory over-commit". If you don't touch the reservation value at all, the size of the VMkernel swap file will be whatever you set as the limit of memory on the VM.
A limit of 4GB minus a reservation of 0GB = 4GB VMkernel swap
A limit of 4GB minus a reservation of 2GB = 2GB VMkernel swap
A limit of 4GB minus a reservation of 4GB = 0GB VMkernel swap
[Believe it or not I once had a student who was adamant that 4-0 = 0. So if you're struggling with this - if you have $4 and I take no money away from you, you still have your $4]
The relationship between the two values is often thought of as a see-saw. As the reservation goes UP, the amount of disk space needed for swap goes DOWN and the amount of free physical memory needed at power on goes UP. As the reservation goes DOWN, the reverse happens. I imagine that few day-to-day customers are even aware of this relationship. They just click next, and barely notice the swap file at all. Except on the day they create a 64GB VM, and try to power it on but find they can't because there's insufficient free disk space on the LUN/Volume for the 64GB swap file. Oops! Time for some VMware Training methinks… Don't call Ghostbusters, call Eric Sloof.
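The limit/reservation arithmetic above can be sketched in a few lines of Python. This is only an illustration of the simple "limit minus reservation" rule described in this post; the function names are made up for the example, and it deliberately ignores any other space a VM might need on the datastore at power on.

```python
def vmkernel_swap_gb(limit_gb: float, reservation_gb: float = 0.0) -> float:
    """Size of the per-VM VMkernel swap file: limit minus reservation."""
    if limit_gb < 0 or reservation_gb < 0:
        raise ValueError("limit and reservation must not be negative")
    if reservation_gb > limit_gb:
        raise ValueError("reservation cannot exceed the limit")
    return limit_gb - reservation_gb


def can_power_on(limit_gb: float, reservation_gb: float,
                 datastore_free_gb: float) -> bool:
    """True if the swap datastore has room for the swap file (hypothetical check)."""
    return vmkernel_swap_gb(limit_gb, reservation_gb) <= datastore_free_gb


if __name__ == "__main__":
    # The three cases from the list above: a 4GB limit with 0, 2 and 4GB reserved.
    for reservation in (0, 2, 4):
        swap = vmkernel_swap_gb(4, reservation)
        print(f"limit 4GB, reservation {reservation}GB -> {swap}GB swap")

    # The 64GB VM with no reservation on a volume with only 50GB free.
    print(can_power_on(64, 0, datastore_free_gb=50))
```

The last call models the "Oops!" moment: with no reservation the 64GB VM needs a 64GB swap file, so a volume with 50GB free is not enough.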
Guest OS Swap Files/Partitions and SRM
To be honest the VMkernel recommendation was pretty much a no-brainer. Things get stickier with the guest OS swap. As you probably know, this is a file that normally resides on the C: drive in Windows, but is usually a partition in operating systems like Linux.
There are some advantages to relocating the swap file in Windows to a different virtual disk held on a separate datastore.
- Saving on bandwidth – you replicate only the datastores that contain data you need. In my tests, if the D: drive held the swap file (pagefile.sys) and the D: drive is absent when the VM is recovered, Windows 2008 R2 recreates the file on the C: drive.
- Improving the performance – by making sure the swap never competes for disk IO with the Operating System disk. Together with making the max and min values the same, you can boost swap file performance.
- Improved backup – whilst many VM backup vendors are smart enough to ignore the swap file, some are not – relocating the swap file (as well as other transient data) offers an easy way of excluding the pagefile.sys from the backup schedule.
There are some downsides as well:
- Loss of the incredibly useful and easy to interpret memory.dmp file caused by BSOD (I'm being ironic incidentally…)
- It's a bit complicated to bake into a VM template when the files are split over many datastores…
- Not all OSes respond gracefully to the "loss" of the swap file/partition
- It doesn't work very well with SRM. Let me show you…
If you have a VM with two disks, a C: and a P: drive – where you have relocated the pagefile to the P: drive – in W2K8 it would look like this:
Incidentally this requires a reboot when you set it up – and if you think getting to the right location to set this in ESX is convoluted try this on for size:
Right-click My computer >> Properties >> Advanced System Settings >> Performance Settings >> Advanced Tab >> Change button. PHEW!
So I set this up and tried to protect the VM. This is what I got:
As you can see the Protection Group has a "Not Configured" warning – the cause is "Device Not Found: Hard Disk 2". Clicking the "Configure Protection" link opens the VM Protection Properties dialog. We can see the problem is that SRM can't protect something that isn't replicated [that seems logical enough to me].
What would happen if SRM tolerated this condition, and I was allowed to carry on regardless and just test the Recovery Plan? Well, it would recover the VM, and at power on it would try to find the datastore on the ESX hosts at the Recovery Site. In my case I put the P: .VMDK on a datastore called "InfrastructureNYC". This isn't available to my Recovery Site hosts based in New Jersey – and I imagine it would not be available if there had been a disaster in New York either…
SRM does not tolerate this configuration. These VMs would not be protected by SRM, and when the Protection Group was created you would find these VMs would not have the relevant "placeholder VM" created at the Recovery Site until you detached the unreplicated device (notice the "Detach" button in the screen grab). The ONLY option that works here (unless you use Fred van Donk's method) is selecting the device and clicking the "Detach" button. This tells SRM not to bother mapping that device into the .VMX file – so upon recovery the VM powers on with just a C: drive, and in the case of W2K8 the swap file would be re-created on C:.
That's fine. BUT. If I told you there was no bulk way of detaching the unreplicated device, and that there's no PowerCLI method to handle this – then you could see this could be an issue. If you had a lot of VMs to protect that would be quite a bit of administration: selecting each one that's affected and detaching the device – and also remembering to do this for each new VM that was created. It does make you wonder why VMware doesn't just give you the alarm – but automagically do the "detach" process for you. You can't recover what isn't replicated after all.
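To get a feel for how much manual detaching you would be signing up for, you could at least list the affected VMs up front. The Python below is a rough sketch of that idea only: it does not talk to vCenter or SRM and it does not detach anything. It assumes you have already exported an inventory of VMs, their disks and the datastore each disk lives on. The Disk class, the VM names and the "ReplicatedNYC" datastore are all made up for the example; "InfrastructureNYC" is the non-replicated datastore from my lab.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    label: str      # device label as shown on the VM, e.g. "Hard Disk 2"
    datastore: str  # name of the datastore holding the .VMDK


def disks_needing_detach(inventory: dict[str, list[Disk]],
                         replicated: set[str]) -> dict[str, list[str]]:
    """Map each VM name to the disk labels that sit on non-replicated datastores."""
    affected: dict[str, list[str]] = {}
    for vm_name, disks in inventory.items():
        labels = [d.label for d in disks if d.datastore not in replicated]
        if labels:
            affected[vm_name] = labels
    return affected


# Hypothetical inventory, typed in by hand for illustration.
inventory = {
    "web01": [Disk("Hard Disk 1", "ReplicatedNYC"),
              Disk("Hard Disk 2", "InfrastructureNYC")],
    "db01": [Disk("Hard Disk 1", "ReplicatedNYC")],
}

if __name__ == "__main__":
    report = disks_needing_detach(inventory, replicated={"ReplicatedNYC"})
    for vm_name, labels in report.items():
        print(f"{vm_name}: detach {', '.join(labels)}")
```

It only tells you where the work is; every VM it lists still needs its device detached by hand in the VM Protection Properties dialog.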
So what's the official line? Well for some reason the new SRM 5.0 Admin guide doesn't offer any advice either way. However, the older 4.1 guide did have this statement:
The next question is – well, are there any workarounds? The answer is yes, but they involve some significant legwork. I think it's a matter of opinion whether you consider them worth the effort. If you use these workarounds the warning message on the VM which states "Device Not Found: Hard Disk 2" would remain. Can you live with that? Can you live with the extra administration the workaround creates? To be honest I think it's a case of balancing the legwork of administration against the benefits it generates.
OK. Suppose you didn't do any fancy workarounds – and you stick with "detaching" the non-replicated disk. There's also another issue to consider: failback after a planned migration or disaster recovery. Let's say you do decide to relocate the Windows swapfile, and you use the "detach" option. When DR strikes the VM gets "moved" from the Protected Site to the Recovery Site. Now the VM in the Recovery Site is the real VM that end users are connecting to…
There's a couple of consequences there. The pagefile by default would be recreated on the C drive. It means it is no longer EXACTLY the same as the VM that was lost when the Protected Site was lost. The configuration has now subtly changed – not massively but subtly. Does that configuration "work" for you given the benefits/reasons for relocating the swap file in the first place?
There's a bigger consequence. When you "failback" you would be replicating back to the Protected Site a VM that would now have a swap file on C:. So on regular replication traffic you would save on bandwidth – but on failback traffic you would incur the penalty of replicating the swap file. Unless of course, you reconfigured every VM that was failed over to use an additional virtual disk again, and relocated the swap file (which requires a reboot). No, I didn't think you would fancy doing that would you?
So you go ahead with the failback process (assuming that your Protected Site is in such a state that it doesn't need a complete rebuild…). You would have a LUN full of VMDKs that held the old swap files. If you wanted your original configuration back, these would need to be added back into each VM affected, and the guest OS would have to be reconfigured again for swap being on a different drive.
Given the "Not Configured" error in SRM and the hassle it generates when you "detach" the VMDK – AND the issue associated with genuine failovers and failbacks…
I think I would recommend to my customers to leave the swap file/partition well alone. When it comes to DR – I want as much as possible to live in a one-configuration-fits-all world to keep complexity down to a minimum. At the end of the day, if I'm in a DR situation, the last thing I want to worry about is swap files. I've got bigger fish to fry…
As for the TempDB space issue. I'm no DB guru and I would bow to those folks' better knowledge – but I wouldn't like to be the guy charged with the workaround. The thing I want to avoid is going up against my Application Owners' perceived best practice – that's just going to create a lot of unwanted politics – and they can be tricky customers at the best of times. Part of me thinks I will "go the extra mile" for them, and ensure that when the VM comes up on the recovery site the drive letter is there for their TempDB system. I'd be banking on these instances being the exception to my rule rather than the common configuration. With that said, if I had an environment with 6K+ VMs how many SQL instances would I have to manage on a case-by-case basis?
Of course it could be the case that in the future VMware might come along and fix this problem for good. But how? If you know View, VMware's VDI product – you might know it has the concept of the "disposable disk". This is used as a dumping ground for temporary files such as the Internet Cache and swap files. Now imagine that concept extended to SRM. The ability to mark a disk as being "disposable" and not needed at the DR location – or recreated at the DR location when needed…