When a vSphere administrator wants to put more virtual machines on an ESXi server, the amount of RAM tends to be more of a limit than the CPU.
We fill every DIMM slot with the most cost-effective DIMM and then we buy a new ESXi host when we run out of RAM. When the host runs out of RAM, the VMkernel has a few ways to reclaim physical RAM from a VM. One of the options is ballooning.
A little memory ballooning is not unusual and is seldom a performance problem for a VM. As more RAM is reclaimed with ballooning, however, performance starts to degrade. If ballooning leads to heavy swapping inside the guest, the VM may have real performance problems.
Balloon driver in action
The balloon driver is part of VMware Tools, the package of drivers and utilities that make VMs run better. The purpose of the balloon driver is to take physical RAM from a VM and release it back to the VMkernel. This reclaim usually happens when the ESXi server is short of RAM and this specific VM is the loser in the competition for physical RAM. Ballooning makes memory stress visible inside the guest.
The VMkernel instructs the balloon driver to inflate by a certain amount of RAM. The balloon driver requests RAM from the guest OS and the guest allocates memory pages to the balloon driver. The balloon driver notifies the VMkernel that the pages no longer hold VM data, and the VMkernel puts the physical pages that used to back these VM pages onto its free list. The VMkernel no longer backs those virtual pages, since it knows the VM has nothing in them.
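The handshake described above can be sketched as a toy model. This is only an illustration of the bookkeeping, not VMware's implementation: the `Host` and `Guest` classes, their fields and the `inflate` method are hypothetical names invented for this example, and memory is counted in abstract pages.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Toy stand-in for the VMkernel's free list (hypothetical, counts pages)."""
    free_pages: int = 0

    def reclaim(self, pages: int) -> None:
        # Physical pages that no longer back guest data go onto the free list.
        self.free_pages += pages


@dataclass
class Guest:
    """Toy guest OS: pages it can still hand out, and pages held by the balloon."""
    available_pages: int
    balloon_pages: int = 0

    def inflate(self, target_pages: int, host: Host) -> int:
        # The guest can only give the balloon driver pages it actually has.
        granted = min(target_pages, self.available_pages)
        self.available_pages -= granted
        self.balloon_pages += granted
        # The driver reports the pages as holding no VM data,
        # so the host can stop backing them with physical RAM.
        host.reclaim(granted)
        return granted
```

The point of the model is that the guest decides which pages to give up, while the host only learns that those pages are safe to stop backing.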
Impact of ballooning
When the balloon driver first inflates, the guest will allocate pages that are either empty or idle. Empty pages have never had data, while idle pages have had data written but the data is no longer needed. In both these cases, the pages could be taken from the VM without an impact on the VM's performance. A little bit of memory ballooning on a VM will not cause any performance issues. If a little ballooning frees enough physical pages there will be no need for more RAM reclaim.
Once the empty and idle pages are given over to the balloon driver, the guest will see its free memory is close to zero. Any more RAM reclaim is likely to start causing performance degradation. The rate of this degradation depends on the operating system and applications inside the VM. Most operating systems will use some RAM as a disk cache to speed up access to frequently accessed disk blocks. This is the next bit of RAM that the guest will give to the balloon driver, causing disk performance to drop. Some applications are very sensitive to disk performance, others less so.
If this still does not reclaim enough RAM, the balloon driver will demand more. The next thing the guest will be forced to do is to put some of its memory pages onto its hard disk to release RAM for the balloon driver, which is called "guest swapping". The guest chooses which pages to put on its disk to release RAM, which is then given to the balloon driver. Again, different applications have different tolerance levels to the amount of RAM being swapped.
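The three stages above can be put into a rough arithmetic sketch. It assumes the guest gives up memory strictly in the order described (empty and idle pages, then disk cache, then swapping), which real operating systems only approximate. The function name and parameters are hypothetical.

```python
def balloon_breakdown(target_mb, free_or_idle_mb, cache_mb):
    """Estimate where a balloon of target_mb would come from inside the guest.

    Assumes a strict ordering: empty/idle pages first, then disk cache,
    and guest swapping for whatever is left. This is a simplification.
    """
    from_idle = min(target_mb, free_or_idle_mb)
    remaining = target_mb - from_idle
    from_cache = min(remaining, cache_mb)
    swapped = remaining - from_cache
    return {
        "free_or_idle": from_idle,   # no performance impact expected
        "disk_cache": from_cache,    # disk performance starts to drop
        "guest_swap": swapped,       # pages pushed out to the guest's disk
    }
```

For example, a 3,000 MB balloon against a guest with 1,000 MB of empty or idle pages and 1,500 MB of disk cache would, under these assumptions, push about 500 MB into guest swap.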
To determine any impact on the application, look at memory information inside the VM. If you see a high rate of swap reads then ballooning is causing a performance reduction. If a lot of RAM must be reclaimed by ballooning, then the cache and active pages may be impacted. The more RAM that is reclaimed, the bigger the performance impact.
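As one way to look at swap reads from inside the VM, here is a minimal sketch that assumes a Linux guest. Linux exposes cumulative `pswpin` and `pswpout` counters (pages swapped in and out since boot) in `/proc/vmstat`; the helper names below are hypothetical, and a Windows guest would need its own performance counters instead.

```python
import time


def parse_vmstat(text):
    """Turn the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_in_rate(before, after, seconds):
    """Pages swapped in per second between two samples of the counters."""
    return (after["pswpin"] - before["pswpin"]) / seconds


def sample_swap_in_rate(interval_s=5.0, path="/proc/vmstat"):
    """Take two samples interval_s apart on a Linux guest and return the rate."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return swap_in_rate(before, after, interval_s)
```

A rate that stays near zero suggests the balloon is being fed from idle pages or cache. A sustained non-zero rate means the guest is reading pages back from swap, which is where applications start to notice.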
Release the pressure
Keep in mind that ballooning is the symptom of a problem, not the cause. If there is pressure on RAM in your ESXi servers, then ease the pressure. I have seen suggestions to not install the balloon driver because it causes guest OS swapping, but that's a bad idea. If the balloon driver is not available, the VMkernel must use its own swapping to reclaim RAM from a VM, and VMkernel swapping has a bigger performance impact than ballooning. To prevent a VM from having RAM reclaimed, use a reservation: a reservation guarantees physical resources, so reserved RAM will not be reclaimed from the VM. On the other hand, you cannot reserve more RAM than you have. If you need to overcommit, then expect to see occasional ballooning.
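The trade-off between reservations and overcommitment comes down to simple arithmetic, sketched below. The function and its inputs are hypothetical, and the numbers ignore the memory the VMkernel itself needs and any per-VM overhead, so treat the result as a rough guide only.

```python
def memory_summary(host_ram_gb, vms):
    """Summarize memory commitment for one host.

    vms is a list of (configured_gb, reserved_gb) tuples, one per VM.
    Ignores VMkernel and per-VM overhead memory (a simplifying assumption).
    """
    configured = sum(c for c, _ in vms)
    reserved = sum(r for _, r in vms)
    return {
        # Above 1.0 the host is overcommitted and may need to reclaim RAM.
        "overcommit_ratio": configured / host_ram_gb,
        # Reservations must fit in physical RAM.
        "reservations_fit": reserved <= host_ram_gb,
        # Configured RAM not covered by a reservation can be reclaimed.
        "reclaimable_gb": configured - reserved,
    }
```

For a 256 GB host running VMs configured as `[(64, 64), (128, 0), (128, 32)]`, the sketch reports an overcommit ratio of 1.25 with 224 GB still open to reclaim, so occasional ballooning would be expected on the VMs without a full reservation.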