Anyone who has ever taken a VMware Certified Professional exam knows that the vSphere maximums are something you need to have memorized. After the test, those same maximums are often forgotten. Maximums tend to be seen as something that can be discarded, but many people don't realize that it is entirely possible to hit the vSphere maximums in a large environment with relative ease.
Those maximums and design limits are there for a reason, and should not be viewed as a target to achieve. Hitting maximums will, without a doubt, be detrimental to your environment and degrade the performance of your VMware estate.
Small-scale operations with several hosts and a few hundred virtual machines (VMs) will likely never approach vSphere maximums. When you are dealing with estates of hundreds of hosts and thousands of VMs, on the other hand, administrators have to be mindful of them.
Typical limit "hits"
In larger environments, a prime example of reaching a maximum involves the number of supported data stores per host. This limit is unlikely to change because it is a hardware limitation that cannot easily be programmed around. As the cluster grows, administrators naturally add more and more data stores to keep up with demand. Meanwhile, more physical servers are added and, before you know it, you are approaching the limit of 256 data stores per host.
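As a rough illustration of keeping an eye on this limit, the sketch below reports how much headroom a host has left. The figure of 256 comes from the text above; the function name, the `Headroom` type and the 80% warning threshold are assumptions made for this example, and the data store count would have to come from your own inventory tooling.

```python
from dataclasses import dataclass

# Figure quoted in the text above; confirm it against the configuration
# maximums published for your own vSphere version.
DATASTORES_PER_HOST_LIMIT = 256


@dataclass
class Headroom:
    used: int
    limit: int
    remaining: int
    utilization: float  # fraction of the limit in use
    warn: bool          # True once utilization reaches the warning threshold


def datastore_headroom(used, limit=DATASTORES_PER_HOST_LIMIT, warn_at=0.8):
    """Report how close a host is to its data store limit.

    warn_at is an arbitrary threshold chosen for illustration only.
    """
    if used < 0 or limit <= 0:
        raise ValueError("used must be >= 0 and limit must be > 0")
    utilization = used / limit
    return Headroom(
        used=used,
        limit=limit,
        remaining=max(limit - used, 0),
        utilization=utilization,
        warn=utilization >= warn_at,
    )
```

Calling `datastore_headroom(210)` would flag the host, since 210 of 256 data stores is above the assumed 80% threshold and leaves 46 spare.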
This situation can be fixed, but the fix is awkward and somewhat time consuming. An administrator ultimately needs to reduce the number of data stores in service within the cluster just to be able to manage the current estate effectively. This can be done by migrating VMs off a specific data store until it is empty. At that point, the data store is removed and replaced with a larger one, for example replacing a 1 TB data store with a 2 TB one.
The VMs would then need to be migrated back, automatically or otherwise. Prior to VMFS 5, this method had a somewhat undesirable side effect: the larger the data store, the larger the block size. The approach involves a lot of manual effort and also comes with an obvious downside: more I/O activity per data store, because each one now holds more VMs.
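The arithmetic behind this trade-off can be sketched as follows. This is a minimal example, assuming all existing data stores are the same size and that total capacity is kept constant; the function name and the sample figures are hypothetical.

```python
import math


def consolidation_plan(datastore_count, old_size_tb, new_size_tb, vm_count):
    """Estimate the effect of replacing small data stores with larger ones.

    Assumes uniform data store sizes and an even spread of VMs, which is
    a simplification made for illustration.
    """
    if min(datastore_count, old_size_tb, new_size_tb) <= 0 or vm_count < 0:
        raise ValueError("counts and sizes must be positive")
    total_tb = datastore_count * old_size_tb
    # Round up so the replacement data stores hold at least the same capacity.
    new_count = math.ceil(total_tb / new_size_tb)
    return {
        "new_datastore_count": new_count,
        "vms_per_datastore_before": vm_count / datastore_count,
        "vms_per_datastore_after": vm_count / new_count,
    }
```

With 240 data stores of 1 TB replaced by 2 TB ones and 3,600 VMs, the count halves to 120 while the average number of VMs per data store doubles from 15 to 30, which is where the extra I/O per data store comes from.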
Until recently, VMware View had a limit of eight hosts per cluster. Many administrators found out about this only when they came to add that ninth machine, or when they requested help from VMware for what was an unsupported configuration.
In some ways, the size of the maximums presented these days is, at least in my opinion, a matter of hypervisor companies looking to get one up on each other. As a case in point, anyone who attempts to put 2048 VMs onto a single host deserves what will inevitably happen. Resource limits on anything but the most extreme hardware would be reached well ahead of the advertised figures.
A proper virtualization design
One way to avoid approaching vSphere maximums is to have an effective virtualization design. When it comes to virtualization design, there are effectively two schools of thought: the carefully considered approach and the rack and stack approach. The latter is part of the reason behind VMware's much-maligned vRAM tax. With huge amounts of RAM and high core counts in a physical socket, it was easy to get in excess of 50 VMs per host.
Working out the most efficient hardware platform will inevitably come down to a relatively simple formula: the number of machines weighed against the cost of downtime or reduced performance.
Often the rack and stack mentality around virtualization negatively impacts performance, but the bigger issue is what happens when a physical host fails. Trying to restart potentially hundreds of machines at once means longer periods of downtime whilst the affected guests restart on new hosts. Also, as the disk I/O will be saturated, the speed of restarts will be reduced further.
Negating or remediating the effects
The best way of negating these issues boils down to two items: a well thought out design and choosing the appropriate hardware for the task. Rack and stack will top out at about eighty machines per server (given a decent server), but a better way to approach the design is to have multiple smaller clusters. Designing this way produces a number of benefits.
First, reaching vSphere maximums becomes much less likely. Second, should a host within a smaller cluster fail, the affected VMs will restart that much quicker, assuming the cluster has N+1 capacity. It also means that any cluster-wide issues are restricted to the one cluster. On the other hand, it can marginally increase the maintenance overhead for management purposes.
The other downside is that the licensing cost will be higher because of fewer machines per host socket.
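To make the trade-off concrete, here is a small sizing sketch. It assumes licensing is counted per socket, as the text implies, and models N+1 as one spare host per cluster. The function name, the two-socket default and the sample figures of 800 VMs and 40 VMs per host are assumptions for illustration; only the eighty-per-server figure comes from the text.

```python
import math


def size_cluster(total_vms, vms_per_host, sockets_per_host=2, spare_hosts=1):
    """Size a single cluster for a target VM density with N+1 capacity.

    spare_hosts=1 models N+1; sockets_per_host=2 is an assumed default.
    """
    if total_vms <= 0 or vms_per_host <= 0:
        raise ValueError("total_vms and vms_per_host must be positive")
    active_hosts = math.ceil(total_vms / vms_per_host)
    hosts = active_hosts + spare_hosts
    return {
        "hosts": hosts,
        "licensed_sockets": hosts * sockets_per_host,
        # Worst case: a fully loaded host fails and all its VMs must restart.
        "vms_restarted_per_host_failure": min(vms_per_host, total_vms),
    }


dense = size_cluster(800, vms_per_host=80)   # rack and stack density
sparse = size_cluster(800, vms_per_host=40)  # assumed lower density
```

Under these assumptions the dense design needs 11 hosts and 22 licensed sockets but restarts 80 VMs when a host fails, while the lower-density design needs 21 hosts and 42 sockets but restarts only 40. The real figures depend entirely on your hardware and licensing terms.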
At the end of the day, administrators should avoid approaching maximums where possible by making intelligent design decisions and by weighing the cost of potential failure above and beyond the cost of the additional hardware.