Over the past few years, I have worked with a number of customers to inspect the health of their VMware environment.
This work involves checking for errors, performance problems, misconfigurations and items that do not follow VMware best practices. Just because a best practice exists for an item does not mean it's the only way to configure that item. It simply means that, for common applications, this is the suggested method.
I always take into account a customer's requirements and constraints before lecturing them on VMware best practices. The following items are worth checking in your environment to provide a more stable VMware infrastructure.
Fixing network issues
Within the network of your VMware environment, there are a number of items that can be configured incorrectly. I wanted to touch on a few that are found in many environments. These misconfigurations can result in lower availability or reduced performance.
vSwitches: Within the vSwitches in a VMware environment, there are port groups that allow for the configuration of different networks or VLANs. If you are using vSphere Standard Switches (VSS), there is a higher chance of errors, since the configuration must be performed separately on every host unless you script it.
Port groups: One of the most common errors is naming port groups inconsistently or making a slight error in the port group name between hosts in a cluster. This can result in lower availability or a failed vMotion.
VLAN tagging: Another common issue is inconsistent VLAN tagging between port groups on hosts within a cluster. This can result in loss of network connectivity of a VM when moved to the host with the issue.
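Both of these checks boil down to comparing port group names and VLAN tags across every host in a cluster. The sketch below shows the idea in Python against a hypothetical inventory (the host and port group names are made up; in practice you would pull this data from vCenter, for example via a PowerCLI or API export):

```python
# Minimal sketch of a port group consistency check across cluster hosts.
# The inventory below is hypothetical sample data, not pulled from vCenter.

def find_portgroup_mismatches(hosts):
    """Return issues: port groups missing on some hosts, or VLAN tag conflicts."""
    all_names = set()
    for groups in hosts.values():
        all_names.update(groups)

    problems = []
    for name in sorted(all_names):
        # Which hosts have this port group, and with which VLAN tag?
        present = {h: groups[name] for h, groups in hosts.items() if name in groups}
        missing = sorted(set(hosts) - set(present))
        if missing:
            problems.append(f"'{name}' missing on: {', '.join(missing)}")
        vlans = set(present.values())
        if len(vlans) > 1:
            problems.append(f"'{name}' has inconsistent VLAN tags: {sorted(vlans)}")
    return problems

# Hypothetical cluster: esx02 misspells a port group, esx03 tags the wrong VLAN.
cluster = {
    "esx01": {"Production": 100, "vMotion": 200},
    "esx02": {"Prodution": 100, "vMotion": 200},   # typo in port group name
    "esx03": {"Production": 101, "vMotion": 200},  # wrong VLAN tag
}

for issue in find_portgroup_mismatches(cluster):
    print(issue)
```

A naming typo and a VLAN mismatch both surface immediately, which is exactly the kind of drift that otherwise only shows up as a failed vMotion.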
Checking those uplinks
The uplinks are an important part of vSphere networking on hosts and also a commonly misconfigured item. With uplinks there are a number of things that can go wrong; I'm only going to cover a few of the most widely missed items.
Inconsistent uplinks: The first is an inconsistent number of uplinks per host or vSwitch. This results in some hosts having fewer uplinks than others, which can affect availability and performance.
Bad speed and duplex settings: The second common issue with uplinks is improperly configured speed and duplex settings. You should understand what the recommendations from your switch provider are or the standard set by your network team -- then make sure all uplinks are configured to match. I commonly see hosts within a cluster that have some uplinks auto-configured, while others are statically configured. If these do not match, you can have inconsistent performance between hosts.
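Auditing both of those uplink items is a simple comparison once you have the data. Here is a hedged sketch, again in Python against made-up sample records (real speed and duplex values would come from your host inventory, and the expected values from your network team's standard):

```python
# Sketch: check uplink count and speed/duplex consistency per host.
# Each uplink record is (speed_in_Mb, is_full_duplex). Sample data is hypothetical.

def audit_uplinks(hosts, expected_count, expected_speed_mb, expect_full_duplex=True):
    findings = []
    for host, uplinks in sorted(hosts.items()):
        if len(uplinks) != expected_count:
            findings.append(f"{host}: {len(uplinks)} uplinks, expected {expected_count}")
        for nic, (speed, full) in sorted(uplinks.items()):
            if speed != expected_speed_mb or full != expect_full_duplex:
                findings.append(f"{host}/{nic}: {speed} Mb "
                                f"{'full' if full else 'half'} duplex")
    return findings

cluster = {
    "esx01": {"vmnic0": (10000, True), "vmnic1": (10000, True)},
    "esx02": {"vmnic0": (10000, True)},                           # missing an uplink
    "esx03": {"vmnic0": (10000, True), "vmnic1": (1000, False)},  # bad speed/duplex
}

for finding in audit_uplinks(cluster, expected_count=2, expected_speed_mb=10000):
    print(finding)
```

The same loop catches both the host that is short an uplink and the NIC that negotiated down to a slower half-duplex link.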
Correcting storage problems
Much like networking, there are a number of storage settings that, if misconfigured, can negatively affect your vSphere installation. The storage layer is a critical part of the environment, and misconfigurations here can result in lower availability and performance.
Proper number of storage paths: The paths to storage devices play an important part. You need to configure enough paths to allow for the level of redundancy that you require. A common number of storage paths should be configured for all vSphere hosts in your environment. If you must use different path configurations, at minimum, you must configure hosts within a cluster the same way.
Storage multi-pathing policy: The storage multi-pathing policy is just as important as the storage paths. By default, vSphere does its best to apply a default pathing policy for your storage array. VMware has a number of available policies built into vSphere, and there are default claim rules for major storage vendors.
Before putting your installation into production, check with your storage vendor and make sure the proper multipath policy is configured. Some vendors support multiple policies, so research the options and choose the one that offers the best availability and performance. An improperly configured pathing policy can result in failed path failover, which can cause an outage.
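Path counts and pathing policy can be audited with the same kind of comparison. A minimal sketch, assuming a required minimum of four paths and the Round Robin policy (`VMW_PSP_RR`) as the vendor recommendation; the device name and targets below are illustrative only:

```python
# Illustrative sketch: verify each host has the required number of paths per
# device and the expected multipathing policy. VMW_PSP_RR (Round Robin) and
# VMW_PSP_FIXED (Fixed) are built-in vSphere NMP policy names; the target
# values here are examples, not your array vendor's actual recommendation.

def audit_paths(hosts, min_paths, expected_policy):
    findings = []
    for host, devices in sorted(hosts.items()):
        for dev, info in sorted(devices.items()):
            if info["paths"] < min_paths:
                findings.append(f"{host} {dev}: only {info['paths']} paths")
            if info["policy"] != expected_policy:
                findings.append(f"{host} {dev}: policy {info['policy']}")
    return findings

cluster = {
    "esx01": {"naa.6001": {"paths": 4, "policy": "VMW_PSP_RR"}},
    "esx02": {"naa.6001": {"paths": 2, "policy": "VMW_PSP_FIXED"}},  # degraded
}

for finding in audit_paths(cluster, min_paths=4, expected_policy="VMW_PSP_RR"):
    print(finding)
```

Flagging both conditions per device means a host with a dead HBA path or a policy that never reverted after maintenance shows up before it causes a failover problem.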
Datastore presentation: The last storage item is datastore presentation to your hosts. It is critical that all hosts within a vSphere cluster have access to the same set of datastores. This allows for proper HA failover and vMotion options.
All too often, I see customers with a host in a cluster that does not have access to one or more datastores. This could leave the cluster unable to restart all virtual machines (VMs) in the event of a failure. It can also limit DRS from balancing the cluster properly, or restrict your ability to move a VM to the host manually.
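The presentation check is the simplest of the bunch: every host should see the identical set of datastores, so any host whose set differs from the union is a problem. A short sketch with hypothetical datastore names:

```python
# Sketch of a datastore presentation check: every host in the cluster should
# see the same set of datastores. The inventory below is made-up sample data.

def datastore_gaps(hosts):
    """Return, per host, any datastores visible elsewhere but not on that host."""
    all_ds = set().union(*hosts.values())
    return {h: sorted(all_ds - seen) for h, seen in hosts.items() if seen != all_ds}

cluster = {
    "esx01": {"DS01", "DS02", "DS03"},
    "esx02": {"DS01", "DS02", "DS03"},
    "esx03": {"DS01", "DS02"},  # DS03 not presented -- HA/vMotion risk
}

print(datastore_gaps(cluster))  # prints {'esx03': ['DS03']}
```

Any host that appears in the result is one that HA may not be able to use to restart a VM living on the missing datastore.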
When looked at individually, most of these items do not look like they could ruin your day. But many customers suffer from several of these misconfigurations at once, and then the likelihood of a negative effect on the environment is high. I cannot stress enough the importance of auditing these items when configuring new hosts and clusters, and of rechecking them on a regular schedule throughout the lifecycle of your vSphere hosts.