VMware troubleshooting can be tedious and difficult, because an enterprise vSphere infrastructure is made up of multiple, complex pieces of hardware and software. As a result, many companies purchase VMware Support and Subscription packages as a hedge in case their IT departments can't solve virtualization issues on their own.
So how should you go about VMware troubleshooting? When should you place a call to VMware? Follow the steps below to get the quickest and easiest solutions to your vSphere problems.
VMware troubleshooting step one: Isolate the problem
Let's say that a junior admin says that everything in the virtual infrastructure has gone down. (Don't you love that one?) Before you point the finger at vSphere, you need to isolate the problem.
- Check the storage. Most virtual machines (VMs) are stored on a storage area network (SAN) or network-attached storage (NAS). If the storage becomes unavailable, the VMs typically freeze. When there is a large outage, there is a good chance that storage is involved, because shared storage is a single point of failure in many environments.
- Check the network. The network can also be a single point of failure. If the core network switch has lost power or locked up, connectivity to the virtual infrastructure is lost. And if you use the Network File System (NFS) or iSCSI storage protocols, a network outage can cause a storage outage as well.
- Check DNS. If the Domain Name System (DNS) servers are down, the virtual infrastructure can appear to be down when the real problem is name resolution.
- Check vCenter. If vCenter Server is down, the VMs and hosts keep running, but the vSphere Client can't connect to vCenter Server. In this situation, less experienced admins may assume that the whole virtual infrastructure is down.
- Check the hosts. If the ESX/ESXi hosts that run production servers or critical infrastructure servers (e.g., vCenter or DNS servers) have crashed or lost power, it can look as though there are much larger problems.
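The isolation checks above can be scripted as a rough first pass. The sketch below is a minimal illustration, not a VMware tool: it assumes you know the addresses and ports of your own storage, network, DNS, vCenter and host endpoints, and all function names are hypothetical. Each layer gets a list of checks (TCP reachability or DNS resolution, using only Python's standard socket library), and the script reports the first failing layer in the order the list above suggests.

```python
import socket
from typing import Callable, Dict, List, Optional

# The order in which the checklist above isolates an outage.
LAYERS = ["storage", "network", "dns", "vcenter", "hosts"]


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out or unresolvable
        return False


def resolves(name: str) -> bool:
    """True if the name resolves through the system's configured resolver."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def run_checks(checks: Dict[str, List[Callable[[], bool]]]) -> Dict[str, bool]:
    """Run every check; a layer passes only if all of its checks return True."""
    return {layer: all(check() for check in funcs) for layer, funcs in checks.items()}


def first_failing_layer(results: Dict[str, bool]) -> Optional[str]:
    """Return the first failed layer in LAYERS order, or None if nothing failed.
    Layers that were not checked are skipped."""
    for layer in LAYERS:
        if layer in results and not results[layer]:
            return layer
    return None
```

For example, you might register `lambda: tcp_reachable("10.0.0.20", 3260)` for an iSCSI target (iSCSI uses TCP port 3260) and `lambda: resolves("vcenter.example.local")` for DNS; the address and name are placeholders. Using IP addresses rather than hostnames for the storage and network checks keeps a DNS failure from masquerading as a storage or network failure. A failed connection only tells you an endpoint isn't answering, so treat the result as a pointer to where to look, not a diagnosis.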
Let's say that you've isolated the problem to either a vCenter or ESX/ESXi host issue. It's time to move to the next VMware troubleshooting step.
VMware troubleshooting step two: Work through the checklist
Isolating problems in vSphere can be a complex process. It took 14 hours to cover it in my vSphere troubleshooting video, and VMware has a huge support group dedicated to it.
That said, here's a quick VMware vSphere troubleshooting checklist that will hopefully get you up and running as quickly as possible:
- Connect to the ESX or ESXi server console through a Secure Shell (SSH) session, the physical console or a KVM-over-IP console. Then, run the esxtop command to identify processes that are hung or overusing resources. You may want to kill a hung process, or adjust the resource constraints on a process that is legitimately consuming a lot of resources.
- Use the vSphere Client to monitor the performance of the hosts and vCenter. Often, a performance issue will make vCenter appear to be down or a host appear unresponsive.
- Using the vSphere Client or the command line, check the ESX/ESXi server log files, located in /var/log/vmware, for error messages. Common errors include iSCSI naming or authentication problems and host connection issues. (You can find solutions to these errors in VMware Knowledge Base articles.)
- If the vSphere Client or vCenter can't connect to the ESX/ESXi server (that is, vCenter shows the host as disconnected), you can restart the management agents with the following actions:
  - In ESX, run the service mgmt-vmware restart and service vmware-vpxa restart commands.
  - In ESXi, run the /sbin/services.sh restart command, or restart the management agents from the Direct Console User Interface (DCUI).
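The log check in the list above lends itself to a quick scripted first pass before you read the logs in full. This is a minimal sketch that assumes you have already copied the log files off the host to a local directory. The function names are hypothetical, and the regular expressions are illustrative guesses, not VMware's actual message formats, so tune them to the wording in your own logs.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List

# Illustrative patterns only; real ESX/ESXi log wording varies by version.
PATTERNS: Dict[str, re.Pattern] = {
    "iscsi": re.compile(r"iscsi.*(auth|chap|iqn|login)", re.IGNORECASE),
    "connection": re.compile(r"(disconnect|connection (lost|refused|reset)|timed? ?out)", re.IGNORECASE),
    "error": re.compile(r"\b(error|failed|failure)\b", re.IGNORECASE),
}


def classify(line: str) -> List[str]:
    """Return the name of every pattern that matches this log line."""
    return [name for name, pat in PATTERNS.items() if pat.search(line)]


def scan_lines(lines: Iterable[str]) -> Counter:
    """Count matching lines per category."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(classify(line))
    return counts


def scan_dir(log_dir: str, pattern: str = "*.log") -> Dict[str, Counter]:
    """Scan every file matching pattern under log_dir, recursively."""
    results: Dict[str, Counter] = {}
    for path in sorted(Path(log_dir).rglob(pattern)):
        if path.is_file():
            # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
            with path.open(errors="replace") as fh:
                results[str(path)] = scan_lines(fh)
    return results
```

For example, `scan_dir("./esx01-logs")` (a hypothetical local copy of the host's log directory) returns per-file counts that can help you decide which log to read first. The default `*.log` pattern skips files without that extension, such as `messages`; pass a different pattern to include them.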
What if you can't quickly solve the problem on your own? If you have a VMware support contract, don't do anything drastic, such as hard-booting hosts or reinstalling ESX/ESXi. Your company paid good money for VMware support, so now is your chance to get a return on that investment.
Using VMware support
Of course, you can simply call VMware support and follow the instructions. But, as with a visit to the doctor, the process works best if you know what to expect and come prepared to help the support engineer.
When calling VMware support, here are five tips to make the process as efficient as possible:
- Make sure to search the VMware Knowledge Base first to see if you can find a resolution to your problem.
- You don't have to submit a support request by phone. You can file it online.
- Be prepared with the following information to get support quickly:
  - VMware.com username and password
  - VMware customer number
  - The version numbers of your vCenter Server and ESX/ESXi hosts
  - A quick, specific summary of the problem (for example, don't just say that vSphere is broken). Include what you have changed and what you have learned during the troubleshooting process.
- Collect diagnostic information from vSphere (or whichever VMware product you are using) for VMware support. In ESXi, for example, you can collect support information, such as logs and configuration files, by running the vm-support script in the Tech Support Mode console. The script creates a TGZ file that you can provide to VMware's support group.
- If your support request is critical and you don't feel that VMware is giving it the priority it deserves, you can escalate your request.
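Because the information list above is really a set of fields, you could keep it as a small template and check it before you file a request online. The sketch below is hypothetical: the class and field names are this example's own, not the fields of VMware's support form. It flags empty items and renders a plain-text summary to paste into the request; it deliberately has no password field.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SupportRequest:
    # Items the checklist recommends having ready; names are illustrative.
    vmware_username: str = ""
    customer_number: str = ""
    vcenter_version: str = ""
    esx_version: str = ""
    summary: str = ""
    changes_made: List[str] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)
    support_bundle: str = ""  # path to the vm-support TGZ file, if collected

    def missing(self) -> List[str]:
        """Names of required text fields that are still empty."""
        required = ["vmware_username", "customer_number", "vcenter_version",
                    "esx_version", "summary"]
        return [name for name in required if not getattr(self, name).strip()]

    def render(self) -> str:
        """Plain-text summary of the request, one item per line."""
        lines = [
            f"Customer number: {self.customer_number}",
            f"vCenter version: {self.vcenter_version}",
            f"ESX/ESXi version: {self.esx_version}",
            f"Summary: {self.summary}",
        ]
        lines += [f"Changed: {c}" for c in self.changes_made]
        lines += [f"Learned: {f}" for f in self.findings]
        if self.support_bundle:
            lines.append(f"Support bundle: {self.support_bundle}")
        return "\n".join(lines)
```

Checking `missing()` before you open the request is the point of the design: an empty list means every item from the checklist is at hand, and the rendered text keeps the summary specific rather than "vSphere is broken."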
For more information about working with VMware tech support, read the VMware Technical Support Guide.