Panicking at the onset of a high impact technical problem can cause impulsive decision making that enhances the problem. Before trying to troubleshoot any problem, pause and relax to approach the task with a clear mind, then address each symptom, possible cause and resolution appropriately.
In this series, I offer solutions for many common problems that arise with VMware ESX host servers, VirtualCenter, and virtual machines in general. Let's begin by addressing common issues with VMware ESX host servers.
Windows server administrators have long been familiar with the dreaded Blue Screen of Death (BSOD), which signifies a complete halt by the server. VMware ESX has a similar state called the purple screen of death (PSOD) which is typically caused by hardware problems or a bug in the VMware code.
Troubleshooting a purple screen of death When a PSOD occurs, the first thing you want to do is note the information displayed on the screen. I suggest using a digital camera or cell phone to take a quick photo. The PSOD message consists of the ESX version and build, the exception type, register dump, what was running on each CPU at the time of the crash, back-trace, server up-time, error messages and memory core dump info. The information won't be useful to you, but VMware support can decipher it and help determine the cause of the crash.
Unfortunately, other than recording the information on the screen, your only option when experiencing a PSOD is to power the server off and back on. Once the server reboots you should find a vmkernel-zdump-* file in your server /root directory. This file will be valuable for determining the cause. You can use the vmkdump utility to extract the vmkernel log file from the file (vmkdump –l ) and examine it for clues as to what caused the PSOD. VMware support will usually want this...
To continue reading for free, register below or login
Requires Membership to View
To read more you must become a member of SearchVMware.com
');
// -->
file also. One common cause of PSOD's is defective server memory; the dump file will help identify which memory module caused the problem so it can be replaced.
Checking your RAM for errors If you suspect your system's RAM may be at fault you can use a built-in utility to check your RAM in the background without effecting your running virtual machines. The RAM check utility runs in the VMkernel space and can be started by logging into the Service Console and typing Service Ramcheck Start.
While RAM check is running it will log all activity and any errors to the /var/log/vmware directory in files called ramcheck.log and ramcheck-err.log. One drawback, however, is that it's hard to test all of your RAM with this utility if you have virtual machines (VMs) running, as it will only test unused RAM in the ESX system. A more thorough method of testing your server's RAM is to shutdown ESX, boot from a CD, and run Memtest86+.
Using the vm-support utility
If you contact VMware support, they will usually ask you to run the vm-support utility that packages all of the ESX server log and configuration files into a single file. To run this utility, simply log in to the service console with root access, and type "vm-support" without any options. The utility will run and create a single Tar file that will be named "esx---
Rate this Tip
To rate tips, you must be a member of SearchVMware.com. Register now
to start rating these tips. Log in if you are already a member.
DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.