Occasionally, an ESXi host will reboot abruptly, often during a power outage if the uninterruptible power supply doesn't last long enough. This can cause the ESXi host log to end abruptly and then restart. If you aren't on site during the failure and subsequent reboot, it can be a challenge to troubleshoot. The first step, of course, is to check for a UPS failure or power outage. If that isn't the source of the issue, look to your environment for clues.
It's more difficult to troubleshoot when ESXi host logs aren't persistent across reboots. This isn't as much of a problem if you redirect your ESXi host logs to a shared datastore or an external software application, such as VMware vRealize Log Insight. To check whether your logs were redirected, connect to either the vSphere Web Client or, for unmanaged hosts, the Host Client. Select your host, and under the Configure tab, select System > Advanced System Settings.
I'm using a software product that processes my logs for this example. You can see the IP address within the Syslog.global.logHost field, shown in Figure A.
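If you copy the Syslog.global.logHost value out of the client, a short script can show which targets it points to. The following is a minimal sketch in Python, not part of any VMware tooling. The function name parse_log_hosts and the sample addresses are hypothetical, and it assumes the field holds a comma-separated list of entries such as udp://host:port.

```python
from urllib.parse import urlsplit


def parse_log_hosts(value):
    """Split a Syslog.global.logHost value into (protocol, host, port) tuples.

    Assumes a comma-separated list such as
    "udp://192.0.2.10:514,tcp://loghost.example.com:1514".
    An empty result means no remote log target is configured.
    """
    targets = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Entries without a scheme are prefixed with "//" so that urlsplit
        # still treats them as a host rather than a path.
        if "://" not in entry:
            entry = "//" + entry
        parts = urlsplit(entry)
        targets.append((parts.scheme or None, parts.hostname, parts.port))
    return targets


if __name__ == "__main__":
    # Sample value using documentation-only addresses.
    for target in parse_log_hosts("udp://192.0.2.10:514, tcp://loghost.example.com:1514"):
        print(target)
```

An empty list from this helper suggests the logs were not redirected, in which case they might not have survived the reboot.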
Once you've determined whether your ESXi host logs were redirected, you can check whether the host was intentionally restarted. Look in the /var/log/hostd.log file. You might find something similar to the following entry, which indicates a deliberate reboot:
Hostd: [12:51:54.284 27D13B90 info 'TaskManager'] Task Created : haTask-ha-host-vim.HostSystem.reboot-50
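If the log is long, or you are working from a copy on a log server, you can search for that task entry with a script. The sketch below is one possible approach in Python. The function name find_reboot_tasks is hypothetical, and the pattern is based only on the single sample line above, so entries in other ESXi versions might look different.

```python
import re

# Matches "Task Created : <task id>" where the task id names the
# vim.HostSystem.reboot operation, as in the sample hostd.log line.
REBOOT_TASK = re.compile(r"Task Created\s*:\s*(\S*vim\.HostSystem\.reboot\S*)")


def find_reboot_tasks(lines):
    """Return (line_number, task_id) for each reboot task found in lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = REBOOT_TASK.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = [
        "Hostd: [12:51:54.284 27D13B90 info 'TaskManager'] "
        "Task Created : haTask-ha-host-vim.HostSystem.reboot-50",
    ]
    for number, task_id in find_reboot_tasks(sample):
        print(number, task_id)
```

To scan a real file, pass an open file object, for example find_reboot_tasks(open("hostd.log")), where hostd.log is your local copy of the log.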
Was there a core dump?
VMs or ESXi hosts can generate a core dump. You can check if you have the required partition available for a core dump through the Direct Console User Interface, either at the console in the server room or via the Intelligent Platform Management Interface.
You can also use a Secure Shell (SSH) client, such as PuTTY, to connect remotely to your ESXi host to check this. In order to do so, you need to configure SSH access to your ESXi host.
Enter the following command at the command prompt to list partitions available for core dump:
esxcli system coredump partition list
If you need to activate or deactivate core dump partitions, you can see the other commands available in this namespace by entering the following command:
esxcli system coredump partition
ESXi hosts do not automatically collect the core dumps. To collect the core dump, manually run the esxcfg-dumppart command with an option that is valid for your environment.
Is ESXi configured to reboot automatically?
Execute the following command to check if ESXi is configured to automatically reboot after a Purple Screen of Death (PSOD):
esxcfg-advcfg -g /Misc/BlueScreenTimeout
If the value isn't 0, then ESXi will automatically reboot after the PSOD. If it is 0, then it will wait for you to manually restart the host.
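If you collect this setting from several hosts, a small helper can apply the rule above to each result. The following Python sketch is illustrative only. The function name describe_psod_timeout is hypothetical, and it assumes the command prints a line that ends with the numeric value, such as "Value of BlueScreenTimeout is 0".

```python
import re


def describe_psod_timeout(output):
    """Interpret the text printed by esxcfg-advcfg -g /Misc/BlueScreenTimeout.

    Assumes the output ends with the numeric value, for example
    "Value of BlueScreenTimeout is 0".
    """
    match = re.search(r"(-?\d+)\s*$", output.strip())
    if match is None:
        raise ValueError("no numeric value found in: %r" % output)
    value = int(match.group(1))
    if value == 0:
        # 0 means the host stays on the PSOD until someone restarts it.
        return "manual restart required after a PSOD"
    return "automatic reboot after a PSOD (timeout value: %d)" % value


if __name__ == "__main__":
    print(describe_psod_timeout("Value of BlueScreenTimeout is 0"))
```

A host that reports a nonzero value might have rebooted on its own after a PSOD, so a missing purple screen doesn't rule out a kernel error.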
Is the hardware faulty?
If your ESXi host experiences an outage that is not the result of a kernel error, a manual reboot or an intentional shutdown, then the cause might be the physical hardware. Hardware might reboot abruptly due to a faulty component, heating problems -- such as air conditioning failure -- or a power outage in the data center.
Power failures can be nightmares for virtualization admins. If you live in a country or city that often has power failures, you might need to convince upper management to invest in UPS protection or, beyond that, a generator or solar-powered battery backup in case of long power failures.