Traditional system performance monitoring tools cannot access the underlying virtualization layer and often provide unreliable results for a virtualized environment. Fortunately, VMware vSphere has several built-in tools to monitor and troubleshoot host and VM performance.
To effectively monitor your virtualized environment from every angle, you should monitor the VM itself, the host, the networking traffic and the storage traffic. This approach also provides true insight into the performance health of your virtual environment.
Using the Perfmon performance monitoring tool
Let's begin with how to monitor virtual machines. If your VMs run Windows, the built-in
While the new VM counters provide more information about performance inside a VM, it is best to monitor performance outside a VM as well. To monitor performance outside a VM, use the vSphere Client, which has a Performance tab for every object that you can select in vCenter Server (see below). Objects may be data centers, clusters, hosts or VMs. Selecting a different object displays different types of data. Selecting hosts or VMs provides the most detailed performance data, such as information on CPU, memory, disk and network statistics.
When you select the Performance tab, you have two views available: Overview and Advanced. The Overview option presents a dashboard of key statistics, and the Advanced view provides more detailed information. Clicking on Chart Options offers additional counters that may not show by default.
The Statistics Level setting, which is configured in vCenter Server's settings, controls the number of counters for each category. In the vCenter Statistics setting, the default is level 1, which is the lowest level and provides only basic performance information on each resource. The setting can be increased up to level 4, which provides the maximum information possible.
Choosing level 4 reporting, however, can slow the performance of vCenter Server and drastically increase the size of its database. Here's why: vCenter Server stores historical performance data in its database and rolls up data from one time interval (e.g., a five-minute interval) to the next (e.g., a 30-minute interval). A rollup averages several readings from the shorter interval to generate a single value for the longer one. So vCenter might, for example, sample performance every five minutes and store these values. After 30 minutes, vCenter averages the previous six five-minute values and rolls them up into one 30-minute value.
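To make the rollup arithmetic concrete, here is a minimal Python sketch of the averaging step described above. It is an illustration only, not vCenter Server's actual code: the function name `roll_up` and the CPU readings are hypothetical.

```python
def roll_up(samples, group_size):
    """Average each consecutive group of `group_size` samples into one value."""
    # Ignore an incomplete trailing group; a rollup needs a full interval.
    full = len(samples) - len(samples) % group_size
    return [
        sum(samples[i:i + group_size]) / group_size
        for i in range(0, full, group_size)
    ]

# Hypothetical CPU-usage readings (percent), one per five-minute interval.
five_minute_cpu = [20, 30, 40, 30, 20, 40, 90, 10, 10, 10, 10, 20]

# Six five-minute values become one 30-minute value.
thirty_minute_cpu = roll_up(five_minute_cpu, 6)
print(thirty_minute_cpu)  # [30.0, 25.0]
```

Note that the 90% reading in the second half hour disappears into a 25% average. Rolled-up data is useful for trends, but short spikes are visible only in the more granular intervals.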
As you might now conclude, recording detailed performance metrics for multiple hosts and VMs can cause the database to become quite large. So unless you have to troubleshoot a performance problem and need more detailed information, I recommend leaving the Statistics Level at level 1; the default level provides plenty of useful information for everyday use.
Also note that vCenter Server provides configurable performance statistics collection, so you can view both real-time and historical statistics as far in the past as you desire. Historical data is severely limited, however, for ESX and ESXi hosts that are not managed by vCenter Server (60 minutes for ESXi and 24 hours for ESX).
The esxtop and resxtop performance monitoring tools
Another key performance monitoring tool is the ESX service console utility esxtop. Its remote version is called resxtop and is included with the vSphere command-line interface and the vSphere Management Assistant (vMA). While esxtop can be used only on ESX hosts, resxtop can run remotely and connect to both ESX and ESXi hosts.
Esxtop is a text-based utility that generates real-time advanced performance statistics for all host resources. If you aren't familiar with esxtop's terminology, output and navigation, however, you may encounter a learning curve. Esxtop navigation requires the use of single-key commands that are not displayed on the screen. Still, you can access a list of available commands by pressing the h or question-mark (?) key.
The default esxtop display shows CPU statistics. You can add fields by pressing the f key, or switch to other resource views by pressing c (for CPU), n (for network), m (for memory), d (for disk adapter), v (for disk VM), u (for disk device) or i (for interrupt). These view keys are lowercase; some uppercase letters are bound to different commands.
When using esxtop, here's a tip: Expand your screen horizontally. Many fields that display across the screen are hidden until you expand your screen's view.
Esxtop can be run in three modes: interactive, batch and replay. Interactive mode is the default and allows you to view and change the data that is displayed on the screen. Batch mode allows you to collect and save data over time to a file, which can then be imported into Microsoft Excel or Windows Perfmon for review. Replay mode allows you to replay a statistics collection period that was recorded with the vm-support command. The vm-support command allows you to specify a duration and interval for collecting statistics, which can then be saved to a file.
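If you would rather script the review of a batch-mode capture than open it in Excel or Perfmon, a short Python sketch along these lines may help. It assumes the capture is a CSV file with a header row and a timestamp in the first column, for example one produced with something like `esxtop -b -d 5 -n 12 > stats.csv`. The file name, the helper `column_averages` and the simplified column names in the sample are all hypothetical; real captures use much longer Perfmon-style counter names.

```python
import csv
import io

def column_averages(csv_file):
    """Return {column name: average} for every column after the timestamp."""
    reader = csv.reader(csv_file)
    header = next(reader)
    totals = [0.0] * (len(header) - 1)
    rows = 0
    for row in reader:
        if len(row) != len(header):
            continue  # skip truncated or malformed rows
        for i, value in enumerate(row[1:]):
            totals[i] += float(value)
        rows += 1
    if rows == 0:
        return {}
    return {name: total / rows for name, total in zip(header[1:], totals)}

# Hypothetical two-sample excerpt with simplified column names.
SAMPLE = (
    '"Time","cpu_used_pct","net_mbps"\n'
    '"06/01/2010 10:00:00","40.0","12.5"\n'
    '"06/01/2010 10:00:05","60.0","17.5"\n'
)

averages = column_averages(io.StringIO(SAMPLE))
print(averages)  # {'cpu_used_pct': 50.0, 'net_mbps': 15.0}
```

For a real capture, pass `open("stats.csv", newline="")` instead of the in-memory sample. The sketch assumes every value after the timestamp is numeric, so check that against your own file before relying on it.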
Esxtop isn't an all-purpose tool, but when it comes to troubleshooting performance problems, it's invaluable. It displays several advanced statistics that are not shown in the vSphere Client.
vCenter Server performance alarms
While periodically monitoring performance with the vSphere Client or esxtop is beneficial in several ways, neither tool alerts you to problems in your environment as they happen. Fortunately, you can use vCenter Server alarms, which alert you when specific resource conditions exist. VMware Infrastructure 3 offered few performance alarms, and those it had were not very helpful. VMware vSphere offers many more than its predecessor, and these alarms are also more useful because of the new Condition Length field that was added to the alarm trigger (see image below).
The Condition Length field can eliminate false alarms by allowing you to specify how long a condition must persist before an alert is triggered. When setting an alarm for VM CPU usage (see image below), for example, it may not be a problem if a single VM is at 100% CPU use for a few seconds. If this condition were to persist for more than five minutes, however, it could indicate a problem. Condition Length allows you to receive an alert only after those five minutes have elapsed.
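The idea behind Condition Length can be sketched in a few lines of Python. This is an illustration of the concept under simple assumptions, not how vCenter Server evaluates alarms internally: the function `first_alert_time`, the one-minute sampling and the readings are all hypothetical.

```python
def first_alert_time(samples, threshold, condition_length):
    """Return the time (in seconds) at which an alert would fire, or None.

    samples: (time_in_seconds, value) pairs ordered by time.
    The alert fires only once the value has stayed above `threshold`
    for at least `condition_length` seconds without interruption.
    """
    breach_start = None
    for t, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = t  # the condition has just begun
            if t - breach_start >= condition_length:
                return t
        else:
            breach_start = None  # the condition cleared, so reset the timer
    return None

# Hypothetical VM CPU readings (percent), one per minute: a brief spike,
# a dip, and then sustained 100% usage.
cpu = [(0, 100), (60, 100), (120, 35), (180, 100), (240, 100),
       (300, 100), (360, 100), (420, 100), (480, 100), (540, 100)]

print(first_alert_time(cpu, threshold=90, condition_length=300))      # 480
print(first_alert_time(cpu[:3], threshold=90, condition_length=300))  # None
```

The brief spike in the first two minutes never triggers an alert because the dip at 120 seconds resets the timer. Only the sustained load that starts at 180 seconds fires, five minutes later.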
There are many helpful alarms available for host and VM resource usage. I recommend taking advantage of them so that you can quickly resolve issues that affect performance in your environment.
Eric Siebert is a 25-year IT veteran with experience in programming, networking, telecom and systems administration. He is a guru-status moderator on the VMware community VMTN forums and maintains VMware-land.com, a VI3 information site.
This was first published in June 2010