Righting a wrong vSphere network configuration

Misconfiguration is by far the most common cause of performance issues with virtual machines. A few basic checks can get VMs back on track.

When it comes to virtual machines, performance problems are far more likely to be caused by vSphere network configuration errors than anything else. It's important to know how to do a few basic troubleshooting steps so that you can identify and correct performance issues. This is a large area, so I will cover the first things to look at. In most instances, these simple steps will resolve issues.

Performance issues in virtual machines typically come either from saturation of available resources or from misconfiguration. Sometimes a single VM will saturate a resource, while other times a group of VMs will cause saturation. Occasionally, the VM will not get enough resources, so the VM is saturated even when the host has ample resources.

The other, more common cause of performance issues is misconfiguration. With VMs, network performance problems are far more likely to be caused by misconfiguration than resource saturation, because the wrong settings or hardware selection can impact responsiveness.

In most environments, the only things that saturate ESXi networks are VMkernel functions: IP storage -- like NFS and iSCSI -- and vMotion can saturate physical network interface cards (NICs). If VMs share physical NICs with IP storage or vMotion, then the VMs' network performance may be affected.

Correct slow NIC speeds

A classic example of network misconfiguration on an ESXi server involves having one NIC connected at a lower link speed than the rest. In the diagram below, both NICs are set to auto-negotiate link speed. One NIC has negotiated 1 Gbps full-duplex while the other has only managed 100 Mbps half-duplex.

A faulty patch cable is typically to blame for this problem. The other possibility is that either the switch or the NIC is not set to auto-negotiate. If possible, both physical NICs and physical switch ports should be placed on auto-negotiate. As part of your build verification, make sure that every NIC in your ESXi servers connects at its fastest speed.

Fix slow NIC speeds. There are a few reasons why a NIC does not run at full speed. Here one NIC is set to 100 Mbps half-duplex while the other is 1 Gbps full-duplex.
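If you want to make that build-verification check repeatable, a short script can report any uplink that is down or linked below 1 Gbps. The following is a minimal sketch using the pyVmomi Python SDK; the vCenter address and credentials are placeholders, and certificate verification is disabled only to keep the example short.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab use only; verify certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Walk every ESXi host and check each physical NIC's negotiated link speed.
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
for host in view.view:
    for pnic in host.config.network.pnic:
        speed = pnic.linkSpeed            # None means the link is down
        if speed is None:
            print(f"{host.name} {pnic.device}: link down")
        elif speed.speedMb < 1000:        # flag anything below 1 Gbps
            duplex = "full" if speed.duplex else "half"
            print(f"{host.name} {pnic.device}: {speed.speedMb} Mbps {duplex}-duplex")
view.DestroyView()
Disconnect(si)

Run this after every build or cabling change; any NIC it flags is either on a bad patch cable or mismatched auto-negotiation settings.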

Set NICs to VMXNET3

The other common misconfiguration is a VM's network adapter type. Ideally, all NICs should be VMXNET3 if the guest operating system supports it. VMXNET3 provides better throughput and lower CPU load than the other adapter types. Note that VMXNET3 only works in VMs with VMware Tools installed. The Intel e1000 type is nearly as good, but the "flexible" type is significantly slower.

Having VMware Tools installed is also important for VM performance, since it includes optimized drivers. VMware Tools improves the performance of every I/O device, both storage and network, and should be installed in every VM.
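A quick inventory of adapter types and Tools status can flag the VMs that need attention. The sketch below uses pyVmomi and assumes the 'si' connection from the earlier example; it only reports and does not change anything.

from pyVmomi import vim

content = si.RetrieveContent()
vm_view = content.viewManager.CreateContainerView(content.rootFolder,
                                                  [vim.VirtualMachine], True)
for vm in vm_view.view:
    if vm.config is None:                 # skip VMs with no configuration available
        continue
    # Report any virtual NIC that is not VMXNET3.
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualEthernetCard) and \
           not isinstance(dev, vim.vm.device.VirtualVmxnet3):
            print(f"{vm.name}: {type(dev).__name__} -- consider VMXNET3")
    # Report VMs where VMware Tools is not running.
    if vm.guest.toolsRunningStatus != "guestToolsRunning":
        print(f"{vm.name}: VMware Tools not running")
vm_view.DestroyView()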

Pinpoint saturation and contention

The first resource-usage monitoring tools to use are the performance graphs in the vSphere Client or the vSphere Web Client. When looking for evidence of a change, keep in mind that the real-time graphs only update every 20 seconds. Start by looking at the ESXi server to see if a resource is saturated, then inspect the VMs further to determine whether they are getting -- and using -- the resources they require.

There are two scenarios where a resource gets saturated: The first is a single hungry consumer that wants all the available resources in a pool, and the second is a group of hungry consumers competing for the resource pool. For a single consumer, the solution is to allocate more resources to that pool. If multiple consumers want the same pool, you can add more resources to that pool or move some consumers to another pool.

For network performance, the resource pool is the bandwidth of a physical network adapter. The first indicator of a performance problem is high utilization on a physical NIC, usually above 60%. You can use the performance graphs to compare all the physical NICs in a host and look for signs of a high load. When just one NIC is heavily loaded, the imbalance may be caused by a single VM or VMkernel port, in which case the solution is a faster physical NIC or switch port. The other cause is a group of VMs competing for the same NIC even when less heavily loaded NICs are available. The options for resolving this imbalance depend on whether you can use distributed virtual switches (DVS) or vNetwork standard switches (vSS). DVS requires a higher license level than vSS, so it may not be available to you.
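The same real-time data behind the performance graphs is available through the API, which makes it easy to compare the load on each physical NIC side by side. Below is a minimal pyVmomi sketch that queries the 20-second "net.usage.average" counter for one host; 'si' and 'host' are assumed to be an existing connection and a vim.HostSystem object.

from pyVmomi import vim

perf = si.RetrieveContent().perfManager

# Build a lookup of counter names to counter IDs, then pick per-NIC network usage.
counters = {f"{c.groupInfo.key}.{c.nameInfo.key}.{c.rollupType}": c.key
            for c in perf.perfCounter}
net_usage = counters["net.usage.average"]        # KBps, one instance per vmnic

spec = vim.PerformanceManager.QuerySpec(
    entity=host,
    metricId=[vim.PerformanceManager.MetricId(counterId=net_usage, instance="*")],
    intervalId=20,                               # real-time samples, 20-second interval
    maxSample=15)                                # roughly the last five minutes

result = perf.QueryPerf(querySpec=[spec])
for series in (result[0].value if result else []):
    if not series.value:
        continue
    nic = series.id.instance or "host total"     # empty instance is the host aggregate
    avg_kbps = sum(series.value) / len(series.value)
    print(f"{nic}: {avg_kbps:.0f} KBps average")

A single vmnic sitting far above the others while its teammates are nearly idle is the imbalance described above.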

Untangle the vSS NIC contention

Since vSS NIC teaming does not do true load balancing, you may need to use separate vSwitches to segregate high-bandwidth workloads from lower-bandwidth ones. Usually this means putting IP storage or vMotion on a separate vSwitch, which means you will need more physical NICs. If NICs are in short supply, you can use active and standby configurations to keep the high-bandwidth functions on one NIC, as shown below. The VM port groups would be configured to use the remaining NICs.


Isolate high-bandwidth functions. Put separate high-bandwidth functions on one NIC in the vMotion NIC teaming setup screen.
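On a standard vSwitch, the active/standby NIC order is set per port group, so the same configuration can be scripted. The sketch below shows one way to do it with pyVmomi; the port group name "vMotion" and the vmnic names are placeholders, and 'host' is assumed to be an existing vim.HostSystem object.

from pyVmomi import vim

net_sys = host.configManager.networkSystem
pg_name = "vMotion"

# Reuse the existing port group spec so the VLAN and vSwitch settings are preserved.
spec = next(pg.spec for pg in host.config.network.portgroup
            if pg.spec.name == pg_name)
if spec.policy is None:
    spec.policy = vim.host.NetworkPolicy()

# Pin the high-bandwidth function to one uplink, with the other as standby only.
spec.policy.nicTeaming = vim.host.NetworkPolicy.NicTeamingPolicy(
    nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
        activeNic=["vmnic2"],        # vMotion traffic stays on this NIC
        standbyNic=["vmnic3"]))      # used only if vmnic2 loses its link

net_sys.UpdatePortGroup(pg_name, spec)

The VM port groups would get the mirror-image order, keeping VM traffic on the other NIC during normal operation.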
Adjust the NIC-teaming options

The distributed virtual switch adds a few nice features for managing high-bandwidth loads, such as vMotion and IP storage. The simplest option is to use the "Route based on physical NIC load" NIC-teaming option, often referred to as load-based teaming (LBT). This is the only teaming option that actually balances load by changing which physical NIC is used.

Change the load-based teaming setting to balance bandwidth loads.
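The same setting can be applied through the API. The sketch below is a minimal pyVmomi example that reconfigures a distributed port group to use load-based teaming; 'dvpg' is assumed to be an existing vim.dvs.DistributedVirtualPortgroup object, and LBT is only available on a DVS.

from pyVmomi import vim

cfg = vim.dvs.DistributedVirtualPortgroup.ConfigSpec()
cfg.configVersion = dvpg.config.configVersion    # required so the edit is not rejected as stale

# Switch the port group's teaming policy to "Route based on physical NIC load".
cfg.defaultPortConfig = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
    uplinkTeamingPolicy=vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy(
        policy=vim.StringPolicy(value="loadbalance_loadbased")))

task = dvpg.ReconfigureDVPortgroup_Task(cfg)     # returns a task; wait on it as needed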

Use NIOC to share the bandwidth load

If you are unable to use LBT, you can use the DVS feature Network I/O Control (NIOC). NIOC ensures that an overloaded NIC's bandwidth is shared fairly among VMs and VMkernel functions.
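NIOC is enabled per distributed switch, and bandwidth is then divided according to the shares assigned to each network resource pool. The sketch below is a minimal pyVmomi example that turns NIOC on and lists the pools; 'dvs' is assumed to be an existing distributed switch object.

# Enable Network I/O Control on the distributed switch.
dvs.EnableNetworkResourceManagement(True)

# List the network resource pools and how bandwidth is shared between them.
for pool in dvs.networkResourcePool:
    alloc = pool.allocationInfo
    print(f"{pool.key}: shares={alloc.shares.shares} ({alloc.shares.level}), "
          f"limit={alloc.limit} Mbps")           # -1 means unlimited

Raising the shares on the virtual machine pool, or putting a limit on vMotion or IP storage, keeps a busy VMkernel function from starving VM traffic on a shared NIC.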

Network performance issues often arise from misconfiguration; a small number of VMkernel functions can produce a high network load. Isolate the VMkernel load from the VM network and you’ll be able to safeguard VM network performance.

This was first published in May 2014
