Storage I/O control uses storage latency to manage the I/O queue shares provided to virtual machines in a data center. If storage latency increases -- worsens -- beyond a predefined threshold, storage I/O control reduces the I/O queue slots available to VMs in proportion to their shares, which lowers storage use by the most demanding VMs. This leaves more storage I/O bandwidth for other, less-demanding VMs, thus easing contention in the data center.
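To make the share mechanism concrete, here is a deliberately simplified Python model. It is not VMware's actual algorithm: real storage I/O control evaluates data store-wide latency across hosts and adjusts host device queue depths. The names `VirtualDisk` and `allocate_queue_slots`, the 64-slot queue and the share values are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualDisk:
    name: str
    shares: int  # relative priority; a higher number wins more queue slots


def allocate_queue_slots(
    disks: list[VirtualDisk],
    queue_depth: int,
    observed_latency_ms: float,
    threshold_ms: float = 30.0,
) -> Optional[dict[str, float]]:
    """Hypothetical, simplified model of share-based throttling.

    Returns None when latency is at or below the threshold (no throttling).
    Otherwise, divides the available queue slots in proportion to shares.
    """
    if observed_latency_ms <= threshold_ms:
        return None
    total_shares = sum(d.shares for d in disks)
    return {d.name: queue_depth * d.shares / total_shares for d in disks}
```

With shares of 2,000, 1,000 and 1,000 and an observed latency of 42 ms against a 30 ms threshold, the three disks would receive 32, 16 and 16 of the 64 queue slots; below the threshold, the function reports no throttling at all.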
Although storage I/O control can be an effective way to handle incidental or occasional storage sharing issues, it is not always suitable for every VM. Some VM workloads require heavy storage I/O by design, and invoking storage I/O control could have an undesirable effect on the workload's performance.
It's important for administrators to understand the underlying requirements of each VM and determine whether applying storage I/O control might do more harm than good. Remember that storage I/O control isn't an all-or-nothing proposition; it's dynamically configurable, and administrators can adjust rule settings on the fly, including IOPS minimum, maximum and shares, to alleviate storage congestion while minimizing performance impacts to the VMs.
In some cases, it might be better to relocate a storage-intensive VM to another host, attach it to a better-performing storage resource -- or tier -- or otherwise rebalance the workloads on that host so that storage I/O control is invoked less often.
How admins can avoid storage I/O control issues
Administrators should employ performance monitoring and reporting to watch VM performance against storage I/O control use -- gauging the performance impact it has when invoked. That's the most objective way to measure storage I/O control's impact on workloads -- and ultimately the business.
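One way to put numbers on that comparison is to collect the same metrics for a VM during windows when storage I/O control is inactive and when it is throttling, then compare medians. This minimal sketch assumes you have already exported those samples from your monitoring tool; the function name `sioc_impact` and its inputs are hypothetical.

```python
from statistics import median


def _pct_change(before: float, after: float) -> float:
    """Percent change from a baseline value to a new value."""
    if before == 0:
        raise ValueError("baseline must be nonzero")
    return (after - before) * 100.0 / before


def sioc_impact(latency_off_ms, latency_on_ms, iops_off, iops_on):
    """Compare median latency and IOPS with storage I/O control inactive vs. active.

    Each argument is a non-empty sequence of samples for the same VM.
    Positive latency change means the VM got slower; negative IOPS change
    means it completed fewer I/Os while storage I/O control was throttling.
    """
    return {
        "latency_change_pct": _pct_change(median(latency_off_ms), median(latency_on_ms)),
        "iops_change_pct": _pct_change(median(iops_off), median(iops_on)),
    }
```

Medians are used rather than means so that a few outlier samples do not dominate the comparison.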
Beyond the issue of whether storage I/O control is appropriately implemented and properly configured for a vSphere host, several common technical issues can appear in a storage I/O control implementation. For example, administrators might discover that storage I/O control isn't operating as expected (if at all), VMs are not properly prioritized or rules only apply intermittently.
A common root problem occurs when more than one vCenter Server instance manages a data store. This can cause conflicts or erratic storage I/O control behavior if each vCenter Server instance employs a different configuration. For example, if one vCenter configures storage I/O control one way and another vCenter configures it another way, storage I/O control may work correctly at some times but not others. Verify that only one vCenter Server instance governs each data store, and ensure that the vCenter Server instance uses the desired configuration.
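If you can export each vCenter Server instance's data store list, a short script can flag data stores claimed by more than one instance. The input format here is a hypothetical assumption; in practice you would key on a unique identifier such as the data store's URL rather than its display name, which can differ between vCenter instances.

```python
from collections import defaultdict


def find_shared_datastores(inventory):
    """Return data stores that more than one vCenter Server instance manages.

    inventory: hypothetical mapping of vCenter name -> iterable of data store IDs.
    Result: data store ID -> sorted list of the vCenter instances that claim it.
    """
    owners = defaultdict(set)
    for vcenter, datastores in inventory.items():
        for ds in datastores:
            owners[ds].add(vcenter)
    return {ds: sorted(vcs) for ds, vcs in owners.items() if len(vcs) > 1}
```

For example, `find_shared_datastores({"vc-a": ["ds1", "ds2"], "vc-b": ["ds2", "ds3"]})` returns `{"ds2": ["vc-a", "vc-b"]}`.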
Administrators can usually address these types of issues by checking and toggling storage I/O control for the problematic data store. First, check the data store's properties and make sure storage I/O control is enabled.
If it is already enabled, toggle storage I/O control off and save the changes, then re-enable it and save the changes again. This can help if the number of hosts using the data store has changed since storage I/O control was first enabled.
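The enable-or-toggle sequence can be expressed as a small routine. This is a sketch only: `DatastoreState` and the `save` callback are hypothetical stand-ins for whatever tool or API you use to change and persist the data store setting.

```python
from dataclasses import dataclass


@dataclass
class DatastoreState:
    # Hypothetical record of a data store's storage I/O control setting;
    # not a vSphere API object.
    name: str
    sioc_enabled: bool


def reapply_sioc(ds, save):
    """Enable storage I/O control, toggling it off first if it is already on.

    `save` is a hypothetical callback that persists the data store's settings;
    it is called after every change. Returns the steps taken, in order.
    """
    steps = []
    if ds.sioc_enabled:
        ds.sioc_enabled = False
        save(ds)
        steps.append("disabled")
    ds.sioc_enabled = True
    save(ds)
    steps.append("enabled")
    return steps
```

Saving after each step, rather than once at the end, mirrors the procedure above: the off state is actually committed before storage I/O control is turned back on.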
Next, check the congestion threshold in the data store's properties -- an advanced setting under storage I/O control. Make sure the latency threshold value is set to the proper level; the default setting is 30 milliseconds. If the latency characteristics of the underlying storage change, verify that the setting is still correct for your infrastructure.
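A quick sanity check on the threshold is to see how often recent latency would have crossed it. The sketch below assumes you have per-interval average latency samples for the data store; `threshold_report` is a hypothetical helper, and the 30 ms default mirrors the value above.

```python
def threshold_report(latency_samples_ms, threshold_ms=30.0):
    """Summarize how a candidate latency threshold relates to observed latency.

    Returns the fraction of samples above the threshold and the peak sample.
    """
    if not latency_samples_ms:
        raise ValueError("need at least one latency sample")
    above = sum(1 for s in latency_samples_ms if s > threshold_ms)
    return {
        "fraction_above": above / len(latency_samples_ms),
        "peak_ms": max(latency_samples_ms),
    }
```

A fraction near 1.0 suggests storage I/O control would throttle almost constantly, while a fraction of 0 with a peak far below the threshold suggests it would never engage; either result is worth a second look.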
VMware designed storage I/O control to prioritize desired VMs, but by default every VM has the same number of I/O shares and the same IOPS limit. If certain VMs are not receiving the desired prioritization when storage I/O control engages, administrators should verify the I/O shares and IOPS limits set for each VM listed in the cluster. The storage section of the cluster's resource allocation dialog keeps note of these values. Administrators can then change the I/O shares and IOPS limit for individual virtual disks in the vSphere Client inventory.
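To spot VMs that were never given explicit priority, you can scan an export of per-disk settings for entries still at the defaults. This sketch assumes the vSphere default of Normal (1,000) shares per virtual disk and an unlimited IOPS limit, represented here as `None`; the `DiskAllocation` record and the export format are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_DISK_SHARES = 1000  # assumed "Normal" share level for a virtual disk


@dataclass
class DiskAllocation:
    # Hypothetical row from an export of per-disk storage settings.
    vm: str
    disk: str
    shares: int
    iops_limit: Optional[int]  # None means unlimited


def disks_at_defaults(disks):
    """Return (vm, disk) pairs whose shares and IOPS limit were never changed."""
    return [
        (d.vm, d.disk)
        for d in disks
        if d.shares == DEFAULT_DISK_SHARES and d.iops_limit is None
    ]
```

Disks returned by this check compete on equal terms with every other default disk, so a VM you expected to be prioritized that shows up here likely explains the missing prioritization.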
Administrators can also gather additional insight into storage I/O control behavior and errors by enabling logging on the host system. Typically, storage I/O control logging should stay disabled while the feature operates normally with VMs and be enabled only long enough to help resolve problems. Afterward, disable logging again to conserve log space and avoid the unnecessary performance impact of log activity.
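The enable-briefly-then-disable pattern maps naturally onto a Python context manager, which restores the normal level even if troubleshooting is interrupted by an error. `set_level` is a hypothetical callback standing in for however your ESXi version raises and lowers storage I/O control logging, and the level numbers are placeholders.

```python
from contextlib import contextmanager


@contextmanager
def temporary_log_level(set_level, debug_level, normal_level=0):
    """Raise a log level only for the duration of a troubleshooting session.

    set_level: hypothetical callback that applies a log level on the host.
    The normal level is restored on exit, including when an exception occurs.
    """
    set_level(debug_level)
    try:
        yield
    finally:
        set_level(normal_level)
```

Wrapping the diagnostic work in `with temporary_log_level(apply_level, 7): ...` (where `apply_level` is your own hypothetical function) makes it hard to forget to turn verbose logging back off.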
Finally, consider the actual physical storage platform and its pooling/tiering capabilities. For example, problems can occur when you apply storage I/O control to data stores residing on storage arrays with automatic tiering features that are not compatible with it. It may be necessary to update the storage array's software to enable compatibility with VMware storage I/O control, or to relocate the data store to a different storage resource with better support.
Related Q&A from Stephen J. Bigelow