This is the last part of a multi-part series on the various changes in VMware vSphere 4.1.
By now, the term "VAAI" might have become part of general parlance in virtualization circles, but what isn't widely known is that a fourth feature or "primitive," as VMware refers to it, is missing from vSphere 4.1. The story is an interesting one, as it shows how partner relationships between various companies don't always run as smoothly as true love. Anyway, let me explain what VAAI is, and then I will let you into the "Da Vinci Code" mystery that is the hidden fourth primitive.
VAAI stands for "vStorage APIs for Array Integration." That little mouthful basically means that VMware has written software that allows the storage vendors to "hook" into vSphere 4.1 in a much more intelligent way than before. This means the storage arrays are becoming what I call "vm-aware" (pun intended). Just like the processor vendors have added "enlightenments" to the CPU to improve performance or security in the shape of Intel-VT or AMD-V, the storage vendors are doing the same (if it helps, try thinking of VAAI as hardware assist for storage). VAAI currently contains three components or primitives which offer improved performance. There should be four.
Primitive one: Full Copy or Copy Offload
The first primitive is called full copy, and some vendors have dubbed it copy offload. Currently, when you create a VM from a template in vSphere 4 and later, the source has to be read down the storage pipe from your template's datastore (whether Fibre Channel, iSCSI or Network File System) and then written up to the new destination. This puts a CPU load on the ESX hosts and chews up valuable IOPS that should be used not for deploying new VMs but for servicing the requests of the existing production VMs.
Even with the advance of thin virtual disks, the whole process involves watching quite lengthy and tedious status bars. VAAI's full copy or copy offload feature does away with this by intelligently cloning the VM inside the array from one volume to another. VMware claims 10 times to 20 times the efficiency with this feature, and it should pave the way for virtual desktop environments to deploy new VMs at a blistering rate compared to previous methods.
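To make the difference concrete, here is a minimal sketch of how much data crosses the ESX host's storage path in each case. It is a toy model, not VMware's implementation: the class and function names are my own invention for illustration, the 40 GiB template size is just an example, and the handful of commands the host still sends to the array during an offloaded copy is ignored.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class CloneCost:
    """Bytes that cross the ESX host's storage path during one clone (toy model)."""
    read_by_host: int
    written_by_host: int

    @property
    def total_through_host(self) -> int:
        return self.read_by_host + self.written_by_host


def host_based_clone(template_bytes: int) -> CloneCost:
    # Without offload, the host reads every byte from the source datastore
    # and then writes every byte to the destination datastore.
    return CloneCost(read_by_host=template_bytes, written_by_host=template_bytes)


def offloaded_clone(template_bytes: int) -> CloneCost:
    # With full copy, the host only tells the array what to copy; the data
    # itself moves inside the array. Command overhead is ignored in this sketch.
    return CloneCost(read_by_host=0, written_by_host=0)


if __name__ == "__main__":
    size = 40 * GIB  # hypothetical template size
    print("host-based:", host_based_clone(size).total_through_host // GIB, "GiB through the host")
    print("offloaded: ", offloaded_clone(size).total_through_host // GIB, "GiB through the host")
```

The point of the model is simply that a host-based clone moves the template through the host twice, once down and once up, while an offloaded clone moves none of it through the host.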
Finally, other storage events such as Storage vMotion (the move of a VM from one datastore to another) are accelerated by the support for VAAI at the array level.
Primitive two: Block Zeroing
Our second primitive is called block zeroing. This feature is closely tied with the cloning process once again. Inside the virtual disk file exist both data (blocks that are full) and free space that is yet to be written to (empty blocks).
If you think about the virtual machine cloning process, it is essentially a file copy process. The files that make up the VM are copied from one datastore to another. In previous releases this meant that if you copied a 40 GB virtual disk with 10 GB of data, the deploy process would create IOPS for the data, and then a lot of very repetitive SCSI commands would be sent as instructions to the array for all of the empty blocks or free space that make up the virtual disk.
Block zeroing allows for a greatly reduced set of SCSI commands to be sent from the ESX host to the array. It should also make the process of enabling VMware Fault Tolerance (FT) less painful. All VMware FT-enabled VMs need their virtual disks converted from either the thin disk or zeroed thick disk format to the eager zeroed thick disk format. The entire process can be very time-consuming, as the conversion "zeros out" every block within the virtual disk in an action that is similar to a secure delete process. With block zeroing enabled on a VAAI-capable array, the number of SCSI commands required for that conversion is greatly reduced.
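A rough back-of-the-envelope sketch shows why the command count drops. It uses the 40 GB disk with 10 GB of data from the example above. The 1 MiB write size and the 1 GiB maximum range per offloaded command are assumptions I have picked purely for illustration; real values depend on the host and the array.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def host_zeroing_commands(empty_bytes: int, write_size: int = 1 * MIB) -> int:
    # Host-driven zeroing: one write of zeros per chunk of free space.
    # write_size is an illustrative assumption.
    return ceil_div(empty_bytes, write_size)


def offloaded_zeroing_commands(empty_bytes: int, max_range: int = 1 * GIB) -> int:
    # Offloaded zeroing: one "zero this range" command per large extent,
    # with the array doing the repetitive work. max_range is an assumption.
    return ceil_div(empty_bytes, max_range)


if __name__ == "__main__":
    disk, data = 40 * GIB, 10 * GIB
    empty = disk - data
    print("host-driven commands:", host_zeroing_commands(empty))
    print("offloaded commands:  ", offloaded_zeroing_commands(empty))
```

Under these assumed sizes, the 30 GiB of free space costs 30,720 commands when the host writes the zeros itself and 30 when the array does it. The exact ratio will differ on real hardware, but the shape of the saving is the same.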
Primitive three: Scalable Lock Management
The third and final primitive is called Scalable Lock Management. Ever since ESX 2.x, VMware has used SCSI Reservation Locks with its Virtual Machine File System (VMFS) to allow multiple ESX hosts access to the same datastore. These locks are critical for preventing corruption, such as if two VMware admins both tried to create the same VM at the same time on the same volume. When a VM is powered on, its files become locked, much the same way as a document opened in Word becomes locked. This prevents silly administrative accidents by making it impossible to delete a VM (or document) while it is powered on (or in use).
VMFS is essentially a clustering file system, and its use was once mandatory for features like VMotion. When VI3.0 was released, VMware introduced support for the Network File System (NFS) protocol as well as new technologies that also relied on SCSI Reservation Locks, such as VMware High Availability and Distributed Resource Scheduler.
Over the years, VMware has written and rewritten the VMFS driver in an attempt to reduce the performance penalty that comes with excessive SCSI reservations. These SCSI reservations typically show up when the file system is undergoing a large number of changes -- what VMware calls "VMFS Metadata" updates -- which include actions like VMotion, creating a new virtual machine (VM), powering on or powering off a VM, and deleting a VM or snapshot.
These are quite common events in VMware shops, and each one imposes a lock. Now, it's important not to overstate the effect of this locking process, as most customers have been unaware of it. Such is the strength of VMFS. With that said, anything that reduces or offloads locking processes away from the ESX host is all good. So this primitive -- we'll call it "Hardware Assisted Locking" if you like -- is not to be sniffed at.
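The difference between reserving a whole volume and locking a single record can be sketched in a few lines. This is a toy model and not how VMFS or any array is actually implemented: the ToyLun class, its method names and the host names are all hypothetical, and the offloaded path is modelled as a simple compare-and-swap on one lock record, which is how hardware assisted locking is usually described.

```python
class ToyLun:
    """Toy datastore: per-file lock records plus one LUN-wide reservation."""

    def __init__(self, files):
        self.lock_owner = {name: None for name in files}  # per-file lock records
        self.reserved_by = None  # holder of the LUN-wide reservation, if any

    # Classic approach: reserve the whole LUN around each metadata update.
    def reserve(self, host) -> bool:
        if self.reserved_by not in (None, host):
            return False  # reservation conflict: another host holds the LUN
        self.reserved_by = host
        return True

    def release(self, host) -> None:
        if self.reserved_by == host:
            self.reserved_by = None

    def lock_file_with_reservation(self, host, name) -> bool:
        if not self.reserve(host):
            return False  # blocked, even if the file itself is unlocked
        try:
            if self.lock_owner[name] is None:
                self.lock_owner[name] = host
                return True
            return False
        finally:
            self.release(host)

    # Offloaded approach: atomically update one lock record, nothing else.
    def compare_and_swap(self, name, expected, new) -> bool:
        if self.lock_owner[name] == expected:
            self.lock_owner[name] = new
            return True
        return False


if __name__ == "__main__":
    lun = ToyLun(["vm1.vmdk", "vm2.vmdk"])
    lun.reserve("esx-a")  # esx-a is in the middle of a metadata update
    print("esx-b blocked on unrelated file:",
          not lun.lock_file_with_reservation("esx-b", "vm2.vmdk"))
    lun.release("esx-a")
    print("esx-a locks vm1:", lun.compare_and_swap("vm1.vmdk", None, "esx-a"))
    print("esx-b locks vm2:", lun.compare_and_swap("vm2.vmdk", None, "esx-b"))
```

In the reservation model, one host's metadata update holds up every other host on the volume, even for unrelated files. In the per-record model, two hosts only collide when they want the same file.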
Finally, vSphere 4.1 introduces a series of new advanced options and utilities to take advantage of VAAI. By default, VAAI is turned on for an ESX host, and if the array is not VAAI capable then these advanced options are simply not called. For more detail on the advanced settings, check out VMware's vStorage APIs for Array Integration FAQ.
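As a quick reference, the sketch below maps each of the three shipping primitives to the host advanced setting that, as far as I am aware, controls it in ESX 4.1, and builds the matching esxcfg-advcfg command lines. The setting paths come from my own recollection of VMware's documentation rather than from anything shown in this article, so verify them against the FAQ before use. The helper function names are hypothetical, and the code only builds strings; it does not run anything on a host. Note that these settings show whether the host will attempt each primitive, not whether the array supports it.

```python
# Advanced setting paths as I understand them for ESX/ESXi 4.1 (an assumption;
# check VMware's documentation before relying on them).
VAAI_SETTINGS = {
    "Full Copy": "/DataMover/HardwareAcceleratedMove",
    "Block Zeroing": "/DataMover/HardwareAcceleratedInit",
    "Scalable Lock Management": "/VMFS3/HardwareAcceleratedLocking",
}


def query_command(primitive: str) -> str:
    """Build the command line that reads one primitive's host setting."""
    return f"esxcfg-advcfg -g {VAAI_SETTINGS[primitive]}"


def set_command(primitive: str, enabled: bool) -> str:
    """Build the command line that turns one primitive on (1) or off (0)."""
    value = 1 if enabled else 0
    return f"esxcfg-advcfg -s {value} {VAAI_SETTINGS[primitive]}"


if __name__ == "__main__":
    for name in VAAI_SETTINGS:
        print(f"{name}: {query_command(name)}")
```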
The dark side of VAAI
All of these primitives are good, and it's a feat of VMware engineering work in conjunction with its storage partners to deliver these huge advances in performance between the hypervisor and storage layer. It is, however, not without downsides.
Firstly, your existing storage array may well not be VAAI capable. At the very least it may need firmware updates to the controllers of the array. In most cases these are successful events and seamless to your environment: Controller A is upgraded first, and Controller B takes the production load until Controller A is up again, and then the upgrade of the firmware can take place on Controller B. However, it's worth mentioning that there are plenty of war stories where firmware upgrades have gone unexpectedly pear-shaped, resulting in lost volumes or logical unit numbers (LUNs), or loss of functionality in the array.
In the worst-case scenario, your existing array may be 32-bit, and your vendor's version of VAAI may only be supported on a 64-bit platform. In this case you really are talking about buying a new array or having to wait until your current array reaches its end-of-life or maintenance window before you can take advantage of VAAI. For the most part this might not be a pain point; it depends how lucky you are in the flow of storage renewal. It's worth saying that there is a lead time for these new firmware updates to ship out to customers.
If you want to check if your array supports VAAI you can use the command:
Three primitives are shown when actually there is a hidden fourth primitive. Additionally, there is a new Hardware Acceleration column, which can be seen on the properties of a VMFS volume:
Secondly, in its current guise, VAAI is only supported with block-level storage (Fibre Channel or iSCSI), which means customers who use NFS will not benefit from VAAI. From my discussions with various storage vendors it seems likely that NFS customers will have to wait for vSphere 5 before they can enjoy the same enhancements.
Many experts in the field will choose to interpret this as yet again a case where NFS support lags behind the other storage protocols. It's perhaps salutary to remember that despite the increased use of NFS with VMware, around 80 percent of VMware customers still run on block-level storage (Fibre Channel/iSCSI), and VMware has to triage its quality assurance resources around its core customers when it comes to cutting-edge features.
The missing fourth primitive: Thin Provisioning Stun
It's now time to talk about the missing fourth primitive. There was supposed to be a fourth component to VAAI called Thin Provisioning Stun. Thin Provisioning Stun appeared in presentations from VMware, Cisco and EMC as late as 2009. In fact, if you Google the phrase "Thin Provisioning Stun" there are still documents out there that will tell you about it. Additionally, while I was at the North Carolina User Summit, there was still talk of four components making up VAAI.
(VAAI discussion is from 20 minutes to 36 minutes.)
The Thin Provisioning Stun feature was intended to help customers avoid running out of physical disk space. Thin Provisioning Stun is different from the other three primitives because it doesn't in itself improve performance -- it's actually a more controlled response to a possible error with thin-provisioned volumes. One of the problems with thin volumes is that it's entirely possible to over-subscribe the storage, creating VMs with virtual disks that are collectively greater than the volume can hold. That ability is normally considered an advantage: it saves you money by not forcing you to buy disk space that you may or may not actually write to. So it's possible to present 2 TB of storage to ESX when in fact you only have 1 TB. The worry is that as those thin virtual disks start to collect data, you could find yourself running out of physical disk space.
I've always explained this to customers as having the Wile E. Coyote experience. You know the scene where the coyote goes over the end of the cliff trying to catch the Road Runner, only to find the ground beneath him isn't there anymore? In previous versions of VMware ESX, if this situation happened, the I/O requests of the VMs would stack up and then result in Blue Screens of Death or kernel panics. In vSphere 4.1, with a fully capable VAAI-enabled array, Thin Provisioning Stun pauses the VM instead. This is achieved by the storage array sending its TP State, or out-of-space condition, to the ESX host. If this happens, the ESX host will stun or pause the affected VMs. Then an administrator can add additional capacity and resume the affected VMs.
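The sequence can be sketched as a small state machine. This is a toy model under my own assumptions, not VMware's or any array vendor's implementation: the class names, function names and the VM name are hypothetical, sizes are in whole gigabytes, and the array's out-of-space signal is reduced to a simple free-space check. The volume sizes mirror the 2 TB presented and 1 TB physical example above.

```python
from dataclasses import dataclass


@dataclass
class ThinVolume:
    presented_gb: int  # capacity the ESX host is told about
    physical_gb: int   # capacity that really exists on the array
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.physical_gb - self.used_gb

    def add_capacity(self, extra_gb: int) -> None:
        self.physical_gb += extra_gb


@dataclass
class ToyVM:
    name: str
    state: str = "running"


def guest_write(vm: ToyVM, volume: ThinVolume, size_gb: int) -> bool:
    """Attempt a write; stun the VM rather than fail the I/O if space runs out."""
    if vm.state != "running":
        return False  # a stunned VM issues no I/O
    if size_gb > volume.free_gb:
        vm.state = "stunned"  # array reports out of space, host pauses the VM
        return False
    volume.used_gb += size_gb
    return True


def resume(vm: ToyVM, volume: ThinVolume) -> bool:
    """Resume a stunned VM once physical capacity is available again."""
    if vm.state == "stunned" and volume.free_gb > 0:
        vm.state = "running"
    return vm.state == "running"


if __name__ == "__main__":
    volume = ThinVolume(presented_gb=2048, physical_gb=1024)
    vm = ToyVM("vm01")
    print(guest_write(vm, volume, 900), vm.state)   # fits in physical space
    print(guest_write(vm, volume, 200), vm.state)   # does not fit: VM is stunned
    volume.add_capacity(512)                        # administrator adds disk
    print(resume(vm, volume), vm.state)
    print(guest_write(vm, volume, 200), vm.state)   # the write now succeeds
```

The key design point is that running out of space changes the VM's state rather than failing its I/O, so the guest never sees the error that would otherwise crash it.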
An out-of-space stun would cause a pop-up message like so in the vSphere Client:
The interesting thing about this fourth primitive is that it is only present in some vendors' storage arrays that support VAAI. The primitives were defined some time ago, but only some storage vendors did the research and development work to ensure they would be certified for all four primitives.
As I understand it, at this year's VMware Partner Exchange (PEX) in Las Vegas in February, some storage vendors had presentations showcasing support for all four primitives and some did not, and this came as a surprise to the folks at VMware because they were expecting storage vendors to support three. This led to some storage vendors having to quietly change their presentations to show only three primitives rather than the full selection. Essentially, this change was made to prevent other storage vendors from becoming annoyed because of perceived VMware favoritism for one set of storage vendors over another. All in all, it sounds like a case of miscommunication between VMware and some of its partners rather than deliberate favoritism.
On the surface, this isn't a big deal. Every day, vendors promise features W, X, Y and Z, and in the end only deliver X, Y and Z. The interesting thing about this is that there seems to be a slight disconnect between VMware and its storage partners. So inside your brand new array, there may or may not be a feature that could help you manage your storage more effectively and proactively stop a major outage. The application programming interfaces (APIs) that VMware defined and asked its storage partners to write to are there, but VMware itself hasn't completed its QA process on all of the APIs and was unable to promote the fourth primitive. The feature did make the General Availability release, but VMware is being quiet about it.
While some storage vendors did very good work, it appears VMware dropped the ball with the fourth primitive, which led to some confusion and friction with its storage partners. It's my hope that there will be an update on this issue sometime this year to complete the fourth piece in the puzzle. Hopefully, by then all of the storage vendors will have the correct firmware in place, VMware will have completed its QA process, and the fourth primitive will, as if by magic, make its presence known.
Of course, you should really have all kinds of alarms and alerts enabled both in vSphere and on the storage array to make sure you don't find yourself in this horrible situation in the first place!
Mike Laverick (VCP) has been involved with the VMware community since 2003. Laverick is a VMware forum moderator and member of the London VMware User Group Steering Committee. Laverick is the owner and author of the virtualization website and blog RTFM Education, where he publishes free guides and utilities aimed at VMware ESX/VirtualCenter users, and has recently joined SearchVMware.com as an Editor at Large. In 2009, Laverick received the VMware vExpert award and helped found the Irish and Scottish VMware user groups. Laverick has had books published on VMware Virtual Infrastructure 3, VMware vSphere4 and VMware Site Recovery Manager.
This was first published in July 2010