Shared storage is expensive, and that expense has created a VMware/storage inflection point that has received attention lately: how much storage virtual infrastructures consume and how to reduce it. If the amount of data required by virtual machines can be reduced, the total cost of the virtualization project comes down. Storage is the right place to focus in a virtualization project because of the expense of shared storage and the payback that storage efficiency can deliver.
At the heart of the problem is the proliferation of virtual machine disk format (VMDK) files, the virtual disk images that VMware assigns to virtual systems. These files can range in size from 10 GB to 50 GB, and there could be 30 to 40 of them on a physical host, depending on the number of virtual machines deployed. Multiply this by the number of physical hosts in your environment and you can end up with a sizeable capacity investment in VMDK files.
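To make that multiplication concrete, here is a minimal Python sketch of the arithmetic. The file sizes and per-host counts come from the ranges above; the host count of 10 and the function name are assumptions chosen only for illustration.

```python
def vmdk_capacity_gb(vmdk_size_gb: float, vmdks_per_host: int, hosts: int) -> float:
    """Raw capacity consumed by VMDK files across all physical hosts."""
    return vmdk_size_gb * vmdks_per_host * hosts


# Low and high ends of the ranges quoted in the text.
# The host count (10) is an assumed example value, not a figure from the article.
low = vmdk_capacity_gb(vmdk_size_gb=10, vmdks_per_host=30, hosts=10)
high = vmdk_capacity_gb(vmdk_size_gb=50, vmdks_per_host=40, hosts=10)

print(f"{low:,.0f} GB to {high:,.0f} GB")
```

Even at the low end of each range, a modest environment lands in the multi-terabyte range before any snapshots or replicas are counted.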
VMware data optimization
Storage optimization can start with capabilities built in to the storage system, such as thin provisioning and writeable snapshots. Thin provisioning allows for the oversubscription of storage by sizing a volume
VMware data reduction
VMDKs are ideal candidates for data reduction because they tend to contain a lot of compressible, highly redundant data. There are currently two options available to address this need. The first is to compress data with an inline compression appliance; the second is to identify redundant data within the VMDK files and use data deduplication to eliminate redundant blocks between those files.
Real-time compression is similar to software-based compression utilities, except that companies such as Storwize have moved the technology to an appliance that sits inline in the data path and compresses and decompresses data as it passes through. Using the extra processing capabilities of the appliance allows for greater compression rates while maintaining line speed and often improving overall storage performance.
The devices sit inline and are transparent to the virtual infrastructure, requiring no additional configuration changes. Today these appliances require the increasingly popular Network File System (NFS) mounting of VMDKs, which was introduced in VMware VI 3.5.
The results of implementing an inline compression device are far-reaching. VMware images are typically up to 80% compressible, which has an impact across the entire data center. Images can be sent in their compressed form to the backup application, reducing the network and backup storage requirements of the data protection process. The data stays compressed when replicating across a wide-area network (WAN) link for disaster recovery purposes, reducing the WAN bandwidth that replication consumes by up to 80%.
Lastly, the storage requirement itself is reduced by up to 80%, which allows for more efficient use of the expensive shared storage system. For example, more VMDK files can be stored per system, snapshots can be retained longer and more data fits into the storage system's cache. All of these attributes increase performance and lower costs.
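The effect of a given compression rate can be worked out with simple arithmetic. The Python sketch below uses the 80% figure quoted above; the 10,000 GB starting volume and the function name are assumptions for illustration, and real compression rates will vary with the data.

```python
def compression_effects(raw_gb: float, compression_pct: int) -> dict:
    """Estimate stored capacity and the effective capacity multiplier
    for a given compression percentage."""
    if not 0 <= compression_pct < 100:
        raise ValueError("compression_pct must be in the range 0-99")
    remaining_pct = 100 - compression_pct
    return {
        "stored_gb": raw_gb * remaining_pct / 100,
        "saved_gb": raw_gb * compression_pct / 100,
        # How many times more logical data fits in the same physical space.
        "capacity_multiplier": 100 / remaining_pct,
    }


# 10,000 GB is an assumed example volume; 80% is the figure quoted in the text.
effects = compression_effects(10_000, 80)
print(effects)
```

At 80%, only one-fifth of the original data is stored or transmitted, so the same array, cache or WAN link handles five times as much logical data.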
The next step in optimizing VMware storage requirements is data deduplication. First popularized as a backup storage method by companies such as Data Domain, deduplication has gone primetime, and vendors like NetApp offer data deduplication of primary storage. Deduplication compares blocks of data to other blocks of data on the volume. When a duplicate block is found, only one instance of that block is stored. This means space savings even when the files are not exactly the same. In the VMware example, only the unique parts of the OS binaries need to be stored.
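The block-comparison idea can be sketched in a few lines of Python. This is an illustrative model only, not any vendor's implementation: the fixed 4 KB block size, the use of SHA-256 digests to detect duplicates and all names below are assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems vary


def dedupe(volume: bytes, block_size: int = BLOCK_SIZE):
    """Split a volume into fixed-size blocks and keep one copy of each.

    Returns (store, recipe): the unique blocks keyed by digest, and the
    ordered list of digests needed to rebuild the original volume.
    """
    store = {}
    recipe = []
    for offset in range(0, len(volume), block_size):
        block = volume[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store only the first instance
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original volume from the unique blocks."""
    return b"".join(store[digest] for digest in recipe)


# Two hypothetical images that share three "OS" blocks and differ in one data block.
os_blocks = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
image_1 = os_blocks + b"1" * BLOCK_SIZE
image_2 = os_blocks + b"2" * BLOCK_SIZE
volume = image_1 + image_2

store, recipe = dedupe(volume)
print(len(recipe), "blocks written,", len(store), "blocks stored")
```

The two images are not identical files, yet the shared blocks are stored once, which is why block-level deduplication saves space where whole-file comparison would not.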
For deduplication to pay off you need redundant data, and while primary storage has some redundancy of files, it has nowhere near the redundancy of week after week of full backups.
For primary storage deduplication to be effective you need to select particular data sets that have this redundancy. As mentioned earlier, VMware VMDKs are front and center. Thirty Windows VMDK files will contain 30 very similar sets of binaries, patches and auxiliary applications. Deduplication can reduce the storage capacity required by 60% or more.
Achieving maximum storage efficiency
Real-time compression and deduplication both play a role in the reduction of primary storage and in many cases are complementary to each other. As described earlier, when the data deduplication process finds redundant blocks across a volume, it stores only one copy of each block. For deduplication to work and for storage capacity gains to be made, there must be redundant data.
Real-time compression, on the other hand, compresses and decompresses all data as it goes through the appliance, regardless of how much commonality there is between blocks. There are many cases in which compression will offer a higher return on storage efficiency, especially in areas where deduplication is not as effective. Databases are a good example. An inline compression appliance can actively compress and decompress a database with little or no performance impact and still deliver capacity savings of 75% or more.
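The difference can be illustrated with a small Python experiment. It uses the standard zlib library as a stand-in for a compression appliance and made-up database-style rows as sample data; both are assumptions for illustration, and the ratio achieved says nothing about what a real appliance delivers on a real database.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for the duplicate check

# Hypothetical database-style rows: every row carries a unique ID,
# so no two blocks are identical even though the rows look alike.
rows = [f"{i:08d},customer_{i},status=active\n" for i in range(20000)]
data = "".join(rows).encode("ascii")

# Block-level view: count how many blocks are exact duplicates of another.
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
unique_blocks = len(set(blocks))

# Compression view: zlib removes repetition inside and across rows.
compressed = zlib.compress(data)

print("blocks:", len(blocks), "unique:", unique_blocks)
print("original bytes:", len(data), "compressed bytes:", len(compressed))
```

Because every block is unique, block-level deduplication would save nothing here, while compression still shrinks the data and restores it losslessly on read.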
Obviously there will be more in those VMDK files than just OS binaries. There will be relatively unique sets of data in use, such as databases, email stores and user data. Using the combination of a compression appliance with deduplication can result in a total increase in storage efficiency that can top 90%.
For maximum VMware storage efficiencies it makes sense to explore both options -- compressing everything and then deduplicating the remaining compressed data where duplicates exist.
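As a rough check on how the two techniques could combine to more than 90%, the Python sketch below assumes the two reductions are independent and simply multiply, so that deduplication removes its percentage of whatever compression leaves behind. That independence is an assumption made here, not something the figures above guarantee, and the function name is hypothetical.

```python
def combined_savings_pct(compression_pct: float, dedupe_pct: float) -> float:
    """Combined savings if deduplication is applied to what remains
    after compression and the two reductions are independent (assumed)."""
    remaining_pct = (100 - compression_pct) * (100 - dedupe_pct) / 100
    return 100 - remaining_pct


# 80% compression and 60% deduplication are the figures quoted earlier.
print(combined_savings_pct(80, 60))
```

Under that assumption, 80% compression followed by 60% deduplication leaves 8% of the original data on disk. In practice the two techniques can interact, so measured results may differ.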
ABOUT THE AUTHOR: George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on storage and virtualization. With 25 years of experience designing storage solutions for data centers across the U.S., Crump has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.
This was first published in October 2008