NetApp storage area network (SAN) deduplication reduces storage requirements out of the box. When combined with VMware Infrastructure 3 (VI3) or a virtual desktop environment, however, a few simple configuration changes can yield additional space savings and minimize the overhead of running deduplication on the storage array.
The storage savings add up quickly in a typical desktop virtualization deployment because of the sheer number of virtual machines (VMs) involved. For example, a typical desktop VM requires a 15 GB Virtual Machine Disk Format (VMDK) file. A deployment of 1,000 desktops would require 15,000 GB, or 15 TB. Reducing that requirement by 50% saves 7.5 TB, a substantial amount.
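The arithmetic above can be sketched in a few lines of Python. The function name and its parameters are illustrative only, and the 50% reduction is simply the example figure used above, not a ratio any particular environment is guaranteed to reach.

```python
def dedup_savings_gb(vm_count, vmdk_size_gb, reduction_ratio):
    """Return (raw_gb, saved_gb, remaining_gb) for a deployment.

    reduction_ratio is the assumed fraction of capacity removed by
    deduplication (0.50 means 50%).
    """
    if not 0.0 <= reduction_ratio <= 1.0:
        raise ValueError("reduction_ratio must be between 0 and 1")
    raw_gb = vm_count * vmdk_size_gb
    saved_gb = raw_gb * reduction_ratio
    return raw_gb, saved_gb, raw_gb - saved_gb


# The example from the text: 1,000 desktops with a 15 GB VMDK each.
raw, saved, remaining = dedup_savings_gb(1000, 15, 0.50)
print(f"Raw: {raw / 1000:.1f} TB, saved: {saved / 1000:.1f} TB, "
      f"remaining: {remaining / 1000:.1f} TB")
# prints "Raw: 15.0 TB, saved: 7.5 TB, remaining: 7.5 TB"
```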
There are many different types of deduplication technologies. Some deduplication technologies only work with backup data while others work with primary storage and can deduplicate active data. Deduplication technologies are further categorized into in-line deduplication and post-process deduplication. In-line deduplication technologies deduplicate data as it enters the storage system; post-process technologies deduplicate data after it has been written to the disks. Each of these technologies has its advantages and disadvantages, but that's an article unto itself.
NetApp's deduplication implementation is a post-process implementation capable of deduplicating VMs and VM data in active datastores. It works with both Network File System (NFS) storage and block storage such as iSCSI and Fibre Channel. The configuration for iSCSI and Fibre Channel is a bit more complicated for the storage admin, however, so this tip focuses on the use of deduplication with NFS datastores.
Activating NetApp deduplication
Storage admins can activate NetApp deduplication with just a few commands on the storage array. The deduplication license is free from NetApp, and aside from the minor performance impact of enabling deduplication there is very little reason not to use it.
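As a rough illustration, the sketch below builds the console commands a storage admin might run on a Data ONTAP 7-mode controller to enable deduplication on a single volume. The `sis on`, `sis start -s`, `sis status` and `df -s` commands belong to Data ONTAP's deduplication command set, but syntax can differ between releases, so check it against the documentation for your version. The sketch assumes the deduplication license is already installed; the volume name `vmware_nfs` and the helper function are hypothetical.

```python
def dedup_enable_commands(volume_name):
    """Return console commands to enable deduplication on one volume.

    The commands are only built as strings here; nothing is sent to an array.
    """
    if not volume_name or "/" in volume_name or " " in volume_name:
        raise ValueError("volume_name must be a bare volume name")
    path = f"/vol/{volume_name}"
    return [
        f"sis on {path}",        # enable deduplication on the volume
        f"sis start -s {path}",  # scan and deduplicate data already on the volume
        f"sis status {path}",    # check the state of the deduplication run
        f"df -s {path}",         # report the space saved on the volume
    ]


# Hypothetical volume name for an NFS datastore.
for command in dedup_enable_commands("vmware_nfs"):
    print(command)
```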
It's important to understand that once deduplication has been enabled on the storage array, nothing else needs to be configured. Storage administrators can turn on deduplication on the NetApp storage array and see storage savings without any changes to the virtualization software or the virtual machines.
Again, it's possible to use deduplication with any storage protocol that NetApp supports -- Fibre Channel, iSCSI and NFS -- but the space savings will be most easily achieved and most evident when used in conjunction with NFS.
Deduplication configuration enhancements
While you don't have to configure NetApp deduplication, you'll save more storage space if you do. By employing some of the optimizations listed below, organizations can gain the greatest benefit from the use of NetApp deduplication with VI3.
- Grouping VMs
VI administrators can group similar VMs together within a datastore to help improve deduplication ratios. For example, VMs can be grouped based on guest OS so that VMs running Windows Server 2003 are stored in one datastore while VMs running Red Hat Enterprise Linux are grouped in another.
- Separating the guest OS from its data
VI administrators can separate the guest OS from data that the guest OS uses. One way of doing this is to use multiple virtual disks within the guest OS. Outside the guest OS, this is manifested as multiple VMDKs. For VMs running Windows Server 2003, for example, this would mean separate C: and D: drives within the VM.
- Putting the swap file on a separate VMDK
Where the guest OS allows, VI administrators can move the swap file or swap partition onto a separate virtual disk (VMDK) that is stored on a non-deduplicated datastore. This saves the storage array the effort of trying to deduplicate data that is unlikely to contain many duplicate blocks and has little lasting value. Keep in mind that this adds a layer of complexity to VM configurations that organizations may not want to manage.
- Ensuring proper guest OS file system alignment
VI administrators can ensure that guest OS file systems are properly aligned within the VMDK files. Proper alignment is considered important for maximum performance because a misaligned file system causes unnecessary I/O operations on the storage array.
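Two of the optimizations above lend themselves to a short sketch: grouping VMs by guest OS and checking partition alignment. The inventory records and VM names are hypothetical. The alignment check assumes 512-byte sectors and a 4 KB array block size (NetApp's WAFL file system uses 4 KB blocks); a partition that starts at sector 63, a common default in older partitioning tools, fails the check.

```python
from collections import defaultdict

BLOCK_SIZE_BYTES = 4096   # assumed array block size
SECTOR_SIZE_BYTES = 512   # assumed virtual disk sector size


def group_by_guest_os(vms):
    """Group (vm_name, guest_os) pairs so similar VMs can share a datastore."""
    groups = defaultdict(list)
    for name, guest_os in vms:
        groups[guest_os].append(name)
    return dict(groups)


def is_aligned(start_sector, sector_size=SECTOR_SIZE_BYTES,
               block_size=BLOCK_SIZE_BYTES):
    """Return True if the partition's byte offset falls on a block boundary."""
    return (start_sector * sector_size) % block_size == 0


# Hypothetical inventory of VMs and their guest operating systems.
inventory = [
    ("web01", "Windows Server 2003"),
    ("db01", "Red Hat Enterprise Linux"),
    ("web02", "Windows Server 2003"),
]
for guest_os, names in group_by_guest_os(inventory).items():
    print(f"{guest_os}: {', '.join(names)}")

print(is_aligned(63))  # False: 63 * 512 = 32,256 bytes, not a multiple of 4,096
print(is_aligned(64))  # True: 64 * 512 = 32,768 bytes = 8 * 4,096
```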
With these optimizations in place, storage savings may increase and the overhead of running deduplication on the storage array is kept to a minimum. Keep in mind, however, that deduplication does not affect the other key metric used when sizing and designing a SAN solution: IOPS requirements. Deduplication can reduce capacity requirements, but it cannot reduce IOPS requirements, and a SAN must still be architected to meet the IOPS needs of the environment.
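The point about IOPS can be made concrete with a minimal sizing sketch. The per-desktop IOPS figure below is a placeholder chosen purely for illustration, not a measurement or a recommendation, and the function name is hypothetical. The deduplication ratio is applied to capacity only, so the IOPS requirement is the same with or without it.

```python
def sizing_requirements(vm_count, vmdk_size_gb, iops_per_vm, dedup_ratio):
    """Return (capacity_gb, total_iops) for a deployment.

    dedup_ratio reduces capacity only; it is deliberately not applied
    to the IOPS figure.
    """
    capacity_gb = vm_count * vmdk_size_gb * (1.0 - dedup_ratio)
    total_iops = vm_count * iops_per_vm
    return capacity_gb, total_iops


# 1,000 desktops at 15 GB each; 10 IOPS per desktop is a placeholder value.
print(sizing_requirements(1000, 15, 10, 0.0))  # (15000.0, 10000)
print(sizing_requirements(1000, 15, 10, 0.5))  # (7500.0, 10000)
```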
ABOUT THE AUTHOR: Scott Lowe is a senior engineer for ePlus Technology Inc. He has a broad range of experience, specializing in enterprise technologies such as storage area networks, server virtualization, directory services and interoperability. Previously he was President and CTO of Mercurion Systems, an IT consulting firm, and CTO of iO Systems.