
What are the best practices for VMware vMSC deployment?

When it comes to VMware vMSC deployment, two factors to keep in mind are distance and storage subsystems.

The main focus with VMware's vSphere Metro Storage Cluster (vMSC) is network connectivity. Geographic distance between sites should remain short to minimize round-trip network latency, and inter-site bandwidth should be sufficient to ensure rapid VM migration and synchronization. vMSC works with Fibre Channel (FC), iSCSI, Network File System (NFS) and Fibre Channel over Ethernet (FCoE) protocols, but the network latency between sites for the management network -- as well as the maximum latency for synchronous storage replication -- should not exceed 10 milliseconds (ms) round-trip time (RTT). Tools such as vSphere vMotion and Storage vMotion can tolerate significantly higher latency on their own, but that much latency is rarely acceptable for stretched clusters. Bandwidth for vMotion should be at least 250 Mbps.
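The baseline limits above can be expressed as a simple pre-deployment check. This is an illustrative sketch, not part of any VMware tool; the function name and inputs are hypothetical, while the threshold values come from the requirements described above.

```python
# Baseline vMSC network limits described in the text.
VMSC_MAX_RTT_MS = 10.0              # max round-trip latency between sites
VMOTION_MIN_BANDWIDTH_MBPS = 250.0  # minimum bandwidth for vMotion

def meets_vmsc_network_requirements(rtt_ms: float, bandwidth_mbps: float) -> bool:
    """Return True if a measured inter-site link satisfies the baseline
    vMSC limits of <= 10 ms RTT and >= 250 Mbps of bandwidth."""
    return (rtt_ms <= VMSC_MAX_RTT_MS
            and bandwidth_mbps >= VMOTION_MIN_BANDWIDTH_MBPS)

print(meets_vmsc_network_requirements(8.2, 1000.0))   # within both limits: True
print(meets_vmsc_network_requirements(14.5, 1000.0))  # RTT too high: False
```

Feeding in RTT and bandwidth figures measured between candidate sites gives a quick first-pass answer on whether a stretched cluster is even feasible.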

The choice of storage subsystem also plays an important role in a vMSC deployment. For VMware vMSC to work, the storage systems must be designed to act as a single, uniform storage resource -- able to service reads and writes at multiple locations simultaneously, while committing writes synchronously so that each storage location stays in sync. This often demands even higher bandwidth and lower latency; otherwise, vMSC performance and vMotion behavior can be seriously impaired. As a result, the storage systems and related tools that support vMSC often carry additional, tighter requirements.
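The reason synchronous replication tightens the latency requirement can be shown with a back-of-the-envelope model: a synchronously replicated write is acknowledged only after the remote site commits it, so the inter-site RTT is added to every write. The function and figures below are illustrative assumptions, not vendor specifications.

```python
def sync_write_latency_ms(local_write_ms: float, inter_site_rtt_ms: float) -> float:
    """Approximate latency of a synchronously replicated write: the host
    waits for the local commit plus one round trip to the remote array."""
    return local_write_ms + inter_site_rtt_ms

# A 0.5 ms local write becomes a 10.5 ms write at the 10 ms RTT ceiling.
print(sync_write_latency_ms(0.5, 10.0))  # 10.5
```

Even a fast local array is dominated by the inter-site round trip, which is why storage vendors impose RTT ceilings well below the general 10 ms vMSC limit.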

For example, HP's Peer Persistence allows two separate HP 3PAR StoreServ systems to replicate and peer with each other across metropolitan distances. The HP 3PAR Quorum Witness is deployed as a VM at a third site to watch the HP storage systems and inter-site links and help guide failover behavior. Such third-party services tighten the latency requirement: the maximum network RTT between the storage systems should not exceed 2.6 ms.

Similar concerns apply to EMC VPLEX storage systems and the VPLEX Witness tool. VPLEX requires an RTT between two VPLEX clusters under 5 ms for a nonuniform host access configuration -- and under 1 ms for uniform host access. Network RTT for VPLEX management is more forgiving at 10 ms. It's the same story for Hitachi Unified Storage systems, which require a network RTT between sites under 10 ms for uniform host access and an RTT for synchronous storage replication under 5 ms. Hitachi also recommends a minimum network bandwidth of 622 Mbps between sites for vMotion activities.
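The vendor-specific ceilings cited above can be collected into a simple lookup for planning purposes. The RTT values (in ms) are as stated in the text; the dictionary keys and function name are hypothetical labels chosen for this sketch.

```python
# Vendor-specific RTT ceilings (ms) cited in the text; keys are illustrative.
VENDOR_RTT_LIMITS_MS = {
    "hp_3par_peer_persistence": 2.6,
    "emc_vplex_nonuniform": 5.0,
    "emc_vplex_uniform": 1.0,
    "emc_vplex_management": 10.0,
    "hitachi_uniform_host_access": 10.0,
    "hitachi_sync_replication": 5.0,
}

def link_satisfies(config: str, measured_rtt_ms: float) -> bool:
    """Check a measured inter-site RTT against a vendor-specific ceiling."""
    return measured_rtt_ms <= VENDOR_RTT_LIMITS_MS[config]

print(link_satisfies("emc_vplex_nonuniform", 3.0))       # True
print(link_satisfies("hp_3par_peer_persistence", 3.0))   # False
```

Comparing the ceilings side by side makes clear that the storage layer, not the general vMSC limit, usually sets the binding constraint on site distance.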

It's important to remember that latency and bandwidth are just two issues affecting storage subsystem and quorum tool usage with VMware vMSC. Each storage vendor may impose additional requirements, such as minimum vSphere versions, firmware versions, management software versions, storage protocol preferences and so on. IT architects will need to examine and accommodate the vendor-specific storage system requirements for a vMSC deployment.

The idea of stretched clusters gives businesses a practical means to balance workloads and improve availability beyond a single data center. When implemented properly, VMware's vMSC can guard workloads against a wide range of common data center faults and extend cluster resilience across buildings and metropolitan areas. vMSC uses the familiar vSphere High Availability and Distributed Resource Scheduler features to make critical workload migration decisions, but it's important to meet the network requirements and other prerequisites to ensure proper operation.

Next Steps

vSphere Metro Storage Cluster: Cross-site HA, DRS from one cluster

What is VMware vMSC and how does it fit in the data center?
