
What kind of failures should VMware vMSC address?

VMware vMSC can detect a wide variety of failures, ranging from minor faults to major issues that affect an entire environment.

A properly architected vSphere Metro Storage Cluster (vMSC) environment should be able to accommodate a wide range of failures -- everything from a fault in a single network cable up to and including the loss of an entire data center. Such a broad scope makes it impossible to detail every possible permutation, but the failures fall into several general categories.

First, VMware vMSC is effective at isolating and correcting single-host (single-server) failures within a data center. For example, vMSC can detect when a host loses network connectivity, and the VMs on that host can continue to run while awaiting network restoration. Similarly, when a single host fails completely, vSphere HA can use affinity rules to determine where to restart the VMs that were running on the failed host.

Second, VMware vMSC can address almost any practical issue that arises in storage resources duplicated between remote data centers. Although individual disk or disk group faults are typically rectified within the affected storage array, vMSC can identify the loss of an entire disk shelf within an array. It can also recognize the loss of connectivity between storage subsystems across data centers -- a case where storage switches lose connectivity, but server connectivity remains. In addition, it can address the total loss of connectivity between data centers (storage and host servers) and deal with a complete storage failure at one data center. In all of these scenarios, the organization's VM workloads can remain running without disruption.

Third, vMSC can address even more substantial or disruptive events at one data center and restart affected VMs at another. For example, vMSC can detect and respond to a permanent device loss (PDL) condition in a data center, a full compute failure at one data center, or the complete loss of an entire data center. With the correct implementation and configuration, all of the affected VMs in one data center can be successfully restarted in another with minimal disruption.
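The three categories above can be summarized as a simple lookup from failure scenario to expected outcome. The following is only an illustrative sketch: the scenario labels, the Outcome enum and the requires_restart helper are names invented here for the example, not VMware identifiers or APIs.

```python
from enum import Enum


class Outcome(Enum):
    KEEPS_RUNNING = "VMs keep running"
    RESTARTED_PER_AFFINITY_RULES = "VMs restarted on a host chosen using affinity rules"
    RESTARTED_AT_OTHER_SITE = "VMs restarted at the other data center"


# Scenario labels are hypothetical shorthand for the failures described
# in the text; they do not correspond to any vSphere event names.
EXPECTED_OUTCOME = {
    # Category 1: single-host failures
    "host_network_loss": Outcome.KEEPS_RUNNING,
    "single_host_failure": Outcome.RESTARTED_PER_AFFINITY_RULES,
    # Category 2: storage and connectivity failures
    "disk_shelf_loss": Outcome.KEEPS_RUNNING,
    "inter_site_storage_link_loss": Outcome.KEEPS_RUNNING,
    "inter_site_total_link_loss": Outcome.KEEPS_RUNNING,
    "site_storage_failure": Outcome.KEEPS_RUNNING,
    # Category 3: disruptive events at one data center
    "permanent_device_loss": Outcome.RESTARTED_AT_OTHER_SITE,
    "site_compute_failure": Outcome.RESTARTED_AT_OTHER_SITE,
    "full_site_loss": Outcome.RESTARTED_AT_OTHER_SITE,
}


def requires_restart(scenario: str) -> bool:
    """Return True when the expected outcome involves restarting VMs."""
    return EXPECTED_OUTCOME[scenario] is not Outcome.KEEPS_RUNNING
```

A lookup like this can serve as a starting point for a failure-testing plan: each entry is a scenario to simulate, and the outcome is what a correctly configured environment would be expected to show.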

Proper recognition and response by VMware vMSC depend on correct settings, which influence the availability and recoverability of VMs in the aftermath of a failure. This means IT professionals will need to pay particularly close attention to VM-to-host affinity rules, the selected responses to PDL, proper isolation address configuration and heartbeat data stores. They must also take pains to avoid split-brain scenarios when failures occur, because these can complicate recovery or introduce errors.
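One way to keep track of these settings is a small review checklist. The sketch below is a hypothetical model, not a vSphere API: the VmscSettings class, its field names and the review function are all invented for illustration. It also assumes that each site should have its own isolation address and heartbeat data stores, which is a design assumption rather than something stated above.

```python
from dataclasses import dataclass, field


@dataclass
class VmscSettings:
    # All field names are hypothetical; they are not vSphere properties.
    vm_host_affinity_rules: list = field(default_factory=list)
    pdl_response: str = ""  # illustrative values: "restart", "none"
    isolation_addresses: dict = field(default_factory=dict)   # site -> IP address
    heartbeat_datastores: dict = field(default_factory=dict)  # site -> list of names


def review(settings: VmscSettings, sites: list) -> list:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    if not settings.vm_host_affinity_rules:
        findings.append("no VM-to-host affinity rules defined")
    if not settings.pdl_response:
        findings.append("no PDL response selected")
    for site in sites:
        # Assumption: every site needs its own isolation address
        # and at least one heartbeat data store.
        if site not in settings.isolation_addresses:
            findings.append(f"no isolation address for {site}")
        if not settings.heartbeat_datastores.get(site):
            findings.append(f"no heartbeat data store for {site}")
    return findings
```

For example, calling review(VmscSettings(), ["site_a", "site_b"]) on an empty configuration flags every item, while a fully populated configuration returns an empty list. A check like this does not detect split-brain risk by itself; it only confirms that the settings named above have been filled in.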
