A properly architected vSphere Metro Storage Cluster (vMSC) environment should be able to accommodate a wide range of failures -- everything from a fault in a single network cable up to and including the loss of an entire data center. Such a broad scope makes it impossible to detail every possible permutation, but the failures fall into three general categories.
First, VMware vMSC is effective at isolating and correcting single-host (single-server) failures within a data center. For example, vMSC can detect when a host loses network connectivity, and its VMs can continue to run while awaiting network restoration. Similarly, when a single host fails completely, vSphere HA can use affinity rules to determine where to restart the VMs that were running on the troubled host.
Second, VMware vMSC can address almost any practical issue that arises in storage resources duplicated between multiple remote data centers. Although individual disk or disk group faults are typically rectified within the affected storage array, vMSC can identify the loss of an entire disk shelf within an array and recognize the loss of connectivity between storage subsystems across data centers, where storage switches lose connectivity but server connectivity remains. It can also address the total loss of connectivity between data centers (storage and host servers) and deal with a complete storage failure at one data center. In all of these scenarios, it's possible for the organization's VM workloads to remain running without any disruption.
Third, vMSC can address even more substantial or disruptive events at one data center and restart affected VMs at another. For example, vMSC can detect and respond to a permanent device loss (PDL) in a data center, a full compute failure at one data center, or the complete loss of an entire data center. With the correct implementation and configuration, all of the affected VMs in one data center can be successfully restarted in another without disruption.
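As a rough illustration, the scenarios described above can be collected into a simple lookup that maps each failure to the response the article expects. This is only a sketch for planning or documentation purposes: the scenario keys, the `Outcome` enum and the `expected_outcome` function are hypothetical names, not part of any VMware API, and the mapping merely restates the text.

```python
from enum import Enum


class Outcome(Enum):
    """Expected result for VMs, as summarized in the article (hypothetical names)."""
    KEEP_RUNNING = "VMs keep running"
    RESTART_ON_OTHER_HOST = "vSphere HA restarts VMs on another host"
    RESTART_AT_OTHER_SITE = "VMs are restarted at the other data center"


# Scenario keys are illustrative labels chosen for this sketch.
EXPECTED = {
    # Category 1: single-host failures within a data center
    "host_network_loss": Outcome.KEEP_RUNNING,
    "single_host_failure": Outcome.RESTART_ON_OTHER_HOST,
    # Category 2: storage and inter-site connectivity failures
    "disk_shelf_loss": Outcome.KEEP_RUNNING,
    "inter_site_storage_link_loss": Outcome.KEEP_RUNNING,
    "inter_site_total_link_loss": Outcome.KEEP_RUNNING,
    "site_storage_failure": Outcome.KEEP_RUNNING,
    # Category 3: larger events that require a restart at the other site
    "pdl": Outcome.RESTART_AT_OTHER_SITE,
    "site_compute_failure": Outcome.RESTART_AT_OTHER_SITE,
    "site_loss": Outcome.RESTART_AT_OTHER_SITE,
}


def expected_outcome(scenario: str) -> Outcome:
    """Return the outcome the article describes for a given scenario label."""
    return EXPECTED[scenario]


if __name__ == "__main__":
    for name, outcome in EXPECTED.items():
        print(f"{name}: {outcome.value}")
```

A table like this can serve as the starting point for a failure test plan, with each row checked against what the cluster actually does during a drill.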
Proper recognition and response by VMware vMSC depend on correct settings, which influence the availability and recoverability of VMs in the aftermath of a failure. This means IT professionals need to pay particularly close attention to VM-to-host affinity rules, the selected responses to PDL, proper configuration of isolation addresses and heartbeat data stores, and the avoidance of split-brain scenarios, which can complicate recovery or introduce errors when failures occur.
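The settings listed above lend themselves to a pre-deployment checklist. The following is a minimal sketch, assuming a hand-maintained summary of the planned configuration; `ClusterPlan` and `review` are hypothetical names, nothing here queries vCenter, and the "at least one per site" thresholds are an assumption made for illustration rather than a rule stated in this article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterPlan:
    """Hypothetical summary of the settings the paragraph above lists."""
    vm_host_affinity_rules: List[str] = field(default_factory=list)
    pdl_response: Optional[str] = None  # e.g. a note on the chosen PDL behavior
    isolation_addresses: List[str] = field(default_factory=list)
    heartbeat_datastores: List[str] = field(default_factory=list)


def review(plan: ClusterPlan, sites: int = 2) -> List[str]:
    """Return findings for settings that look missing or incomplete.

    Assumption: at least one isolation address and one heartbeat
    data store per site; adjust to your vendor's guidance.
    """
    findings = []
    if not plan.vm_host_affinity_rules:
        findings.append("no VM-to-host affinity rules defined")
    if plan.pdl_response is None:
        findings.append("no PDL response selected")
    if len(plan.isolation_addresses) < sites:
        findings.append("fewer isolation addresses than sites")
    if len(plan.heartbeat_datastores) < sites:
        findings.append("fewer heartbeat datastores than sites")
    return findings


if __name__ == "__main__":
    plan = ClusterPlan(
        vm_host_affinity_rules=["site-a-vms-to-site-a-hosts"],
        pdl_response="restart",
        isolation_addresses=["10.0.0.1"],
        heartbeat_datastores=["ds-a", "ds-b"],
    )
    for finding in review(plan):
        print("CHECK:", finding)
```

Running the example flags only the isolation addresses, because the plan lists one address for two sites. The check does not replace failure testing; it simply makes gaps in the plan visible before a real outage does.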