Rack Resiliency


VAST Clusters can now be configured for rack resiliency, which keeps the cluster available if a rack of DBoxes fails.

Resiliency is configured by assigning racks as distinct failure domains, with DBox RAID stripes distributed across all the domains. If one domain fails, the remaining domains continue to provide full data availability and, provided they have sufficient free storage, simultaneously rebuild the stripes from the failed domain.

Resiliency is configured and enabled during cluster deployment (in VAST Cluster Install or by using the cluster create command of VAST CLI), and cannot be disabled after deployment.

Prerequisites

  • A minimum of seven racks. A minimum of seven failure domains must be configured, with each domain containing one DBox rack.

Limitations

  • This feature is not supported with NDU (non-disruptive upgrade) from previous cluster versions. That is, it can only be configured for a new installation of VAST Cluster.

  • If rack resiliency is enabled on the cluster, all racks must satisfy these conditions:

    • The number of DBoxes in the rack cannot exceed the number in any other rack by more than one.

      For example, if a cluster has racks with 10 DBoxes in each, it is possible to add a new rack with 9, 10, or 11 DBoxes, but not with 12 DBoxes.

    • The total available DBox capacity in a rack cannot be more than twice the available capacity in any other rack.

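      For example, if a cluster has racks with a capacity of 10 TB each, a new rack with 15 TB can be added, but not a rack with 4 TB (the existing racks would have more than twice its capacity) or 30 TB (it would have more than twice theirs).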

    If resiliency is enabled, additional racks can only be added to an existing cluster if they satisfy these requirements.
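
The two balance conditions above can be checked before a rack is ordered or cabled. The following is a minimal sketch, not part of VAST Cluster: the `Rack` class and the `balance_violations` and `can_add_rack` functions are hypothetical names introduced for illustration, and capacities are assumed to be expressed in TB.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    """Hypothetical description of a rack, used only for this sketch."""
    name: str
    dbox_count: int
    capacity_tb: float


def balance_violations(racks):
    """Return a list of violated conditions; an empty list means balanced."""
    if not racks:
        return []
    violations = []
    counts = [r.dbox_count for r in racks]
    capacities = [r.capacity_tb for r in racks]
    # Condition 1: no rack may have more than one DBox more than any other.
    if max(counts) - min(counts) > 1:
        violations.append(
            f"DBox counts range from {min(counts)} to {max(counts)}; "
            "they may differ by at most one"
        )
    # Condition 2: no rack may have more than twice the capacity of any other.
    if max(capacities) > 2 * min(capacities):
        violations.append(
            f"capacities range from {min(capacities)} TB to {max(capacities)} TB; "
            "the largest may be at most twice the smallest"
        )
    return violations


def can_add_rack(existing, candidate):
    """Check whether the cluster would still be balanced with the new rack."""
    return not balance_violations(list(existing) + [candidate])
```

With seven existing racks of 10 DBoxes and 10 TB each, `can_add_rack` accepts a candidate with 11 DBoxes and 15 TB, and rejects candidates with 12 DBoxes, 4 TB, or 30 TB, matching the examples above.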

Configuring Rack Resiliency During VAST Cluster Install

Follow the steps for VAST Cluster Install in the VAST Cluster Software Install Guide to define racks (domains) and DBoxes. In the General section, turn on the Enable Rack Level Resiliency option before starting the install process.

Configuring Rack Resiliency During Cluster Deployment using the VAST CLI

You can configure rack resiliency during cluster creation and deployment using the VAST CLI.

  1. Prepare a resiliency configuration file, in JSON format, that lists the racks and failure domains in your cluster. Follow the format of the example provided (see Sample Rack Resiliency Failure Domain Configuration File).

  2. Add this option when you run the cluster create VAST CLI command:

    --rack-config CONFIG_FILE

    where CONFIG_FILE is the full path to the resiliency configuration file.

When the cluster is created with this option, the rack resiliency feature is enabled. It cannot later be disabled.

Sample Rack Resiliency Failure Domain Configuration File

This configuration file snippet defines two racks, each configured as its own failure domain. The first rack, Rack_1, has two DBox units, U1 and U2, which together form one failure domain. The second rack, Rack_2, has a single DBox, U3, which forms a second domain.

The snippet is abbreviated. A complete file must define at least seven racks.

{
  "racks": [
    {
      "name": "Rack_1",
      "units": [
        {
          "rack_unit": "U1",
          "index_in_rack": 1,
          "ips": [
            "192.168.1.1",
            "192.168.1.2",
            "192.168.1.3",
            "192.168.1.4"
          ]
        },
        {
          "rack_unit": "U2",
          "index_in_rack": 2,
          "ips": [
            "192.168.2.1",
            "192.168.2.2",
            "192.168.2.3",
            "192.168.2.4"
          ]
        }
      ]
    },
    {
      "name": "Rack_2",
      "units": [
        {
          "rack_unit": "U3",
          "index_in_rack": 1,
          "ips": [
            "192.168.3.1",
            "192.168.3.2",
            "192.168.3.3",
            "192.168.3.4"
          ]
        }
      ]
    }
  ]
}
...
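
Before passing a configuration file to the cluster create command, it can be useful to sanity-check it. The sketch below is an illustration only and is not a VAST tool: `check_rack_config` and `load_and_check` are hypothetical helpers, and they rely solely on the keys visible in the sample above (`racks`, `name`, `units`, `rack_unit`, `ips`). The seven-rack minimum comes from the prerequisites. The checks for valid and non-duplicated IP addresses are assumptions about what a sensible file looks like, not documented requirements.

```python
import ipaddress
import json

MIN_RACKS = 7  # documented minimum number of racks (failure domains)


def check_rack_config(config):
    """Return a list of problems found in a parsed configuration dict."""
    problems = []
    racks = config.get("racks", [])
    if len(racks) < MIN_RACKS:
        problems.append(
            f"only {len(racks)} racks defined; at least {MIN_RACKS} required"
        )
    seen_ips = set()
    for rack in racks:
        rack_name = rack.get("name", "<unnamed>")
        for unit in rack.get("units", []):
            where = f"{rack_name}/{unit.get('rack_unit', '<no rack_unit>')}"
            for ip in unit.get("ips", []):
                # Assumption: every entry in "ips" is a literal IP address.
                try:
                    ipaddress.ip_address(ip)
                except ValueError:
                    problems.append(f"{where}: invalid IP address {ip!r}")
                    continue
                # Assumption: an address should appear only once in the file.
                if ip in seen_ips:
                    problems.append(f"{where}: duplicate IP address {ip}")
                seen_ips.add(ip)
    return problems


def load_and_check(path):
    """Parse the JSON file at `path` and return the list of problems."""
    with open(path) as f:
        return check_rack_config(json.load(f))
```

Run against the two-rack snippet above, `check_rack_config` reports a single problem, the rack count, because the snippet is shorter than a deployable file.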

Updating Rack Resiliency on an Existing Cluster During Expansion

When you add racks and DBoxes to a cluster that has rack resiliency enabled, they are automatically included in the rack resiliency configuration as new domains.