Data Protection Overview

VAST Cluster enables you to protect data on your cluster with the following backup and recovery features:

Backups and snapshots of a path at a specific moment in time, to a local or remote cluster folder
Replication of data from one cluster to another, either synchronously (continuously) or asynchronously (at regular intervals)

Snapshots

In local backup, snapshots are taken periodically, or manually, and retained on the cluster for a configurable period. Data that was backed up with snapshots can be accessed in a read-only virtual directory using a client connection to the path protected by the snapshot. Data can be restored by copying from the virtual directory to another directory. You can also restore data from any snapshot of the path, on the same cluster or on a remote cluster that is a replication peer of the cluster.

This feature enables you to create and maintain immutable backups. Snapshots and protection policies can be marked indestructible, in which case they are protected from accidental or malicious deletion and modification. Modifying and deleting indestructible policies and snapshots requires a token-based authorized unlocking of the indestructibility mechanism.
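Because a snapshot is exposed as a read-only directory, restoring from it is an ordinary copy on the client. The following is a minimal sketch in Python, assuming the protected path is already mounted on the client. The function name and all paths are hypothetical; in particular, the location and name of the snapshot's virtual directory are passed in as a parameter rather than assumed.

```python
import shutil
from pathlib import Path


def restore_from_snapshot(snapshot_dir: Path, relative_path: str, target_dir: Path) -> Path:
    """Copy one file or directory tree out of a read-only snapshot directory.

    snapshot_dir  -- the snapshot's virtual directory as seen by the client (assumed path)
    relative_path -- item to restore, relative to snapshot_dir
    target_dir    -- writable directory that receives the restored copy
    """
    source = snapshot_dir / relative_path
    if not source.exists():
        raise FileNotFoundError(f"{source} is not present in the snapshot")

    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / source.name

    if source.is_dir():
        # Restore a whole directory tree; fails if the destination already exists.
        shutil.copytree(source, destination)
    else:
        # Restore a single file, preserving timestamps where possible.
        shutil.copy2(source, destination)
    return destination
```

The copy goes to a separate directory, so the live data is left untouched until you decide what to do with the restored copy.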

For more information about snapshots, see Snapshots and Local Backup.

Supported Protocols

Snapshots are supported for S3, NFS, and SMB protocols.

Replication

Replication allows you to maintain copies of protected paths on remote clusters. This can be done either continuously, with synchronous replication, or periodically, with asynchronous replication. In the event of a failure on the protected path, the copy on the remote cluster can be turned into the source for the data.

Asynchronous Replication

Asynchronous replication allows you to replicate data from a path in a primary cluster to another cluster or to multiple destination clusters. Destination clusters can reside at remote locations, anywhere in the world. Replicated data can include file systems, buckets, and databases. Data is replicated as read-only snapshots of the path, which are copied to the destination clusters. Replications are repeated periodically, according to a schedule you set up.

Asynchronous replication is configured in the following way:

You define replication peers on the source cluster. These are the remote destination clusters that will receive the snapshots from the source cluster.
You designate the paths to be replicated as protected paths on the source cluster. Paths can be folders (NFS, SMB) or buckets (S3).
You create a protection policy, indicating how snapshots of the protected path are taken, the frequency, and the retention time.
Replication starts once the policy is associated with the protected path, and continues according to the settings in the protection policy.
Clients can access data in the source path, or read-only snapshots of it on remote destination clusters.
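The relationship between peers, protected paths, and protection policies in the steps above can be sketched as a small data model. This is an illustration only: the class names, field names, and the peer name are hypothetical and do not correspond to VAST Cluster objects or API calls. The sketch shows how a policy's frequency and retention time together determine when each snapshot is taken and when it expires.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ProtectionPolicy:
    """Hypothetical stand-in for a protection policy."""
    interval: timedelta   # time between snapshots (the frequency)
    retention: timedelta  # how long each snapshot is kept


@dataclass
class ProtectedPath:
    """Hypothetical stand-in for a protected path and its replication peers."""
    path: str
    peers: List[str]
    policy: Optional[ProtectionPolicy] = None

    def is_replicating(self) -> bool:
        # Replication starts only once a policy is associated with the path.
        return self.policy is not None and bool(self.peers)


def snapshot_schedule(
    policy: ProtectionPolicy, start: datetime, count: int
) -> List[Tuple[datetime, datetime]]:
    """Return (taken_at, expires_at) pairs for the first `count` snapshots."""
    schedule = []
    for i in range(count):
        taken_at = start + i * policy.interval
        schedule.append((taken_at, taken_at + policy.retention))
    return schedule
```

For example, a policy with a six-hour interval and a two-day retention, starting at midnight on 1 January, yields snapshots at 00:00, 06:00, and 12:00 that expire at the same times two days later.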

Response and recovery from a failure on a protected path on the source cluster follows these steps:

When a failure or other fault is detected in the source path, rendering it unusable, a cluster admin user manually initiates a failover to the latest snapshot on a destination cluster. This snapshot on the destination cluster then becomes the read-write source for the path, in place of the original path.
Clients must now access the destination cluster (now the new source cluster) instead of the (original) source cluster.
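Because asynchronous replication copies snapshots at intervals, a failover resumes from the latest snapshot that reached the destination, and writes made after that snapshot are not part of it. The following sketch, with a hypothetical function name and made-up timestamps, selects that snapshot and reports the gap between it and the failure. It illustrates the reasoning only and is not how the cluster performs a failover.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def pick_failover_snapshot(
    replicated: List[datetime], failure_time: datetime
) -> Tuple[datetime, timedelta]:
    """Return the latest replicated snapshot taken at or before the failure,
    together with the time elapsed between that snapshot and the failure."""
    candidates = [taken_at for taken_at in replicated if taken_at <= failure_time]
    if not candidates:
        raise ValueError("no replicated snapshot exists before the failure")
    latest = max(candidates)
    return latest, failure_time - latest
```

The returned gap is an upper bound on how much recent work may be missing from the new source, which is one consideration when choosing the replication interval in the protection policy.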

For more information about asynchronous replication, see Overview of VAST Replication and Disaster Recovery.

Supported Protocols

Asynchronous replication supports the S3, NFS, and SMB protocols, as well as VAST Database.

Synchronous Replication

Synchronous replication provides resiliency in the case of a disaster (full cluster failure) with no data loss (RPO=0). Like asynchronous replication, synchronous replication uses replication peers and protected paths. With synchronous replication, however, data is replicated immediately between the primary and secondary replication peers so that, in the event of a failover, the secondary can take over immediately, fully synchronized with the primary data (no loss). In practice, this means that every write operation to the primary path is replicated immediately (synchronously) to the secondary peer.

In addition, the replicated data on the destination cluster is available for continuous read-write access as long as the replication connection with the source cluster is active.

You configure synchronous replication in the following way:

You define a single replication peer, to which the replicated data will be copied.
You designate S3 buckets to be replicated as protected paths.
Replication starts once the protected path is configured. Initially, the data from the source cluster is copied to the destination cluster. Thereafter, any write operation on either source or destination is replicated to the other.
Clients can access either the source or destination clusters, both of which are read-write, and write-synchronized to each other.

Response and recovery from a failure on a protected path on the source cluster follows these steps:

When the source path becomes unavailable, an admin disconnects the destination from the source, and then designates it (the destination cluster) as the new source. The failed source cluster is blocked until the connection with the replication cluster is restored and data is synchronized with it.
Clients must access the destination cluster (now designated the source).

For more information about synchronous replication, see S3 Synchronous Replication.

Supported Protocols

With VAST Cluster 5.2, synchronous replication is supported for S3 buckets.

Replication with Multiple Tenants

You can replicate protected paths between clusters and between tenants on these clusters. For example, a protected path, /path, belonging to Tenant A can be replicated to path /path, belonging to Tenant B, on a remote cluster. This applies to synchronous and asynchronous replication.

These restrictions apply when replicating between tenants:

If Tenant A replicates a protected path to Tenant B on a remote cluster, it cannot then replicate another path from Tenant A to Tenant C on the same remote cluster (that is, Tenant A cannot have replicated protected paths to more than one tenant on the same remote cluster). It can, however, replicate protected paths to Tenant C (or any other tenant) on a different remote cluster. Similarly, Tenant A can replicate additional protected paths to Tenant B on the same remote cluster.
This applies to all protocols.
If you are upgrading your cluster to this version from an earlier version which permitted a tenant to replicate protected paths to more than one tenant on the same remote cluster, existing replication streams to multiple tenants are maintained after the upgrade (an alarm message may appear, however). You cannot replicate additional protected paths to these tenants on this remote cluster, but can replicate protected paths to other tenants on the cluster.
If Tenant A replicates a path from Cluster A to Cluster B, and from Cluster B to Cluster C, the S3 access keys and identity policies that are replicated to support S3 access to replicated data are copied from A to B, but are not then copied on to C. Access keys and identity policies that are local to B, however, are copied to Cluster C.
Note
For more information about replication of S3 access permissions, see S3 Access to Replicated Data.
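The first restriction above (a source tenant may replicate to only one tenant on any given remote cluster) can be expressed as a simple check. This is a sketch with hypothetical names, not a VAST Cluster API, and it does not model the upgrade exception for streams that existed before the restriction was introduced.

```python
from typing import Iterable, Tuple

# A replication stream, reduced to (source_tenant, remote_cluster, remote_tenant).
Stream = Tuple[str, str, str]


def is_stream_allowed(existing: Iterable[Stream], new: Stream) -> bool:
    """Return True if adding `new` keeps every source tenant mapped to
    at most one destination tenant per remote cluster."""
    src_tenant, remote_cluster, dst_tenant = new
    for e_src, e_cluster, e_dst in existing:
        same_source_and_cluster = e_src == src_tenant and e_cluster == remote_cluster
        if same_source_and_cluster and e_dst != dst_tenant:
            return False
    return True


existing_streams = [("Tenant A", "remote-1", "Tenant B")]

# Another path from Tenant A to Tenant B on the same remote cluster: allowed.
print(is_stream_allowed(existing_streams, ("Tenant A", "remote-1", "Tenant B")))
# Tenant A to Tenant C on the same remote cluster: not allowed.
print(is_stream_allowed(existing_streams, ("Tenant A", "remote-1", "Tenant C")))
# Tenant A to Tenant C on a different remote cluster: allowed.
print(is_stream_allowed(existing_streams, ("Tenant A", "remote-2", "Tenant C")))
```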

Switching between Synchronous and Asynchronous Replication

You can convert the replication scheme for a protected path from asynchronous to synchronous, or from synchronous to asynchronous.

You can only convert an asynchronous replication for a path to synchronous if there is at most a single replication peer (synchronous replication permits only a single replication peer, whereas asynchronous replication permits more).

Synchronous to Asynchronous Replication

When switching from synchronous to asynchronous replication, the destination path becomes read-only. A default protection policy is associated with the source protected path, indicating the replication interval. You can modify the policy to change the replication interval.

Asynchronous to Synchronous Replication

When switching from asynchronous to synchronous replication, the destination path is first synchronized with the source, during which time it is in the state 'Becoming Synchronous Replication'. Once synchronized, the destination path becomes read-write.

You can only switch a protected path for S3 buckets to synchronous replication. If there are other protocols for the path, it is not possible to switch.

Data Protection Limitations and Exclusions

A protected path cannot simultaneously be replicated synchronously to one replication peer and asynchronously to another.
A protected path cannot simultaneously be replicated asynchronously to one replication peer and shared for global access to another.
A protected path using synchronous replication can only contain S3 buckets. You cannot use synchronous replication for paths that are exposed to any other protocols.
A protected path using synchronous replication can be replicated to only one peer.
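The four limitations above can be summarized as a validation routine over a planned configuration. The sketch below is an assumption-laden illustration: the `PathConfig` class, its fields, and the message strings are hypothetical and are not part of any VAST Cluster interface.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PathConfig:
    """Hypothetical description of one protected path's planned setup."""
    protocols: Set[str]
    sync_peers: List[str] = field(default_factory=list)
    async_peers: List[str] = field(default_factory=list)
    global_access_peers: List[str] = field(default_factory=list)


def find_violations(cfg: PathConfig) -> List[str]:
    """Return the limitations that the configuration breaks (empty if none)."""
    problems = []
    if cfg.sync_peers and cfg.async_peers:
        problems.append("synchronous and asynchronous replication cannot be combined")
    if cfg.async_peers and cfg.global_access_peers:
        problems.append("asynchronous replication and global access cannot be combined")
    if cfg.sync_peers and cfg.protocols - {"S3"}:
        problems.append("synchronous replication supports S3 buckets only")
    if len(cfg.sync_peers) > 1:
        problems.append("synchronous replication allows only one peer")
    return problems
```

Running a check of this kind before converting a path from asynchronous to synchronous replication also covers the conversion conditions described earlier: a single replication peer and S3 as the only protocol.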

S3 Backups

In backup to S3, snapshots (which can also be retained for local backup) are copied to an AWS S3 bucket or a custom target location that is accessible using S3 operations. Backed-up data can be accessed in a read-only virtual directory via a client connection to the root of the cluster's element store. Data can be restored by simply copying from the virtual directory to another directory.

For more information about S3 Backups, see Backup to S3.