Overview of VAST Replication

Replication allows you to maintain copies of protected paths on remote clusters. This can be done either continuously, with synchronous replication, or periodically, with asynchronous replication. In the event of a failure on the protected path, the copy on the remote cluster can become the source for the data.

With VAST replication, you can:

Maintain a backup copy of your data for regulatory purposes.
Recover data in the event of disaster.
Fail over to a remote cluster in a disaster event in order to continue operations.
Fail back to the source cluster following failover, if the source cluster is healthy.
Access copies of data on remote clusters with read-write access (for synchronous replication).

Asynchronous Replication

VAST asynchronous replication enables you to set up a recurring data replication schedule for disaster recovery or long-term data retention, in which you replicate data from a primary cluster to another cluster or to multiple destination clusters. Destination clusters can reside at remote locations, anywhere in the world. Replicated data can include file systems and buckets (including databases).

VAST asynchronous replication provides the capability to fail over, fail back, suspend and resume, and break replication relationships between clusters, as needed to recover data after a failure or corruption, or for any other reason.

Asynchronous Replication Concepts and Terminology

In asynchronous replication, the state of data on a path is captured on the source peer by snapshots at points in time. At each point in time, the data on the path that has changed since the last point in time is captured in a snapshot and transferred to destination peers. This applies to files in a file system, objects in an S3 bucket, and VAST Databases that are in the protected path.

Clusters in an asynchronous replication configuration where snapshots are copied from cluster to cluster are called replication peers. The configuration of a replication peer is done on one cluster and then mirrored to the other cluster.

During replication, each peer has a role which reflects its part in replication. The role of the peer on which the snapshots are created is called source. The role of a peer to which snapshots are transferred by the source peer is called destination.

The roles of a peer can change as follows:

A cluster acting as a destination peer can become the replication source in a failover. Failover can be graceful, where the original source peer is reachable throughout the process, or ungraceful, where the source peer has become unreachable and a decision is made to force a failover from a destination peer, which becomes the source. Any other destination peers (see group replication below) continue to be destination peers. In graceful failover, the former source peer also becomes a destination peer. During graceful failover, both the former source peer and the new source peer are read-only while data is transferred from the old source to the new source. When failover is complete, the new source peer becomes writable.
A destination peer can break replication. This means that replication to that destination peer is ceased, although the source peer may continue to replicate to other destination peers. The role of that destination peer becomes standalone and the data that was replicated to that peer becomes writable.
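
The role changes described above can be sketched as a small state model. This is an illustrative sketch only, not VAST code: the `Role` enum, the `fail_over` and `break_replication` functions, and the cluster names are hypothetical. The text does not state the role of an unreachable former source after an ungraceful failover, so the sketch records it as `None`.

```python
from enum import Enum


class Role(Enum):
    SOURCE = "source"
    DESTINATION = "destination"
    STANDALONE = "standalone"


def fail_over(roles, new_source, graceful):
    """Return a new role map after failing over to `new_source`.

    `roles` maps a (hypothetical) peer name to its current Role.
    """
    if roles.get(new_source) is not Role.DESTINATION:
        raise ValueError("failover target must currently be a destination peer")
    updated = dict(roles)
    for peer, role in roles.items():
        if role is Role.SOURCE:
            # Graceful: the former source becomes a destination peer.
            # Ungraceful: the former source is unreachable; its role is not
            # described in the text, so this sketch records it as None.
            updated[peer] = Role.DESTINATION if graceful else None
    # Other destination peers keep their role; only the target is promoted.
    updated[new_source] = Role.SOURCE
    return updated


def break_replication(roles, peer):
    """Return a new role map after `peer` breaks replication."""
    if roles.get(peer) is not Role.DESTINATION:
        raise ValueError("only a destination peer can break replication")
    updated = dict(roles)
    # The peer becomes standalone; the source and other destinations are unchanged.
    updated[peer] = Role.STANDALONE
    return updated


roles = {
    "cluster-a": Role.SOURCE,
    "cluster-b": Role.DESTINATION,
    "cluster-c": Role.DESTINATION,
}
after_failover = fail_over(roles, "cluster-b", graceful=True)
after_break = break_replication(roles, "cluster-c")
```

Both functions return a new mapping rather than mutating the input, which makes it easy to compare the roles before and after a transition.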

On a destination peer, a read-only replica of the data is stored at a chosen path. This replica is based on the most recent snapshot that was transferred from the source peer. Snapshots that were transferred from the source peer and have not yet expired are stored in the .snapshots directory under the same path.

A protection policy specifies the replication peer and defines the chosen schedule for replication from the source peer to the destination peer. When you create a protection policy for async replication, the policy itself is mirrored to the other peer.

A protected path specifies a local path that it protects through snapshots and/or replication. It can specify one or more replication streams, each of which specifies a protection policy and a remote path on the remote peer specified by the protection policy. The local path is the data path on the source peer that is captured in the snapshots. The remote path in each replication stream is a path on a destination peer to which the data is replicated. The remote path is kept updated to the most recently transferred snapshot and the data is stored as read-only.
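
The relationship between protected paths, replication streams, and protection policies can be pictured with a few data classes. This is a minimal sketch for illustration: the class names, field names, peer name, and paths are assumptions and do not reflect the actual VMS object schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ProtectionPolicy:
    # Hypothetical fields: a policy names the remote peer and the schedule.
    name: str
    remote_peer: str
    schedule: str


@dataclass(frozen=True)
class ReplicationStream:
    # Each stream pairs a protection policy with a remote path on the
    # remote peer that the policy specifies.
    policy: ProtectionPolicy
    remote_path: str


@dataclass
class ProtectedPath:
    # The local path is the data path on the source peer that is
    # captured in the snapshots.
    local_path: str
    streams: List[ReplicationStream] = field(default_factory=list)

    def destinations(self) -> List[Tuple[str, str]]:
        """Return (remote peer, remote path) for every replication stream."""
        return [(s.policy.remote_peer, s.remote_path) for s in self.streams]


hourly = ProtectionPolicy(name="hourly-dr", remote_peer="cluster-b", schedule="every 1h")
projects = ProtectedPath(
    local_path="/projects",
    streams=[ReplicationStream(policy=hourly, remote_path="/dr/projects")],
)
```

A protected path with no streams models a path that is protected by local snapshots only.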

The transfer of a snapshot to the destination peer is called the creation of a restore point.

Many to One and One to Many Replication

Data can be replicated from a VAST Cluster to multiple other VAST Clusters and from multiple VAST Clusters to one VAST Cluster. See VAST Cluster Scale Guidelines for maximum limits of replication peers, protected paths and protection policies.

Note
A single path cannot be both the source and the destination of replication at the same time.

Group Replication

You can configure a group of replication peers in a relationship where one peer is the replication source and it replicates data on a given path to the other peers in the group. This configuration enables failover to one of the destination peers. Replication is automatically resumed, from the new source to the old source, as soon as the new source peer has all the data. It is also automatically resumed to the rest of the peers once they are synchronized with the new source.

Scheduled or On-Demand Replication

Asynchronous replication is scheduled through a protection policy that defines the timing and frequency of the replication. There is also an option to replicate any given protected path on demand at any time.

One restore point per destination can be in progress at a time. If a restore point is started while another is in progress, the new restore point waits to start after the earlier one is completed. If an on-demand replication is triggered while there is a queued pending restore point, the most recent on-demand restore point replaces the pending restore point in the queue (it is dropped, whether it was a scheduled restore point or an on-demand restore point). Similarly, if the time for the next scheduled restore point arrives while there is a pending on-demand restore point in the queue, the scheduled restore point is dropped in favor of the pending on-demand restore point.
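
The queueing rules above can be modelled with one in-progress slot and one pending slot per destination. The sketch below is a hypothetical model, not the cluster's implementation. The class and method names are invented for illustration. The text does not say what happens when a scheduled restore point arrives while another scheduled one is already pending, so the sketch assumes the newcomer is dropped.

```python
class RestorePointQueue:
    """Hypothetical model of restore point queueing for one destination."""

    def __init__(self):
        self.in_progress = None  # (name, on_demand) or None
        self.pending = None      # (name, on_demand) or None
        self.dropped = []        # restore points that never ran

    def request(self, name, on_demand):
        restore_point = (name, on_demand)
        if self.in_progress is None:
            # Nothing is running, so the restore point starts immediately.
            self.in_progress = restore_point
        elif self.pending is None:
            # One restore point is running; the new one waits for it.
            self.pending = restore_point
        elif on_demand:
            # An on-demand request replaces whatever is pending,
            # whether that was scheduled or on-demand.
            self.dropped.append(self.pending)
            self.pending = restore_point
        else:
            # A scheduled restore point is dropped in favor of a pending
            # on-demand one. Dropping it when the pending restore point is
            # itself scheduled is an assumption of this sketch.
            self.dropped.append(restore_point)

    def complete(self):
        """Finish the running restore point and start the pending one, if any."""
        finished = self.in_progress
        self.in_progress = self.pending
        self.pending = None
        return finished


queue = RestorePointQueue()
queue.request("scheduled-1", on_demand=False)  # starts immediately
queue.request("scheduled-2", on_demand=False)  # waits in the queue
queue.request("on-demand-1", on_demand=True)   # replaces scheduled-2
queue.request("scheduled-3", on_demand=False)  # dropped
```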

Asynchronous Replication of VAST Databases

VAST Databases can be asynchronously replicated, like files and S3 buckets, if they reside in paths protected by asynchronous replication. As with files, periodic snapshots of the database are created and replicated to replication peers as read-only images. Upon failover, they become read-write. The frequency of replication follows the protection policy of the protected path.

The following apply to replicated VAST Databases:

Only committed Database transactions are replicated.
Semi-sorted projections on the source Database are not replicated.
The most recent replicated snapshots of a Database on a replication peer can be queried.
The VAST Catalog and audit log are not replicated.

Asynchronous Replication of a Global Folder

A global folder is a folder on one cluster that is made accessible to clients of connected peer clusters at a path on each of the peers. This is done using a global access protected path. It is possible to configure asynchronous replication and global access on the same source peer and path. For general information about global access, see Global Access. For specific information about configuring asynchronous replication and global access on the same path, see Configuring Async Replication and Global Access on Shared Paths.

Synchronous Replication

Synchronous replication provides resiliency in the case of a disaster (full cluster failure) with no data loss (RPO=0). Replication peers, protected paths and policies are configured as for asynchronous replication, except that only a single replication peer is used for synchronous replication. With synchronous replication, however, data is replicated immediately between the source and destination replication peers so that, in the event of a failover, the destination can take over immediately, fully synchronized with the source data (no loss). In practice, this means that every write operation to the source path is replicated immediately (synchronously) to the destination peer.

Replication with mTLS Encryption

You can configure the replication connection between two peers to be encrypted using Mutual Transport Layer Security (mTLS). This is known as secure mode and is configured per replication peer.

This can be configured for both synchronous and asynchronous replication.

Replication and the VAST Catalog

If the VAST Catalog is enabled on a cluster that is a replication target for a protected path, the Catalog will update according to changes in the protected path while the target is connected to the source cluster. These updates are based on the periodic snapshots of the protected path that are made on the source, and replicated on the target.

The Catalog will remain updated with changes to the protected path in the event of different failover and re-attach scenarios as follows:

Graceful failover to standalone. If the target cluster is disconnected from the source in a graceful failover, the protected path is synchronized with the source cluster (and a snapshot of this state is copied to the cluster). The Catalog on the new source reflects these changes in the protected path.
Ungraceful failover to standalone. If the target cluster disconnects from the source in an ungraceful failover, the target cluster reverts the protected path to the last full snapshot that was replicated to it (and discards any changes after this snapshot). The Catalog is also updated to the state of the protected path based on the last snapshot.
In both the above cases, the protected path on the old target cluster becomes read-write, and changes can be made locally. The Catalog is updated to reflect these changes.
Re-attach a cluster. If the original source cluster is re-attached to the (now) standalone cluster, and becomes the source cluster again, the standalone cluster reverts to being the target, and is synchronized with the source (changes made while it was standalone are discarded). The Catalog on this cluster is updated to reflect these changes in the protected path on the target cluster. If the standalone cluster becomes the source, and re-attaches to the former source cluster, it (the former source, now target) is synchronized with the new source, and its Catalog is updated accordingly.
Delete a stream. If a protected path stream is deleted, the data in the replicated path remains valid on the target cluster. The target cluster reverts the path to the last full snapshot that was replicated to it (and discards any changes after this snapshot). The Catalog is also updated to the state of the path based on the last snapshot. The path on the target becomes read-write, and changes to it are updated in the Catalog.
Replication to multiple targets. If a protected path is replicated to several target clusters, the behavior of the protected path on each target (and its Catalog) is as for the case of a single target. If the connection to one target is removed (whether graceful or ungraceful), the connection to the other targets is unaffected (including their Catalogs). If one of the targets is made the source, each of the other clusters (including the former source) is made a target, and synchronizes to the new source. The Catalogs on these clusters are updated to reflect the state of the protected path on each of them.

Note
The above scenarios assume all replication clusters are running version 5.4. If this is not the case, some scenarios are blocked by VMS, and in others the Catalog on the target cluster is disabled.

Replication with Multiple Tenants

You can replicate protected paths between clusters and between tenants on these clusters. For example, a protected path, /path, belonging to Tenant A can be replicated to path /path, belonging to Tenant B, on a remote cluster. This applies to synchronous and asynchronous replication.

These restrictions apply when replicating between tenants:

If Tenant A replicates a protected path to Tenant B on a remote cluster, it cannot then replicate another path from Tenant A to Tenant C on the same remote cluster (that is, Tenant A cannot have replicated protected paths to more than one tenant on the same remote cluster). It can, however, replicate protected paths to Tenant C (or any other tenant) on a different remote cluster. Similarly, Tenant A can replicate additional protected paths to Tenant B on the same remote cluster.
This applies for all protocols.
If you are upgrading your cluster to this version from an earlier version which permitted a tenant to replicate protected paths to more than one tenant on the same remote cluster, existing replication streams to multiple tenants are maintained after the upgrade (an alarm message may appear, however). You cannot replicate additional protected paths to these tenants on this remote cluster, but can replicate protected paths to other tenants on the cluster.
If Tenant A replicates a path from Cluster A to Cluster B, and from Cluster B to Cluster C, the S3 access keys and identity policies that are replicated to support S3 access to replicated data are copied from A to B, but are not then copied on to C. Access keys and identity policies that are local to Cluster B are copied to Cluster C.
Note
For more information about replication of S3 access permissions, see S3 Access to Replicated Data.
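
The first restriction above (a tenant cannot replicate to more than one tenant on the same remote cluster) can be expressed as a simple check. This is a hypothetical sketch: the function name and the tuple layout are assumptions, and the sketch does not model the upgrade exception for streams that existed before the restriction was introduced.

```python
def can_add_stream(existing, source_tenant, remote_cluster, remote_tenant):
    """Check the one-remote-tenant-per-remote-cluster restriction.

    `existing` is an iterable of (source_tenant, remote_cluster, remote_tenant)
    tuples describing the replication streams that are already configured.
    """
    for src, cluster, tenant in existing:
        same_source_and_cluster = src == source_tenant and cluster == remote_cluster
        if same_source_and_cluster and tenant != remote_tenant:
            # The source tenant already replicates to a different tenant
            # on this remote cluster.
            return False
    return True


existing_streams = [("Tenant A", "cluster-2", "Tenant B")]

# Another path to Tenant B on the same remote cluster is allowed.
same_tenant_ok = can_add_stream(existing_streams, "Tenant A", "cluster-2", "Tenant B")
# A path to Tenant C on the same remote cluster is not allowed.
other_tenant_same_cluster = can_add_stream(existing_streams, "Tenant A", "cluster-2", "Tenant C")
# A path to Tenant C on a different remote cluster is allowed.
other_tenant_other_cluster = can_add_stream(existing_streams, "Tenant A", "cluster-3", "Tenant C")
```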

Failover Capabilities

VAST Cluster provides the ability to change the peer roles in a replication configuration, so that the backed-up data on a destination peer becomes writable and the previous source peer ceases to be the replication source.

Seamless failover, where clients can continue connecting to the data path over the same mount point after a failover, is supported for NFSv3 clients through the configuration of globally synchronized views.

For information about failover, see Disaster Recovery.

Switching between Synchronous and Asynchronous Replication

You can convert the replication scheme for a protected path from asynchronous to synchronous, or from synchronous to asynchronous.

You can only convert an asynchronous replication for a path to synchronous if there is at most a single replication peer (synchronous replication permits only a single replication peer, whereas asynchronous replication permits more).

Synchronous to Asynchronous Replication

When switching from synchronous to asynchronous replication, the destination path becomes read-only. A protection policy must be associated with the source protected path, indicating the replication interval.

Asynchronous to Synchronous Replication

When switching from asynchronous to synchronous replication, the destination path is first synchronized with the source, during which time it is in the state 'Becoming Synchronous'. Once synchronized, the destination path becomes read-write.

A protected path can be switched to synchronous replication only if it is used for S3 buckets. If other protocols are in use on the path, the switch is not possible.
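
The two preconditions for converting a path to synchronous replication (at most one replication peer, and S3 as the only protocol on the path) can be combined into a single pre-check. This is an illustrative sketch under that reading of the text. The function name, its parameters, and the protocol labels are hypothetical and are not part of any VAST API.

```python
def sync_conversion_blockers(peer_count, protocols):
    """Return the reasons a path cannot be converted to synchronous replication.

    An empty list means neither precondition described in the text is violated.
    """
    blockers = []
    if peer_count > 1:
        # Synchronous replication permits only a single replication peer.
        blockers.append("more than one replication peer")
    non_s3 = sorted(set(protocols) - {"S3"})
    if non_s3:
        # Only paths used for S3 buckets can be switched.
        blockers.append("non-S3 protocols on path: " + ", ".join(non_s3))
    return blockers


ok = sync_conversion_blockers(peer_count=1, protocols=["S3"])
too_many_peers = sync_conversion_blockers(peer_count=2, protocols=["S3"])
mixed_protocols = sync_conversion_blockers(peer_count=1, protocols=["S3", "NFS"])
```

Returning the list of blockers, rather than a single boolean, lets a caller report every reason at once.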

Replicating S3 Bucket Configuration

An optional feature called bucket replication automatically creates S3 buckets on replicated protected paths with the same properties as buckets on the source path. For details of this feature and how to enable it, see S3 Access to Replicated Data.

Data Protection Limitations and Exclusions

A protected path cannot simultaneously be replicated synchronously to one replication peer and asynchronously to another.
A protected path cannot simultaneously be replicated asynchronously to one replication peer and replicated for global access to another.
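
The two exclusions above can be checked against the set of modes configured on a single protected path. The sketch below is hypothetical: the mode labels `"sync"`, `"async"` and `"global_access"` and the function name are chosen for illustration only.

```python
def mode_conflicts(modes):
    """Return the exclusions violated by the modes configured on one protected path.

    `modes` holds one hypothetical mode label per replication peer.
    """
    present = set(modes)
    conflicts = []
    if {"sync", "async"} <= present:
        conflicts.append("synchronous and asynchronous replication on the same path")
    if {"async", "global_access"} <= present:
        conflicts.append("asynchronous replication and global access on the same path")
    return conflicts


async_only = mode_conflicts(["async", "async"])
sync_and_async = mode_conflicts(["sync", "async"])
async_and_global = mode_conflicts(["async", "global_access"])
```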