Overview of VAST Replication

Replication allows you to maintain copies of protected paths on remote clusters. This can be done either continuously, with synchronous replication, or periodically, with asynchronous replication. In the event of a failure affecting the protected path, the copy on the remote cluster can become the source for the data.

With VAST replication, you can:

Maintain a backup copy of your data for regulatory purposes.
Recover data in the event of disaster.
Fail over to a remote cluster in the event of a disaster in order to continue operations.
Fail back to the source cluster following failover, if the source cluster is healthy.
Access copies of data on remote clusters with read-write access (synchronous replication only).

Asynchronous Replication

VAST asynchronous replication enables you to set up a recurring data replication schedule for disaster recovery or long-term data retention, in which you replicate data from a primary cluster to another cluster or to multiple destination clusters. Destination clusters can reside at remote locations, anywhere in the world. Replicated data can include file systems and buckets (including databases).

VAST asynchronous replication provides the capability to fail over, fail back, suspend and resume, and break replication relationships between clusters, as needed to recover data after a failure or data corruption, or for any other reason.

Asynchronous Replication Concepts and Terminology

In asynchronous replication, the state of data on a path is captured on the source peer by snapshots taken at points in time. At each point in time, the data on the path that has changed since the previous point in time is captured in a snapshot and transferred to the destination peers. This applies to files in a file system, objects in an S3 bucket, and VAST Databases that are in the protected path.
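The following Python sketch illustrates the incremental idea described above: only what changed between two consecutive snapshots needs to be transferred to bring the destination replica up to date. It is a conceptual model, not VAST's implementation. The functions `snapshot_delta` and `apply_delta` are hypothetical, and a snapshot is assumed to be a simple mapping of file path to content hash; how VAST actually tracks changes is not described here.

```python
from __future__ import annotations


def snapshot_delta(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots, each modeled (hypothetically) as {path: content_hash}."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


def apply_delta(replica: dict[str, str], current: dict[str, str],
                delta: dict[str, list[str]]) -> dict[str, str]:
    """Return a new replica with the delta applied; the input is not mutated."""
    updated = dict(replica)
    for path in delta["deleted"]:
        updated.pop(path, None)
    for path in delta["added"] + delta["modified"]:
        updated[path] = current[path]  # only changed entries are "transferred"
    return updated


# Usage: the destination replica catches up using only the changes.
snap1 = {"/data/a.txt": "h1", "/data/b.txt": "h2"}
snap2 = {"/data/a.txt": "h1", "/data/b.txt": "h3", "/data/c.txt": "h4"}
delta = snapshot_delta(snap1, snap2)
replica = apply_delta(snap1, snap2, delta)
print(delta)  # {'added': ['/data/c.txt'], 'modified': ['/data/b.txt'], 'deleted': []}
```

Unchanged entries such as `/data/a.txt` never appear in the delta, which is what keeps each periodic transfer proportional to the amount of change rather than to the size of the path.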

Clusters in an asynchronous replication configuration, between which snapshots are copied, are called replication peers. A replication peer is configured on one cluster, and the configuration is then mirrored to the other cluster.

During replication, each peer has a role which reflects its part in replication. The role of the peer on which the snapshots are created is called source. The role of a peer to which snapshots are transferred by the source peer is called destination.

The roles of a peer can change as follows:

A cluster acting as a destination peer can become the replication source in a failover. Failover can be graceful, where the original source peer is reachable throughout the process, or ungraceful, where the source peer has become unreachable and a failover is forced from a destination peer, which becomes the source. Any other destination peers (see Group Replication below) continue to be destination peers. In graceful failover, the former source peer also becomes a destination peer. During graceful failover, both the former source peer and the new source peer are read-only while data is transferred from the old source to the new source. When failover is complete, the new source peer becomes writable.
A destination peer can break replication. This means that replication to that destination peer stops, although the source peer may continue to replicate to other destination peers. The role of that destination peer becomes standalone, and the data that was replicated to that peer becomes writable.
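The role changes listed above can be modeled as a small state machine. This is a hypothetical Python sketch, not a VAST API: the peer names, the `Role` enum and the `failover` and `break_replication` helpers are illustrative. The text does not say what role an unreachable former source has after an ungraceful failover, so the sketch assumes it is simply dropped from the group view. The transient read-only phase of a graceful failover is not modeled.

```python
from __future__ import annotations

from enum import Enum


class Role(Enum):
    SOURCE = "source"
    DESTINATION = "destination"
    STANDALONE = "standalone"


def is_writable(role: Role) -> bool:
    # Destination replicas are read-only; a source or standalone peer is writable.
    return role in (Role.SOURCE, Role.STANDALONE)


def failover(roles: dict[str, Role], new_source: str, graceful: bool) -> dict[str, Role]:
    """Return the peer roles after failing over to `new_source` (hypothetical model)."""
    if roles.get(new_source) is not Role.DESTINATION:
        raise ValueError("only a destination peer can become the new source")
    old_source = next((p for p, r in roles.items() if r is Role.SOURCE), None)
    updated = dict(roles)  # other destination peers keep their role
    updated[new_source] = Role.SOURCE
    if old_source is not None:
        if graceful:
            updated[old_source] = Role.DESTINATION  # former source becomes a destination
        else:
            # ASSUMPTION: an unreachable former source is dropped from this view.
            del updated[old_source]
    return updated


def break_replication(roles: dict[str, Role], peer: str) -> dict[str, Role]:
    """Stop replicating to `peer`; its replica becomes standalone and writable."""
    if roles.get(peer) is not Role.DESTINATION:
        raise ValueError("only a destination peer can break replication")
    updated = dict(roles)
    updated[peer] = Role.STANDALONE
    return updated


# Usage: a three-peer group.
roles = {"cluster-1": Role.SOURCE, "cluster-2": Role.DESTINATION, "cluster-3": Role.DESTINATION}
after_graceful = failover(roles, "cluster-2", graceful=True)
after_forced = failover(roles, "cluster-2", graceful=False)
after_break = break_replication(roles, "cluster-3")
```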

On a destination peer, a read-only replica of the data is stored at a chosen path. This replica is based on the most recent snapshot that was transferred from the source peer. Snapshots that were transferred from the source peer and have not yet expired are stored in the .snapshots directory under the same path.

A protection policy specifies the replication peer and defines the schedule for replication from the source peer to the destination peer. When you create a protection policy for asynchronous replication, the policy itself is mirrored to the other peer.

A protected path specifies a local path that it protects through snapshots and/or replication. It can specify one or more replication streams, each of which specifies a protection policy and a remote path on the remote peer specified by the protection policy. The local path is the data path on the source peer that is captured in the snapshots. The remote path in each replication stream is a path on a destination peer to which the data is replicated. The remote path is kept updated to the most recently transferred snapshot and the data is stored as read-only.

The transfer of a snapshot to the destination peer is called the creation of a restore point.

Many-to-One and One-to-Many Replication

Data can be replicated from a VAST Cluster to multiple other VAST Clusters and from multiple VAST Clusters to one VAST Cluster. See VAST Cluster Scale Guidelines for the maximum numbers of replication peers, protected paths and protection policies.

Note
A single path cannot be both the source and the destination of replication at the same time.
In one-to-one replication where one of the two peer clusters is running a version of VAST Cluster earlier than 4.7, it is possible to configure more than one protected path on the same path. In this case, each protected path can have only one replication stream, and no two protected paths may replicate the path to the same replication peer. Also, failover is not allowed when there is more than one protected path on the same path.
When both peer clusters are running VAST Cluster 4.7 or later, only one protected path per local path is supported, although that protected path can have multiple replication streams, each replicating to a different peer.
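For peers that are both running VAST Cluster 4.7 or later, the rules in the note above can be expressed as a validation pass. This Python sketch is hypothetical: `ProtectedPath`, `validate_protected_paths` and the `destination_paths` argument (assumed to be the set of local paths currently receiving replication on this cluster) are illustrative names, not part of any VAST interface. The pre-4.7 rules are not modeled.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ProtectedPath:
    local_path: str
    # One entry per replication stream: the destination peer of that stream.
    stream_peers: list[str] = field(default_factory=list)


def validate_protected_paths(protected_paths: list[ProtectedPath],
                             destination_paths: set[str]) -> list[str]:
    """Return rule violations, assuming all peers run VAST Cluster 4.7 or later."""
    errors = []
    # Only one protected path per local path.
    per_path = Counter(pp.local_path for pp in protected_paths)
    for path, count in sorted(per_path.items()):
        if count > 1:
            errors.append(f"{path}: {count} protected paths, only one is supported")
    for pp in protected_paths:
        # Each replication stream must replicate to a different peer.
        repeated = sorted(p for p, n in Counter(pp.stream_peers).items() if n > 1)
        if repeated:
            errors.append(f"{pp.local_path}: more than one stream to {', '.join(repeated)}")
        # A path cannot be both a replication source and a destination.
        if pp.local_path in destination_paths:
            errors.append(f"{pp.local_path}: already a replication destination")
    return errors


# Usage: one protected path with streams to two different peers is valid.
ok = validate_protected_paths([ProtectedPath("/a", ["peer-1", "peer-2"])], set())
```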

Group Replication

You can configure a group of replication peers in a relationship where one peer is the replication source and it replicates data on a given path to the other peers in the group. This configuration enables failover to one of the destination peers. Replication is automatically resumed, from the new source to the old source, as soon as the new source peer has all the data. It is also automatically resumed to the rest of the peers once they are synchronized with the new source.

Group replication is supported starting with VAST Cluster 4.7. Clusters running earlier versions of VAST Cluster cannot be group members.

Scheduled or On-Demand Replication

Asynchronous replication is scheduled through a protection policy that defines the timing and frequency of the replication. There is also an option to replicate any given protected path on demand at any time.

One restore point per destination can be in progress at a time. If a restore point is started while another is in progress, the new restore point waits until the earlier one completes. If an on-demand replication is triggered while a restore point is already pending in the queue, the new on-demand restore point replaces the pending one, which is dropped whether it was a scheduled or an on-demand restore point. Similarly, if the time for the next scheduled restore point arrives while an on-demand restore point is pending in the queue, the scheduled restore point is dropped in favor of the pending on-demand restore point.
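The queueing rules above can be sketched as a per-destination queue with one running slot and one pending slot, which is what the replacement rules imply. This is a hypothetical Python model, not VAST code; `RestorePoint`, `DestinationQueue` and the `"scheduled"`/`"on-demand"` labels are illustrative. The text does not say what happens when a scheduled restore point arrives while another scheduled one is pending, so the sketch assumes the new one is dropped.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RestorePoint:
    kind: str   # "scheduled" or "on-demand"
    label: str


@dataclass
class DestinationQueue:
    """Hypothetical model of restore-point queueing for ONE destination."""
    in_progress: Optional[RestorePoint] = None
    pending: Optional[RestorePoint] = None
    dropped: list[RestorePoint] = field(default_factory=list)

    def submit(self, rp: RestorePoint) -> None:
        if self.in_progress is None:
            self.in_progress = rp  # nothing running: start right away
        elif self.pending is None:
            self.pending = rp  # wait for the running restore point to finish
        elif rp.kind == "on-demand":
            # The newest on-demand request replaces whatever is pending.
            self.dropped.append(self.pending)
            self.pending = rp
        else:
            # A scheduled restore point arriving while one is pending is dropped.
            # Documented when the pending one is on-demand; ASSUMED when it is scheduled.
            self.dropped.append(rp)

    def complete(self) -> None:
        # The running restore point finishes; the pending one (if any) starts.
        self.in_progress, self.pending = self.pending, None


# Usage: s1 runs, s2 queues, d1 replaces s2, s3 is dropped in favor of d1.
q = DestinationQueue()
for rp in [RestorePoint("scheduled", "s1"), RestorePoint("scheduled", "s2"),
           RestorePoint("on-demand", "d1"), RestorePoint("scheduled", "s3")]:
    q.submit(rp)
print([rp.label for rp in q.dropped])  # ['s2', 's3']
q.complete()
```

After `complete()`, the on-demand restore point `d1` is the one in progress and nothing is pending.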

Asynchronous Replication of VAST Databases

VAST Databases can be asynchronously replicated, like files and S3 buckets, if they reside in paths protected by asynchronous replication. As with files, periodic snapshots of the database are created and replicated to replication peers as read-only images. Upon failover, they become read-write. The frequency of replication follows the protection policy of the protected path.

The following apply to replicated VAST Databases:

Only committed database transactions are replicated
Semi-sorted projections on the source database are replicated
The most recent replicated snapshots of a database on a replication peer can be queried
The VAST Catalog and audit log are not replicated

Multi-tenant replication currently does not support database replication.

Asynchronous Replication of a Global Folder

A global folder is a folder on one cluster that is made accessible to clients of connected peer clusters at a path on each of the peers. This is done using a global access protected path. It is possible to configure asynchronous replication and global access on the same source peer and path. For general information about global access, see Global Access. For specific information about configuring asynchronous replication and global access on the same path, see Asynchronous Replication of a Global Folder.

Synchronous Replication

Synchronous replication provides resiliency in the case of a disaster (full cluster failure) with no data loss (RPO=0). Replication peers, protected paths and policies are configured as for asynchronous replication, except that only a single replication peer is used for synchronous replication. With synchronous replication, however, data is replicated immediately between the source and destination replication peers so that, in the event of a failover, the destination can take over immediately, fully synchronized with the source data (no loss). In practice, this means that every write operation to the source path is replicated immediately (synchronously) to the destination peer.

Replication with mTLS Encryption

You can configure the replication connection between two peers to be encrypted using Mutual Transport Layer Security (mTLS). This is known as secure mode and is configured per replication peer.

Secure mode can be configured for both synchronous and asynchronous replication.

Replication with Multiple Tenants

You can replicate protected paths between clusters and between tenants on these clusters. For example, a protected path, /path, belonging to Tenant A can be replicated to path /path, belonging to Tenant B, on a remote cluster. This applies to synchronous and asynchronous replication.

These restrictions apply when replicating between tenants:

If Tenant A replicates a protected path to Tenant B on a remote cluster, it cannot then replicate another path from Tenant A to Tenant C on the same remote cluster (that is, Tenant A cannot have replicated protected paths to more than one tenant on the same remote cluster). It can, however, replicate protected paths to Tenant C (or any other tenant) on a different remote cluster. Similarly, Tenant A can replicate additional protected paths to Tenant B on the same remote cluster.
This applies to all protocols.
If you are upgrading your cluster to this version from an earlier version which permitted a tenant to replicate protected paths to more than one tenant on the same remote cluster, existing replication streams to multiple tenants are maintained after the upgrade (an alarm message may appear, however). You cannot replicate additional protected paths to these tenants on this remote cluster, but can replicate protected paths to other tenants on the cluster.
If Tenant A replicates a path from Cluster A to Cluster B, and from Cluster B to Cluster C, the S3 access keys and identity policies that are replicated to support S3 access to the replicated data are copied from Cluster A to Cluster B but are not copied onward to Cluster C. Only access keys and identity policies that are local to Cluster B are copied to Cluster C.
Note
For more information about replication of S3 access permissions, see S3 Access to Replicated Data.
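The first tenant restriction above (a source tenant may replicate to only one tenant on any given remote cluster) can be sketched as a simple check. This Python function is hypothetical: `tenant_stream_allowed` and the `(source_tenant, remote_cluster, destination_tenant)` tuple shape are illustrative, all streams are assumed to originate from the same local cluster, and the post-upgrade exception for pre-existing streams to multiple tenants is not modeled.

```python
from __future__ import annotations

from collections import defaultdict


def tenant_stream_allowed(existing: list[tuple[str, str, str]],
                          proposed: tuple[str, str, str]) -> bool:
    """Each tuple is (source_tenant, remote_cluster, destination_tenant)."""
    targets: dict[tuple[str, str], set[str]] = defaultdict(set)
    for src_tenant, cluster, dst_tenant in existing:
        targets[(src_tenant, cluster)].add(dst_tenant)
    src_tenant, cluster, dst_tenant = proposed
    # Allowed if this tenant has no streams to that cluster yet, or if all of
    # its existing streams there already go to the same destination tenant.
    return targets[(src_tenant, cluster)] <= {dst_tenant}


# Usage: Tenant A already replicates to Tenant B on "remote-1".
existing = [("A", "remote-1", "B")]
same_tenant = tenant_stream_allowed(existing, ("A", "remote-1", "B"))     # allowed
other_tenant = tenant_stream_allowed(existing, ("A", "remote-1", "C"))    # not allowed
other_cluster = tenant_stream_allowed(existing, ("A", "remote-2", "C"))   # allowed
```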

Failover Capabilities

VAST Cluster provides the ability to change the peer roles in a replication configuration, so that the replicated data on a destination peer becomes writable and the previous source peer ceases to be the replication source.

Seamless failover, where clients can continue connecting to the data path over the same mount point after a failover, is supported for NFSv3 clients through the configuration of globally synchronized views.

For information about failover, see Disaster Recovery.

Switching between Synchronous and Asynchronous Replication

You can convert the replication scheme for a protected path from asynchronous to synchronous, or from synchronous to asynchronous.

You can convert a path's asynchronous replication to synchronous only if it has at most a single replication peer (synchronous replication permits only a single replication peer, whereas asynchronous replication permits more).

Synchronous to Asynchronous Replication

When switching from synchronous to asynchronous replication, the destination path becomes read-only. A protection policy that indicates the replication interval must be associated with the source protected path.

Asynchronous to Synchronous Replication

When switching from asynchronous to synchronous replication, the destination path is first synchronized with the source, during which time it is in the state 'Becoming Synchronous'. Once synchronized, the destination path becomes read-write.

You can switch a protected path to synchronous replication only if it is used for S3 buckets. If the path is also used by other protocols, it cannot be switched.
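The two preconditions for switching an asynchronously replicated path to synchronous replication (at most one replication peer, and S3 as the only protocol on the path) can be collected into a pre-check. This Python sketch is hypothetical: `sync_switch_blockers` and the protocol names such as `"S3"` and `"NFS"` are illustrative, not VAST identifiers.

```python
from __future__ import annotations


def sync_switch_blockers(replication_peers: int, protocols: set[str]) -> list[str]:
    """Return reasons an async protected path cannot be switched to synchronous."""
    reasons = []
    if replication_peers > 1:
        reasons.append("synchronous replication permits only a single replication peer")
    if protocols != {"S3"}:
        reasons.append("only protected paths used solely for S3 buckets can be switched")
    return reasons


# Usage: an S3-only path with one peer has no blockers.
blockers = sync_switch_blockers(1, {"S3"})
```

Returning every blocking reason, rather than a single boolean, lets a caller report all the changes needed before the switch in one pass.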

Replicating S3 Bucket Configuration

An optional feature called bucket replication automatically creates S3 buckets on replicated protected paths with the same properties as buckets on the source path. For details of this feature and how to enable it, see S3 Access to Replicated Data.

Data Protection Limitations

A protected path cannot simultaneously be replicated synchronously to one replication peer and asynchronously to another.
A protected path cannot simultaneously be replicated asynchronously to one replication peer and replicated for global access to another.
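The two limitations above can be checked against the streams of a single protected path. In this hypothetical Python sketch, each stream is a `(stream_type, remote_peer)` pair with illustrative type names `"sync"`, `"async"` and `"global_access"`. A violation is flagged only when the two types target different peers, as the limitations are worded; combinations not listed above are not checked.

```python
from __future__ import annotations

# Pairs of stream types that one protected path may not send to different peers.
FORBIDDEN_PAIRS = [
    (("sync", "async"),
     "synchronous to one peer and asynchronous to another"),
    (("async", "global_access"),
     "asynchronous to one peer and global access to another"),
]


def limitation_violations(streams: list[tuple[str, str]]) -> list[str]:
    """streams: (stream_type, remote_peer) pairs for ONE protected path."""
    violations = []
    for (type_a, type_b), message in FORBIDDEN_PAIRS:
        peers_a = {peer for kind, peer in streams if kind == type_a}
        peers_b = {peer for kind, peer in streams if kind == type_b}
        # Flag only when the two stream types target different peers.
        if any(pa != pb for pa in peers_a for pb in peers_b):
            violations.append(message)
    return violations


# Usage: async replication to two peers is fine; sync plus async is not.
fine = limitation_violations([("async", "peer-1"), ("async", "peer-2")])
mixed = limitation_violations([("sync", "peer-1"), ("async", "peer-2")])
```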