SyncEngine User Guide

Overview

SyncEngine is a distributed data migration platform designed for enterprise-scale data operations. It provides seamless data transfer between various data sources and the VAST Data platform, with comprehensive monitoring and management capabilities.

Key Features

  • Multi-Protocol Data Migration: Seamless transfer from POSIX filesystems and S3-compatible storage to VAST

  • Distributed Architecture: Scalable worker-based processing across multiple hosts

  • Real-time Monitoring: Comprehensive metrics and progress tracking

  • Enterprise-Grade: Fault-tolerant design with automatic recovery and retry mechanisms

  • Simple Management: Command-line interface and web-based monitoring

Architecture

SyncEngine consists of two main components:

  1. Control Plane: Central management component that handles scheduling, coordination, and monitoring

  2. Workers: Distributed processing units deployed across multiple hosts that handle actual data migration

The control plane manages the overall migration process, distributes tasks to workers, and collects metrics and progress information. Workers are deployed across multiple hosts and handle the actual data transfer operations between source and destination storage systems.

Network Requirements: Worker hosts must have network connectivity to both source and destination storage systems on the appropriate ports:

  • NFS: Port 2049 (TCP/UDP)

  • S3: Port 80 (HTTP) or 443 (HTTPS)
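
A quick way to sanity-check this connectivity from a worker host, assuming the nc utility is available (the hostnames below are placeholders):

# Hypothetical connectivity checks from a worker host; replace the placeholders
# with your actual source and destination endpoints.
nc -zv <source-nfs-server> 2049     # NFS
nc -zv <source-s3-endpoint> 443     # S3 over HTTPS (use 80 for HTTP)
nc -zv <vast-dns-name> 2049         # VAST NFS
nc -zv <vast-dns-name> 443          # VAST S3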

Sizing and Performance

This section will be updated with additional details as more testing occurs.

As with any performance-sizing exercise, there are many factors to consider.  This doc will focus on the basics.

First off, if you will be migrating more than one S3 bucket, NFS Export, or Lustre/GPFS filesystem, please look at each separately before proceeding.  They can be run concurrently, but it is best to review each independently, as they will be migrated as separate migration tasks/jobs by SyncEngine.

Dataset

  • How much data will you be migrating? (e.g., in TB or PB)

  • (Approximately) How many files/objects are contained within the migration set? (See the example commands after this list.)

  • For S3 migrations, is the namespace a 'flat' bucket, or does it use the '/' key separator?

    • If using '/' separator, approximately how many objects per prefix?
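
If you do not already have these numbers, the following commands can approximate them (illustrative only; paths, bucket names, and endpoints are placeholders):

# POSIX source: approximate total size and file count (run on a host where the
# source filesystem is mounted)
du -sh /mnt/source
find /mnt/source -type f | wc -l

# S3 source: total object count and size, assuming the AWS CLI is installed and
# configured with credentials for the source endpoint
aws s3 ls s3://<source-bucket> --recursive --summarize --endpoint-url http://<source-s3-endpoint> | tail -2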

Source System Characteristics

  • Is this an object store (S3) or a POSIX filesystem?

  • What is the expected read performance from the source system? (A rough measurement approach is sketched after this list.)

    • B/W

    • IOPS / metadata IOPS

  • If it is an S3 system, is there a way to load-balance client connections across multiple "nodes" on the source system?

    • If so, does the load balancer introduce a bottleneck? (if so: make note on the bottleneck)

  • How much B/W / how many IOPS can be 'safely' consumed by migration without impacting any production workloads?
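
One crude way to measure read bandwidth from a POSIX source, assuming it is mounted on a test host and contains a suitably large file (the path and count below are placeholders; run during a quiet period so production workloads are not affected):

# Sequential read of an existing large file; adjust count for the file size
dd if=/mnt/source/<some-large-file> of=/dev/null bs=1M count=4096 iflag=direct status=progress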

Destination (VAST) Characteristics

  • How many C & D Nodes?

  • Network connectivity

    • eg: How fast are the connections between the VAST CNodes and the network segment where your migration hosts are connected?  Consider whether there are constraints on uplinks between networks.

  • What is the expected performance (consult VAST for guidance if you do not know)

    • Ingest (write) Throughput

    • Ingest (write) IOPS

  • How many VAST CNodes will you allocate to the Pool used for migration?

  • How much B/W / how many IOPS can be 'safely' consumed by migration without impacting any production workloads?

Sizing Methodology

Before choosing the right number of migration hosts (known as 'workers'), it’s important to determine the bottlenecks from the previous exercise.

Generally speaking, we're looking to identify the 'lowest' ceiling in terms of bottlenecks.  For example, if your source system is constrained by network connectivity to 100 GbE (~11-12 GByte/sec), then that may be your bottleneck in general.
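
As a concrete illustration (the numbers below are made up), the effective ceiling is simply the minimum of the candidate limits:

# Illustrative only: pick the lowest of the source, destination, and network limits
SOURCE_BW=8      # GB/s the source can safely serve
DEST_BW=20       # GB/s the VAST pool can ingest
NETWORK_BW=11    # GB/s available on the 100 GbE path
echo "Effective ceiling: $(printf '%s\n' $SOURCE_BW $DEST_BW $NETWORK_BW | sort -n | head -1) GB/s"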

Once you have identified your theoretical ceiling, it’s time to size for workers.  Note that the migration type (POSIX vs S3) will differ in terms of per-worker (host) capabilities.  For that reason, this section is split into two.

POSIX Sizing

When migrating from a POSIX system, each worker 'mounts' both the source and destination systems, once per 'dataset'.  Because of this, there can be 'per mount' bottlenecks that will impact the performance.  If you are migrating a single POSIX dataset, this effectively means that each host has a ceiling.  If you are migrating multiple POSIX datasets, then multiple jobs can allow for a higher ceiling to be achieved (because each host will have multiple source/destination mounts).

Because there can be discrepancies between the different types of source POSIX systems, this doc will focus (for now) only on source systems that expose the NFSv3 protocol using TCP (not RDMA) for data transport.

Limits per host will be:

  • ~2 GB/sec with vanilla NFS mount-opts

  • (up to) the limit of a single network interface with the vast-nfs driver
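
A back-of-the-envelope worker count for a single POSIX dataset, using the ~2 GB/sec per-host figure above (the target throughput is an assumption for illustration):

# Illustrative sizing: hosts needed = ceiling(target GB/s / per-host GB/s)
TARGET_BW=10      # GB/s you want to sustain (your lowest ceiling from above)
PER_HOST_BW=2     # GB/s per host with vanilla NFS mount-opts
echo "Worker hosts needed: $(( (TARGET_BW + PER_HOST_BW - 1) / PER_HOST_BW ))"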

If you need more information related to vast-nfs and its performance-enhancing NFS mount options, please refer to the VAST NFS documentation.

Object (S3) Sizing

The S3 API rides on top of an HTTP-based protocol.  Therefore, migration hosts do not need to 'mount'.  Each I/O becomes a separate connection, and so it can be simpler to size for (and potentially more performant).  Generally speaking, this means that with large objects (eg, at least 5MB), you can expect to reach the limits of your network hardware on the host.  We will update this documentation with a table describing additional details on how to perform sizing as more testing is performed on a variety of hardware systems.

Deployment Considerations

Network Ports

SyncEngine uses the following network ports:

User/Client-facing ports:

Service               Port   Protocol   Description
Control Plane API     5009   HTTP       Main API endpoint
mscli webUI           8888   HTTP       (optional) GUI for SyncEngine
Prometheus Exporter   8080   HTTP       Metrics endpoint
Grafana               3009   HTTP       Web dashboard

Internal services:

Service        Port   Protocol   Description
pgAdmin        5050   HTTP       Database management
RedisInsight   5540   HTTP       Redis management
PostgreSQL     5432   TCP        Database
Redis          6379   TCP        Coordination
Prometheus     9991   HTTP       Metrics collection

Worker/controller connectivity requirements:

  • Workers need to reach Redis (port 6379), which is hosted on the controller.

  • Workers need to reach port 5009, which is listening on the controller.

  • The controller needs to reach ports 8000-8100 on all workers (for Prometheus scraping).
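
A hedged set of checks matching the list above, assuming nc is available and the services are already running (or, beforehand, to validate firewall rules); IPs are placeholders:

# From a worker, confirm it can reach Redis (6379) and the control-plane API (5009)
ssh <worker-host-1-ip> "nc -zv <control-plane-ip> 6379; nc -zv <control-plane-ip> 5009"

# From the controller, spot-check one port in the 8000-8100 scrape range on a worker
nc -zv <worker-host-1-ip> 8000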

Prerequisites

System Requirements

Ideally, SyncEngine is deployed on multiple hosts or VMs: one for the control plane and two or more for worker nodes.  (A quick cross-host check of the requirements below is sketched after the list.)

This doc was written for RHEL/Rocky Linux; SyncEngine can also be deployed on Ubuntu/Debian variants with slightly modified instructions.

  • Memory: Minimum 4GB RAM per host.

  • Storage: Minimum 20GB free disk space per host.

  • Network: Connectivity between the control plane and the worker hosts.

  • Access: Root or sudo access on all hosts; admin access to the VAST Management Server (VMS).
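
Once passwordless SSH and the ClusterShell group from Step 1 are in place, a quick (illustrative) cross-host check of memory and free disk space looks like:

# Show memory and root-filesystem free space on the controller and all workers
clush -w localhost,@workers -b "free -g | head -2; df -h / | tail -1"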

Required Software

This was tested on Rocky Linux v 9.5; older distros should work, so long as podman can be installed.

  • Python 3.6 or higher

  • curl and tar utilities

Deploying

Information You Will Need

Before starting the installation, gather the following information:

Network Information

  • Control Node: IP address or hostname of the SyncEngine control plane host

  • Worker Nodes: IP addresses or hostnames of all worker hosts

  • Source Storage: IP address or hostname of the source file server (NFS) or object store (S3)

  • VAST VMS: IP address or hostname of the VAST Management System

  • VAST VIP Pool: IP range for VAST VIP pool (e.g., 172.25.1.1-172.25.1.32) or the VAST DNS name to be used as the destination for migration.

Authentication Information

  • VAST Admin User: Username for VAST VMS access (typically admin)

  • VAST Admin Password: Password for VAST VMS access

Migration-Specific Information

  • Source:

    • Export/path on the source filesystem or S3 bucket to migrate

    • NFS Export/Export policy configured for (read-only) nosquash access to the source export.

    • Access/Secret Key with read access to everything within the Bucket (s3)

  • VAST:

    • Path on VAST filesystem or S3 bucket for migration destination

Installation

Unless otherwise noted, all steps below are executed on the Controller host.

Step 1: Install Prerequisites

1. Install EPEL Repository and ClusterShell

# Install EPEL repository first (required for clustershell)
sudo yum clean all && sudo yum install -y epel-release

# Install clustershell for parallel operations
sudo yum install -y clustershell

2. Configure Passwordless SSH

# Generate SSH key on control plane host (if not exists)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""

# This will copy the pubkey to authorized_keys on controller
ssh-copy-id localhost
# Copy public key to worker hosts

ssh-copy-id <worker-host-1-ip>
ssh-copy-id <worker-host-2-ip>
# ... repeat for all worker hosts

# Test passwordless SSH to each worker
ssh <worker-host-1-ip> "echo 'SSH connection successful'"
ssh <worker-host-2-ip> "echo 'SSH connection successful'"
# ... test all worker hosts

3. Configure and Test ClusterShell

# Set up the clush group for workers; this appends to the existing config
# For larger groups, use more compact notation; refer to the ClusterShell docs online.
sudo tee -a /etc/clustershell/groups.d/local.cfg > /dev/null << 'EOF'
workers: <worker-host-1-ip>,<worker-host-2-ip>
EOF
# Ensure ClusterShell does not prompt for host key confirmation
# Simply append the required ssh_options line
echo 'ssh_options: -oStrictHostKeyChecking=no' | sudo tee -a /etc/clustershell/clush.conf >/dev/null
# Test clustershell connectivity
clush -g workers "echo 'Clustershell working on' \$(hostname)"

4. Install Required Packages Using ClusterShell

NOTE: the following commands require that passwordless 'sudo' is configured on the worker hosts.  If it is not, you will need to configure it manually on each host before you proceed.  How you do this may vary in your environment, but here is an example:

# From the controller, configure passwordless sudo on the first worker for the current user
ssh -t <worker-host-1-ip> "user=\$(id -un); echo \"\${user} ALL=(ALL) NOPASSWD:ALL\" | sudo tee /etc/sudoers.d/\${user} >/dev/null && sudo chmod 0440 /etc/sudoers.d/\${user} && sudo visudo -c >/dev/null"
#get epel-release installed on workers.
clush -w @workers "sudo yum clean all && sudo yum install -y epel-release"

# Install required packages on ALL hosts (controller + workers)

clush -w localhost,@workers "sudo yum install -y python3 python3-pip python3-dotenv jq podman podman-compose nfs-utils"

Verify Prerequisites

# Verify installations on all hosts
clush -w localhost,@workers -b "python3 --version && podman --version && jq --version"

#output should match on all hosts if they are the same OS:

---------------
10.143.14.[101,103,105] (3)
---------------
Python 3.9.21
podman version 5.4.0
jq-1.6

Step 2: Configure SELinux

# Disable SELinux on all hosts (required for container operations)
clush -w localhost,@workers "sudo setenforce 0"

# Make SELinux disabled permanent on all hosts
clush -w localhost,@workers "sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config"

# Verify SELinux is disabled on all hosts
clush -w localhost,@workers "getenforce"
# Should return: Disabled

Step 3: Download SyncEngine

Working Directory: Make sure you are still on the controller-host

#make a directory on all hosts (controller + workers) for downloaded artifacts.
clush -w localhost,@workers "mkdir -p ~/syncenginesetup"

#cd to it on the controller-host
cd ~/syncenginesetup

Online Download (Internet Access Required)

# Download ms, a script to deploy and manage packages and services for SyncEngine
# obtain the URL from VAST, contact your SE.

MSURL="https://url.goes.here/ms"

curl -L -o ms ${MSURL}

Offline Download (No Internet Access)

For environments without internet access, you'll need to transfer the SyncEngine bundle to the target systems.

On a system with internet access:

# Download the SyncEngine bundle
BUNDLE_URL="https://url.goes.here/Vastdata_MetaSpace_v1.1.0.tar.gz"

curl -L -o Vastdata_MetaSpace_v1.1.0.tar.gz "${BUNDLE_URL}"

# Download ms , a script to deploy and manage packages and services for SyncEngine
MSURL="https://url.goes.here/ms"

curl -L -o ms ${MSURL}

# Transfer files to air-gapped system using your preferred method:
# - USB drive
# - Internal network file transfer
# - Secure file transfer protocol
# Make sure to place them in ~/syncenginesetup

Step 4: Install Control Plane & tools

(Still on the control-host, in the ~/syncenginesetup directory)

# Make ms script executable
chmod +x ms

#copy all artifacts to workers. May take a minute or so depending on networking between the hosts and the number of workers.
clush -g workers -c ~/syncenginesetup/*


 #copy ms script to executable path on controller-host and all workers.

clush -w localhost,@workers "sudo cp ~/syncenginesetup/ms /usr/local/bin/"

# Verify the script is able to run
clush -w localhost,@workers -b "ms --help"

Online Installation

BUNDLE_URL="https://url.goes.here/Vastdata_MetaSpace_v1.1.0.tar.gz"
# Install SyncEngine control plane
ms install control "${BUNDLE_URL}" ~/syncengine_installation

Offline Installation

# Install SyncEngine control plane from the local bundle; this will create a ~/syncengine_installation directory
ms install control ~/syncenginesetup/Vastdata_MetaSpace_v1.1.0.tar.gz ~/syncengine_installation


#Generally accept defaults

What happens during installation:

  1. Prerequisites verification

  2. Bundle download/extraction (online) or local extraction (offline)

  3. Container image loading

  4. Service startup

Verify Control Plane Installation

# Check if control plane is running
curl http://localhost:5009/healthz

# Check container status
podman ps

# Expected output should show running containers:
# - redis
# - prometheus
# - meta_control_db
# - grafana
# - pgadmin
# - meta_control

Install mscli

mscli is a sidecar CLI tool (a static Linux binary) that wraps the SyncEngine REST API to provide a convenient interface suitable for writing scripts.  It also exposes an experimental UI that can manage aspects of SyncEngine.  It is not strictly required; you may instead use the REST API directly via curl or other tools.
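
If you prefer to go straight to the REST API, the endpoints used elsewhere in this guide can be driven with curl; a minimal sketch (localhost assumes you are on the controller):

# Health check and worker discovery via the REST API
curl -s http://localhost:5009/healthz
curl -s http://localhost:5009/workers/discovery | jq '.'

# Interactive API documentation is served at http://<control-plane-ip>:5009/docs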

For the time being, request a URL or binary from VAST.

#set permissions and copy mscli to executable path
chmod +x mscli && sudo cp mscli /usr/local/bin/

Create a .yaml file for mscli to use as its config. Note: using localhost for some of the fields is applicable when using mscli on the controller-host. If you choose to also install mscli on a different machine (such as a workstation or laptop), then replace localhost with the <controller-host-ip>.

#populate yaml file

tee ~/.mscli.yaml > /dev/null << 'EOF'

api_endpoint: "http://localhost:5009"
username: "Admin123#"
password: "Admin123#"
redis_host: "localhost"
redis_port: 6379
redis_db: 0
EOF

Verify it works and can talk to the SyncEngine controller API

mscli info

MSCLI v2.0.0 (SyncEngine CLI) - commit: 9b9ace97668b455f3b4c109c5e50568a349ad675, built: 2025-08-05

Configuration:
  API Endpoint: http://localhost:5009
  Username: Admin123#
  Redis: localhost:6379 (DB: 0)

Step 5: Deploy Workers

First, on the controller, get a worker token that will be used by the workers to authenticate when they are deployed:

#generate token and populate ENV variable
export MS_TOKEN=$(curl -X POST "http://localhost:5009/auth/token/for_worker" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -u "Admin123#:Admin123#" \
  -d "token_name=my_api_token" | jq -r '.access_token')

# Verify the variable is populated

echo $MS_TOKEN

Online Installation:

# Install worker on all worker hosts
clush -g workers "ms install worker ${MSURL} ~/syncengine_installation"
TODO: need to do the linger thing?
clush -g workers "sudo loginctl enable-linger vastdata" #fix for username

# Deploy workers on all worker hosts
clush -g workers "MS_TOKEN=${MS_TOKEN} WORKER_LABEL=migration-worker META_CONTROL_IP=<control-plane-ip> ms deploy worker"

Offline Installation:

# Install worker from local bundle on all worker hosts
clush -g workers "ms install worker ~/syncenginesetup/Vastdata_MetaSpace_v1.1.0.tar.gz ~/syncengine_installation"

# Deploy workers on all worker hosts
clush -g workers "MS_TOKEN=${MS_TOKEN} WORKER_LABEL=migration-worker META_CONTROL_IP=<control-plane-ip> ms deploy worker"

Verify Worker Deployment

# Check worker container status on all hosts
clush -g workers "podman ps | grep worker"

# Check worker logs on all hosts
clush -g workers "podman logs worker-migration-worker"

# Verify workers are registered with control plane
curl http://localhost:5009/workers/discovery

#this will have a lot of output, as it shows an entry for each process, one per core

VAST Configuration

Since you are migrating to VAST, there are some things that need to be configured for the best experience.  Note that all of these steps can be accomplished in the VAST VMS UI; this document focuses on using vastpy-cli.  See https://github.com/vast-data/vastpy for details on this tool, which is a CLI wrapper for the VAST REST API.

Install vastpy-cli

This can be done on the controller host, or any host that has network connectivity to https://$VMS_IP:443

Note: The same configuration steps can be performed using the VAST VMS Web UI. For detailed instructions, refer to the VAST Support Documentation.

# Install vastpy package
pip3 install vastpy

# Set up environment variables
export VMS_USER=admin
export VMS_PASSWORD=123456 #this is a default, replace with appropriate passwd
export VMS_ADDRESS=<vast-vms-address>

# Verify it can execute calls against your VAST cluster
vastpy-cli --json get clusters | jq -r '.[0].name'
<cluster-name>

Migration Flows

This guide covers two separate migration scenarios. Choose the appropriate flow based on your source data type:

  1. POSIX Migration: Migrating from POSIX filesystems (NFS in this doc) to VAST NFS

  2. S3 Migration: Migrating from S3-compatible storage to VAST S3

Each flow has its own configuration requirements and steps. Follow the flow that matches your scenario.

POSIX Migration Configuration

Create NFS View Policy

Important: the following settings are mandatory before performing a POSIX-to-NFS migration.

It is necessary to have a VAST view policy that allows nosquash root access for the worker hosts, so they can write data with root privileges; this is required for data migration operations.

Additionally, both the Path Length Limit and the Allowed Characters settings must be set to Native Protocol Limit (NPL), as shown in the example below.

# Get available view policies
vastpy-cli --json get viewpolicies | jq '.[] | {id: .id, name: .name, nfs_no_squash: .nfs_no_squash}'
# Create NFS view policy with nosquash hosts for worker access.  Refer to VAST Online documentation for other nosquash_hosts notation options.
vastpy-cli --json post viewpolicies \
  name='nfs-migration-policy' \
  flavor='NFS' \
  path_length='NPL' \
  allowed_characters='NPL' \
  "nfs_no_squash"='["<worker-host-1-ip>", "<worker-host-2-ip>"]'

# Verify config:

vastpy-cli --json get viewpolicies name='nfs-migration-policy' | \
jq '.[] | {
  id: .id,
  name: .name,
  path_length: .path_length,
  allowed_characters: .allowed_characters,
  nfs_no_squash: .nfs_no_squash
}'

#example output:
{
  "id": 48,
  "name": "nfs-migration-policy",
  "path_length": "NPL",
  "allowed_characters": "NPL",
  "nfs_no_squash": [
    "172.200.0.0/16",
    "10.10.10.10"
  ]
}

Replace <worker-host-1-ip>, <worker-host-2-ip>, etc., with the actual IP addresses of your worker hosts.  CIDR notation may also be used for each entry. Refer to the VAST CLI docs for more examples.

Create NFS View

# Get the policy ID from previous step
POLICY_ID=$(vastpy-cli get viewpolicies name='nfs-migration-policy' --json | jq -r '.[0].id')
echo $POLICY_ID

# Create NFS view
vastpy-cli --json post views \
  name='nfs data migration dest' \
  path='/data-migration-dest' \
  policy_id=$POLICY_ID \
  create_dir=true \
  protocols='["NFS"]'

# Verify view creation

vastpy-cli --json get views path='/data-migration-dest'| \
jq '.[] | {
  id: .id,
  name: .name,
  path: .path,
  viewpolicy: .policy
}'

Configure QoS (Optional but Recommended)

Why QoS is Recommended: QoS policies help prevent overwhelming the VAST system during migration and ensure consistent performance for other workloads. They allow you to control bandwidth usage and adjust throttling as needed during the migration process.  Note that VAST QoS configuration options may vary based on your specific VAST version.  This document presumes VAST 5.3; please refer to the VAST documentation for your specific version to verify.

# Create QoS policy for rate limiting writes to 10GB/sec
vastpy-cli --json post qospolicies \
  name='data-migration-qos' \
  mode='STATIC' \
  static_limits='{"max_writes_bw_mbps": 10000}' \
  limit_by='BW' \
  policy_type='VIEW'

#verify

vastpy-cli --json get qospolicies name='data-migration-qos'| \
jq '.[] | {
  id: .id,
  name: .name,
  static_limits: .static_limits
}'

QOS_ID=$(vastpy-cli get qospolicies name='data-migration-qos' --json | jq -r '.[0].id')
echo $QOS_ID

#Now, apply this QOS policy to the view previously created.

# Get view ID
VIEW_ID=$(vastpy-cli get views path='/data-migration-dest' --json | jq -r '.[0].id')
echo $VIEW_ID

# Apply QoS policy to view
vastpy-cli --json patch views/$VIEW_ID qos_policy_id=$QOS_ID| \
jq '{
  id: .id,
  name: .name,
  path: .path,
  qos_policy: .qos_policy
}'



# Adjust throttle to 1GB/sec (example)

vastpy-cli --json patch qospolicies/$QOS_ID static_limits='{"max_writes_bw_mbps": 1000}'| \
jq '{
  name: .name,
  static_limits: .static_limits
}'

S3 Migration Configuration

Create S3 User and Access Key

Note: the instructions below presume you are creating a Local user on the VAST cluster.  If you intend to use an AD/LDAP user for migration, please consult the VAST Documentation or a VAST technical person for specific steps.

# Create S3 user
vastpy-cli --json post users name='s3-migration' user_type='LOCAL'| \
  jq '{
  name: .name,
  id: .id
}'

# Get user ID
USER_ID=$(vastpy-cli get users name='s3-migration' --json | jq -r '.[0].id')
echo $USER_ID

# generate S3 access key
vastpy-cli --json post users/$USER_ID/access_keys \
  description='S3 migration'

# Save the access key and secret (shown only once)
{
  "access_key": "9NTPT0P7QBDQTXNUBQ8Z",
  "secret_key": "NDQbgCzLTk9fruK0UezcbM1M4wovlhrstJFCpcYU"
}
Create Identity Policy

# Create identity policy granting full access to destination bucket
# Note: because the policy must be sent as a JSON string, this is a two-step process
# 1. Create a JSON file with the policy; note that the resulting file will look different from what is shown below:

##be sure to adjust the "Resource" to match the bucket name you will be using as a migration destination.
##Also note the tenant_id.  This is for the 'default' tenant. If you have a multi-tenant system, please adjust accordingly.
jq -n '{
    "name": "s3-migration-policy",
    "tenant_id": 1,
    "enabled": true,
    "policy": ({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Stmt1",
                "Action": ["s3:*"],
                "Effect": "Allow",
                "Resource": [
                    "s3-data-dest",
                    "s3-data-dest/*"
                ]
            }
        ]
    } | tostring)
}' > /tmp/s3policy_create.json

#2. execute vastpy-cli , using the json file as an input.
vastpy-cli --json post s3policies -i /tmp/s3policy_create.json

#get the id of the policy
S3_POLICY_ID=$(vastpy-cli --json get s3policies name='s3-migration-policy' | jq -r '.[0].id')
echo $S3_POLICY_ID

# Assign policy to user.
# ***Note***: we are assigning the Identity policy to the 'tenant_data' for the user, specifying the previously used tenant_id

vastpy-cli patch users/$USER_ID/tenant_data tenant_id=1 s3_policies_ids="[$S3_POLICY_ID]"
Create View Policy

# Create S3 view policy
vastpy-cli --json post viewpolicies \
  name='s3-migration-policy' \
  flavor='S3_NATIVE' \
  path_length='NPL' \
  allowed_characters='NPL'

# Verify policy creation
vastpy-cli --json get viewpolicies name='s3-migration-policy' | \
jq '.[] | {
  id: .id,
  name: .name,
  flavor: .flavor,
  path_length: .path_length,
  allowed_characters: .allowed_characters
}'
Create S3 View

# Get the S3 policy ID
S3_VIEWPOLICY_ID=$(vastpy-cli get viewpolicies name='s3-migration-policy' --json | jq -r '.[0].id')

# Create S3 view (aka: Bucket).  Note the bucket name must match what was specified in the identity policy.
#the bucket_owner should match the name of the user which was created.
vastpy-cli --json post views \
  path='/s3-data-dest' \
  policy_id=$S3_VIEWPOLICY_ID \
  create_dir=true \
  protocols='["S3"]' \
  bucket='s3-data-dest' \
  bucket_owner='s3-migration'

# Verify view creation
vastpy-cli --json get views path='/s3-data-dest' | \
jq '.[] | {
  id: .id,
  name: .name,
  bucket: .bucket,
  bucket_owner: .bucket_owner
}'

Data Migration Setup

Step 1: Mount Filesystems on Worker Hosts

This is executed from the Control node

Mount All Hosts Using ClusterShell

Source NFS Fileserver

# Mount source on all worker hosts
clush -g workers "sudo mkdir -p /mnt/source && sudo mount -t nfs -o vers=3,proto=tcp,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 <source-nfs-server>:/migration-source /mnt/source"

VAST NFS Server

With vast-nfs client (Recommended):

# Mount VAST destination on all worker hosts with vast-nfs load balancing
clush -g workers "sudo mkdir -p /mnt/vast-dest && sudo mount -t nfs -o vers=3,proto=tcp,nconnect=8,remoteports=<vast-vip-range>,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 <vast-dns-name>:/data-migration-dest /mnt/vast-dest"

Without vast-nfs client (Standard NFS):

# Mount VAST destination on all worker hosts with standard NFS
clush -g workers "sudo mkdir -p /mnt/vast-dest && sudo mount -t nfs -o vers=3,proto=tcp,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 <vast-dns-name>:/data-migration-dest /mnt/vast-dest"

Load Balancing: The nconnect=8 and remoteports=<vast-vip-range> options enable load balancing across multiple VAST VIPs. Replace <vast-vip-range> with your VAST VIP pool range (e.g., 172.25.1.1-172.25.1.32). For more details on VAST NFS mount options, refer to the VAST NFS Documentation.

# Verify mounts on all hosts
clush -g workers "df -h /mnt/source /mnt/vast-dest"

Step 2: Configure Network Connectivity for S3

Test S3 Connectivity (HTTP)

# Test HTTP connectivity to VAST S3
curl -v http://<vast-dns-name>:80

# Test basic S3 endpoint response
curl -v http://<vast-dns-name>:80/s3-data-dest/

Test S3 Connectivity (HTTPS)

# Test HTTPS connectivity to VAST S3
curl -v https://<vast-dns-name>:443


Load Balancing: VAST S3 automatically load balances across multiple VIPs. Ensure your DNS resolution or load balancer configuration distributes requests across your VAST VIP pool for optimal performance.

Step 3: Create Connectors

Create POSIX Connector

# Use mscli to create POSIX connectors

##source
mscli connector create filesystem \
  source-posix

##destination
mscli connector create filesystem \
  vast-posix

Create S3 Connector

# Create source S3 connector
mscli connector create s3 \
  source-s3 \
  <source-access-key> \
  <source-secret-key> \
  us-east-1 \
  http://<source-s3-endpoint>:80

# Create VAST S3 connector
mscli connector create s3 \
  vast-s3 \
  <vast-access-key-id> \
  <vast-secret-access-key> \
  us-east-1 \
  http://<vast-dns-name>:80

Step 4: Create Migration

Create POSIX Migration

# Create POSIX to VAST migration
mscli migration create \
  source-posix \
  vast-posix \
  /mnt/source \
  /mnt/vast-dest \
  posix-migration

S3 Migration Flow

Follow this flow if you are migrating from S3-compatible storage to VAST S3.

Create S3 Migration

# Create S3 to VAST migration
mscli migration create \
  --source source-s3 \
  --destination vast-s3 \
  --source-path / \
  --destination-path / \
  --label migration-worker \
  --batch-size 1 \
  --batch-file-count 1000

Running Migrations

Start Migration

# Start POSIX migration
mscli migration start --label migration-worker

# Start S3 migration (if following S3 migration flow)
mscli migration start --label migration-worker

Monitor Migration Progress

# Monitor migration status
mscli migration status --label posix-migration

# Watch migration progress in real-time
mscli migration status --label posix-migration --watch

# Get detailed migration information
mscli migration info --label posix-migration

Pause and Resume Migration

# Pause migration
mscli migration pause --label posix-migration

# Resume migration
mscli migration resume --label posix-migration

Cancel Migration

# Cancel running migration
mscli migration cancel --label posix-migration

# Verify cancellation
mscli migration status --label posix-migration

Monitoring

Web-Based Monitoring

Access Grafana Dashboard

# Open Grafana in web browser
http://<control-plane-ip>:3009

# Default credentials:
# Username: admin
# Password: admin

Available Dashboards:

  • Data Migration Dashboard: Migration progress, file processing statistics, error rates

  • Worker Metrics: Worker performance, task processing rates, resource utilization

  • System Overview: Overall system health and performance

Access Prometheus

# Open Prometheus in web browser
http://<control-plane-ip>:9991

# View targets, metrics, and alerts

Command-Line Monitoring

Check Migration Status

# List all migrations
mscli migration list

# Get detailed migration status
mscli migration status <migration-label>

# View migration logs
mscli migration logs <migration-label>

Check Worker Status

# List all workers
mscli worker list

# Get worker metrics
mscli worker metrics <migration-label>

# Check worker health
mscli worker health <migration-label>

VAST UI Monitoring

Dataflow UI

  1. Access VAST Web UI: https://<vast-vms-address>

  2. Navigate to Data Engine → Dataflow

  3. Filter by view/bucket name to see migration activity

  4. Monitor data transfer rates and performance

Capacity Estimations UI

  1. Navigate to Analytics → Capacity Estimations

  2. Select the destination path (e.g., /data-migration-dest or /s3-data-dest)

  3. View current data size and data reduction ratios

  4. Monitor storage efficiency improvements

Troubleshooting

Common Issues

Network Connectivity Issues

# Test connectivity between hosts
ping <target-host>

# Test specific ports
telnet <target-host> <port>


# Check firewall settings
sudo firewall-cmd --list-all

Container Issues

# Check container status
podman ps -a

# View container logs
podman logs <container-name>

# Restart containers
ms down
ms deploy control

Mount Issues

Perform these checks on one or more of the workers.

# Check mount status
mount | grep nfs

# Check NFS server connectivity
rpcinfo -p <source-nfs-server>

# Test NFS mount manually
sudo mount -t nfs -o vers=3 <source-nfs-server>:/path /mnt/source

S3 Connectivity Issues

Perform these checks on one or more of the workers.

# Test S3 endpoint
curl -v http://<source-s3-endpoint>

# Test with AWS CLI
aws s3 ls s3://bucket-name --endpoint-url http://<source-s3-endpoint>

# Check access key permissions
aws s3 ls s3://bucket-name --endpoint-url http://<source-s3-endpoint> --profile <profile>

Getting Help

  • API Documentation: http://<control-plane-ip>:5009/docs

  • Container Logs: podman logs for detailed error information

  • VAST Documentation: Refer to VAST support documentation for platform-specific issues

Important Notes

Limitations

  • S3 Tags: S3 object tags are not preserved in this version

  • S3 ACLs: S3 access control lists are not preserved in this version; it is recommended to use Identity Policies for access control.

  • POSIX ACLs: POSIX ACLs (from GPFS or Lustre) may require additional configuration for preservation; contact VAST for assistance.

  • Hardlinks: Currently, migrating hardlinks will result in multiple duplicate copies being hydrated on the destination.  While the storage impact will be negligible due to VAST's Data Reduction, it will increase network I/O, and the relationship between the hardlinks will be lost.  (A simple pre-migration check for hardlinks is shown below.)
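
A simple, non-destructive way to check whether a POSIX source contains hardlinks before migrating (the mount path is a placeholder):

# Count regular files with more than one link under the source mount
find /mnt/source -xdev -type f -links +1 | wc -l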

Best Practices

  1. Test with Small Dataset: Always test migrations with a small dataset first

  2. Monitor Resources: Keep an eye on worker host CPU, memory, and network usage during migrations

  3. Use QoS: Configure QoS policies to quickly and easily throttle migrations and prevent source and destination overload.

  4. Regular Monitoring: Use the monitoring tools to track migration progress and identify issues early

  5. Document Configuration: Save your steps for future migrations and deployments.

Performance Optimization

  • Batch Size: Adjust batch size based on your network and storage performance

  • Worker Count: Scale workers based on available resources and migration requirements

  • QoS Policies: Use VAST QoS policies to control bandwidth usage

Conclusion

SyncEngine provides a powerful and flexible solution for migrating data to the VAST Data platform. By following this guide, you can successfully deploy, configure, and run data migrations while maintaining full visibility into the process through comprehensive monitoring tools.

For additional support and advanced configuration options, refer to the VAST support documentation and SyncEngine API reference.