## Intro

This guide walks through starting the SkyPilot API server, verifying Kubernetes compute credentials, and configuring VAST Data as a storage backend.
## Prerequisites

- SkyPilot installed
- A running Kubernetes cluster with a valid kubeconfig
- VastData S3 endpoint URL and access credentials
- Python 3 with pip
## Step 1: Start the SkyPilot API Server

Start the SkyPilot API server, binding it to all network interfaces so it is accessible remotely:

```bash
sky api start --host 0.0.0.0
```

Expected output:

```
✓ SkyPilot API server started.
├── SkyPilot API server and dashboard: http://0.0.0.0:46580
└── View API server logs at: ~/.sky/api_server/server.log
```

The API server and its web dashboard are now available on port 46580. If running from source, you can rebuild the dashboard with:

```bash
npm --prefix sky/dashboard install && npm --prefix sky/dashboard run build
```

## Step 2: Verify Kubernetes Credentials
Confirm that SkyPilot can use your Kubernetes cluster for compute:

```bash
sky check kubernetes
```

Expected output:

```
Checking credentials to enable infra for SkyPilot.
Kubernetes: enabled [compute]
  Allowed contexts:
  └── kubernetes-admin@kubernetes: enabled.
```

## Step 3: Check VastData Credentials (Initial State)
Run the VastData credential check to see what configuration is needed:

```bash
sky check vastdata
```

If VastData is not yet configured, the output will show it as disabled and provide setup instructions:

```
Checking credentials to enable infra for SkyPilot.
VastData: disabled
  Reason [storage]: [vastdata] profile is not set in ~/.vastdata/vastdata.credentials. Additionally, [vastdata] profile is not set in ~/.vastdata/vastdata.config. Run the following commands:
    $ pip install skypilot[vastdata]
    $ AWS_SHARED_CREDENTIALS_FILE=~/.vastdata/vastdata.credentials aws configure --profile vastdata
    $ AWS_CONFIG_FILE=~/.vastdata/vastdata.config aws configure set endpoint_url <ENDPOINT_URL> --profile vastdata
```

Follow the steps below to configure it.
## Step 4: Install the boto3 Dependency

VastData storage integration requires boto3 (the AWS SDK for Python), which is used to communicate with the S3-compatible endpoint:

```bash
pip install boto3
```

## Step 5: Configure VastData Access Credentials
Use the AWS CLI to write your VastData S3 access key and secret key into a dedicated credentials file (`~/.vastdata/vastdata.credentials`):

```bash
AWS_SHARED_CREDENTIALS_FILE=~/.vastdata/vastdata.credentials \
  aws configure --profile vastdata
```

You will be prompted for:

| Prompt | Value |
|---|---|
| AWS Access Key ID | Your VastData access key |
| AWS Secret Access Key | Your VastData secret key |
| Default region name | (leave blank; press Enter) |
| Default output format | (leave blank; press Enter) |

This creates the file `~/.vastdata/vastdata.credentials` with a `[vastdata]` profile.
## Step 6: Configure the VastData S3 Endpoint URL

Set the VastData S3-compatible endpoint URL in the config file (`~/.vastdata/vastdata.config`):

```bash
AWS_CONFIG_FILE=~/.vastdata/vastdata.config \
  aws configure set endpoint_url <ENDPOINT_URL> --profile vastdata
```

Replace `<ENDPOINT_URL>` with your VastData S3 endpoint (e.g., `http://172.27.115.1`).
## Step 7: Verify VastData Is Enabled

Re-run the credential check to confirm VastData storage is now configured:

```bash
sky check vastdata
```

Expected output:

```
Checking credentials to enable infra for SkyPilot.
VastData: enabled [storage]

🎉 Enabled infra 🎉
  VastData [storage]
```

VastData is now registered as a storage backend in SkyPilot. You can now use it for file mounts and managed storage in your task YAML files.
## Test 1: FUSE Mount Mode (MOUNT)
### Step 8: Create a Task YAML with a VastData Mount

Create a YAML file (e.g., `test_vast_mount.yaml`) that mounts a VastData bucket and runs a command against it:

```yaml
file_mounts:
  /data:
    source: vastdata://skypilot
    mode: MOUNT

resources:
  cloud: Kubernetes
  cpus: 2

run: |
  ls /data
```

- `source: vastdata://skypilot` — Mounts the VastData bucket named `skypilot` using the FUSE-based mounter.
- `mode: MOUNT` — The bucket is mounted as a live filesystem (as opposed to COPY, which downloads files at setup time).
- `resources.cloud: Kubernetes` — The task will run on the Kubernetes cluster verified in Step 2.
### Step 9: Launch the Task

Launch the task with `sky launch`:

```bash
sky launch test_vast_mount.yaml
```

SkyPilot will display the chosen resources and ask for confirmation:

```
Considered resources (1 node):
--------------------------------------------------------------------------------------------------
 INFRA                                    INSTANCE   vCPUs   Mem(GB)   GPUS   COST ($)   CHOSEN
--------------------------------------------------------------------------------------------------
 Kubernetes (kubernetes-...@kubernetes)   -          2       2         -      0.00          ✔
--------------------------------------------------------------------------------------------------
Launching a new cluster 'sky-19aa-vastdata'. Proceed? [Y/n]:
```

Type `y` to proceed. SkyPilot will:

1. Provision a Kubernetes pod with the requested resources.
2. Mount the VastData bucket at `/data` using a FUSE filesystem.
3. Run the `ls /data` command inside the pod.
Expected output:

```
✓ Cluster launched: sky-19aa-vastdata.
⚙︎ Syncing files.
  Mounting (to 1 node): vastdata://skypilot -> /data
✓ Storage mounted.
⚙︎ Job submitted, ID: 1
(task, pid=1862) aaa
(task, pid=1862) boto3-test.txt
(task, pid=1862) created_by_goofys
(task, pid=1862) created_by_rclone
(task, pid=1862) hosts
✓ Job finished (status: SUCCEEDED).
```

The `ls /data` command lists the contents of the VastData bucket, confirming the mount is working.
### Step 10: SSH into the Cluster and Inspect the Mount

You can SSH into the running cluster to interactively explore the mounted storage:

```bash
ssh sky-19aa-vastdata
```

Once connected, verify the mount:

```bash
# List files in the mounted bucket
ls -la /data/

# Check the FUSE mount entry
mount | grep /data
```

Expected output:

```
skypilot on /data type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
```

This confirms the VastData bucket is mounted as a read-write FUSE filesystem. You can also check the disk space reported by the mount:

```bash
df | grep /data
```

```
skypilot       1099511627776        0 1099511627776   0% /data
```

## Test 2: Cached Mount Mode (MOUNT_CACHED)
The MOUNT_CACHED mode uses rclone with VFS caching. Files are cached locally on disk, giving faster repeated reads and full read-write support with write-back to the remote bucket. This is ideal for workloads that read the same files multiple times or need to write results back to the bucket.
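Conceptually, the VFS cache is a read-through cache with delayed write-back. The toy model below illustrates the idea only — it is not how rclone is implemented (plain dicts stand in for the remote bucket and the on-disk cache):

```python
import time

class WriteBackCache:
    """Toy model of a VFS-style cache: reads are cached locally,
    writes are buffered and flushed to the remote after a delay."""

    def __init__(self, remote: dict, write_back_s: float = 1.0):
        self.remote = remote          # stands in for the bucket
        self.local = {}               # stands in for the on-disk cache
        self.dirty = {}               # buffered writes awaiting flush
        self.write_back_s = write_back_s

    def read(self, key: str) -> bytes:
        if key not in self.local:     # cache miss: fetch from remote once
            self.local[key] = self.remote[key]
        return self.local[key]        # repeated reads hit the local copy

    def write(self, key: str, data: bytes) -> None:
        self.local[key] = data        # visible locally immediately
        self.dirty[key] = time.monotonic()

    def flush(self) -> None:
        now = time.monotonic()
        for key, stamp in list(self.dirty.items()):
            if now - stamp >= self.write_back_s:
                self.remote[key] = self.local[key]  # write-back to bucket
                del self.dirty[key]
```

The `write_back_s` delay plays the role of rclone's `--vfs-write-back` interval seen later in Step 14.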
### Step 12: Create a Cached Mount Task YAML

Create a YAML file (e.g., `test_vast_cache.yaml`) that uses MOUNT_CACHED instead of MOUNT:

```yaml
file_mounts:
  /data:
    source: vastdata://skypilot
    mode: MOUNT_CACHED

resources:
  cloud: Kubernetes
  cpus: 2

run: |
  ls /data
```

- `mode: MOUNT_CACHED` — The bucket is mounted via rclone with a local VFS cache. Reads are cached on the node's local disk (up to 10 GB by default), and writes are buffered and flushed back to the remote bucket.
### Step 13: Launch the Cached Mount Task

```bash
sky launch test_vast_cache.yaml
```

SkyPilot will display the chosen resources and ask for confirmation:

```
Considered resources (1 node):
--------------------------------------------------------------------------------------------------
 INFRA                                    INSTANCE   vCPUs   Mem(GB)   GPUS   COST ($)   CHOSEN
--------------------------------------------------------------------------------------------------
 Kubernetes (kubernetes-...@kubernetes)   -          2       2         -      0.00          ✔
--------------------------------------------------------------------------------------------------
Launching a new cluster 'sky-7cb4-vastdata'. Proceed? [Y/n]:
```

Type `y` to proceed. Notice the log line says "Mounting cached mode" instead of just "Mounting":

```
⚙︎ Syncing files.
  Mounting cached mode (to 1 node): vastdata://skypilot -> /data
✓ Storage mounted.
```

After the job runs, you will also see a cache upload confirmation:
```
(task, pid=1849) aaa
(task, pid=1849) boto3-test.txt
(task, pid=1849) created_by_goofys
(task, pid=1849) created_by_rclone
(task, pid=1849) hosts
(task, pid=1849) skypilot: cached mount upload complete (took 11s)
✓ Job finished (status: SUCCEEDED).
```

The line `skypilot: cached mount upload complete` confirms that any locally cached writes were flushed back to the VastData bucket.
### Step 14: SSH and Inspect the Cached Mount

SSH into the cluster:

```bash
ssh sky-7cb4-vastdata
```

Verify the files are visible:

```bash
ls /data/
```

```
aaa  boto3-test.txt  created_by_goofys  created_by_rclone  hosts
```

Inspect the rclone VFS process to see the caching parameters:

```bash
ps -aef | grep vfs
```

Expected output:

```
sky  1726  1  0 09:52 ?  00:00:00 /usr/bin/rclone mount sky-vastdata-skypilot:skypilot /data \
    --daemon --daemon-wait 10 \
    --log-file /home/sky/.sky/rclone_log/4caa791091d21d23e63637080226f370.log \
    --log-level INFO --allow-other \
    --vfs-cache-mode full \
    --dir-cache-time 10s \
    --vfs-cache-poll-interval 10s \
    --cache-dir /home/sky/.cache/rclone/4caa791091d21d23e63637080226f370 \
    --vfs-fast-fingerprint \
    --vfs-cache-max-size 10G \
    --vfs-write-back 1s
```

Key rclone VFS flags to note:
| Flag | Description |
|---|---|
| `--vfs-cache-mode full` | Full read-write caching — all reads and writes go through the local cache |
| `--vfs-cache-max-size 10G` | Maximum local cache size before eviction |
| `--vfs-write-back 1s` | Writes are flushed to the remote bucket after 1 second of inactivity |
| `--dir-cache-time 10s` | Directory listings are cached for 10 seconds |
| `--vfs-fast-fingerprint` | Uses size and modification time (not checksums) for faster cache validation |
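The `sky-vastdata-skypilot:` remote referenced in the mount command comes from an rclone config that SkyPilot generates on the node. Its shape is roughly the following — an assumption based on standard rclone S3 remotes; the exact section name and keys SkyPilot writes may differ:

```ini
# Illustrative rclone remote for a VastData S3 endpoint
[sky-vastdata-skypilot]
type = s3
provider = Other
access_key_id = <YOUR_ACCESS_KEY>
secret_access_key = <YOUR_SECRET_KEY>
endpoint = http://172.27.115.1
```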
## Test 3: Copy Mode (COPY)

The COPY mode downloads the entire bucket contents to the local filesystem at launch time. The files are plain local files — no FUSE mount, no background process. This is the simplest and most compatible mode, ideal when your workload needs fast local I/O and the dataset fits on disk.

### Step 16: Create a Copy Mode Task YAML

Create a YAML file (e.g., `test_vast_copy.yaml`) that uses COPY mode:

```yaml
file_mounts:
  /data:
    source: vastdata://skypilot
    mode: COPY

resources:
  cloud: Kubernetes
  cpus: 2

run: |
  ls /data
```

- `mode: COPY` — The bucket contents are downloaded to `/data` during the file sync phase. There is no live connection to the remote bucket after the copy completes.
### Step 17: Launch the Copy Mode Task

```bash
sky launch test_vast_copy.yaml
```

Type `y` to confirm. Notice the log line says "Syncing" rather than "Mounting":

```
⚙︎ Syncing files.
  Syncing (to 1 node): vastdata://skypilot -> /data
✓ Synced file_mounts.
```

Expected job output:

```
(task, pid=1852) aaa
(task, pid=1852) boto3-test.txt
(task, pid=1852) created_by_goofys
(task, pid=1852) created_by_rclone
(task, pid=1852) hosts
✓ Job finished (status: SUCCEEDED).
```

### Step 18: SSH and Inspect the Copy
SSH into the cluster:

```bash
ssh sky-f053-vastdata
```

Verify the files are present:

```bash
ls /data/
```

```
aaa  boto3-test.txt  created_by_goofys  created_by_rclone  hosts
```

Confirm there is no FUSE mount — the files are regular local files on the pod's filesystem:

```bash
df | grep /data
```

This returns no output, confirming `/data` is not a separate mount point. The files were copied directly into the pod's local filesystem during setup.

### Step 19: Clean Up the COPY Cluster

```bash
sky down sky-f053-vastdata
```

## Test 4: Auto-Create a New Bucket
SkyPilot can automatically create a new VastData bucket and mount it. This is useful when your task needs a fresh, empty storage location — for example, to write output data or checkpoints.
### Step 20: Create a Bucket-Creation Task YAML

Create a YAML file (e.g., `test_vast_create_bucket.yaml`) that instructs SkyPilot to create a new bucket and mount it:

```yaml
file_mounts:
  /data:
    name: skypilotnew
    source: ~
    store: vastdata

resources:
  cloud: Kubernetes
  cpus: 2

run: |
  ls /data
```

- `name: skypilotnew` — The name of the new bucket to create on VastData.
- `source: ~` — Indicates that the local home directory contents should be synced (use `~` as a minimal placeholder; SkyPilot will create the bucket even if nothing is uploaded).
- `store: vastdata` — Tells SkyPilot to create the bucket on VastData (rather than AWS S3, GCS, etc.).
### Step 21: Launch the Task

```bash
sky launch test_vast_create_bucket.yaml
```

Type `y` to confirm. SkyPilot will create the bucket before launching the cluster:

```
Created S3 bucket 'skypilotnew' in auto
⚙︎ Launching on Kubernetes.
└── Pod is up.
✓ Cluster launched: sky-405f-vastdata.
⚙︎ Syncing files.
  Mounting (to 1 node): skypilotnew -> /data
✓ Storage mounted.
✓ Job finished (status: SUCCEEDED).
```

The bucket is created on the VastData S3 endpoint and then mounted into the pod at `/data`.
### Step 22: Verify with `sky storage ls`

List all SkyPilot-managed storage to confirm the new bucket exists:

```bash
sky storage ls
```

Expected output:

```
NAME         UPDATED      STORE     COMMAND                       STATUS
skypilotnew  52 secs ago  VASTDATA  sky launch test_vast_crea...  READY
```

The bucket is tracked by SkyPilot and can be reused, mounted by other tasks, or deleted with `sky storage delete skypilotnew`.
### Step 23: SSH and Inspect the New Bucket Mount

SSH into the cluster:

```bash
ssh sky-405f-vastdata
```

Verify the bucket is mounted as a FUSE filesystem:

```bash
df | grep /data
```

Expected output:

```
skypilotnew    1099511627776        0 1099511627776   0% /data
```

The newly created bucket is mounted and ready for use. Your task's `run` commands can write data to `/data`, and it will be stored in the VastData bucket `skypilotnew`.

### Step 24: Clean Up

Tear down the cluster:

```bash
sky down sky-405f-vastdata
```

Optionally, delete the bucket if it is no longer needed:

```bash
sky storage delete skypilotnew
```

## Test 5: Kubernetes Persistent Volumes with VAST Data
SkyPilot can provision Kubernetes Persistent Volume Claims (PVCs) backed by VAST Data and mount them alongside S3 buckets. This is useful when your workload needs high-performance, POSIX volume storage — for example, model checkpoints, scratch space, or databases — while also accessing object storage for datasets.
### Step 25: Create a Volume Definition YAML

Create a YAML file (e.g., `vol.yaml`) that defines a Kubernetes PVC backed by VAST Data:

```yaml
name: new-pvc
type: k8s-pvc
infra: k8s
size: 100Gi
```

- `name: new-pvc` — A human-readable name for the volume in SkyPilot.
- `type: k8s-pvc` — Tells SkyPilot to create a Kubernetes Persistent Volume Claim.
- `infra: k8s` — The volume will be provisioned on the Kubernetes cluster.
- `size: 100Gi` — The requested storage capacity. The PVC will be dynamically provisioned using the cluster's default StorageClass (e.g., `vastdata-filesystem`).
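For reference, this definition corresponds roughly to the following Kubernetes PVC manifest. This is an illustrative sketch, not the exact object SkyPilot submits; the name suffix matches the one seen in Step 26's output:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: new-pvc-d5427ffd-5b9290   # SkyPilot appends a unique suffix
spec:
  accessModes:
    - ReadWriteOnce               # SkyPilot's default access mode
  resources:
    requests:
      storage: 100Gi
  # storageClassName omitted -> cluster default (e.g. vastdata-filesystem)
```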
### Custom StorageClass and Access Mode

By default, SkyPilot uses the cluster's default StorageClass and the ReadWriteOnce (RWO) access mode. You can override these with the `config` section to use a specific StorageClass or a different access mode. For example, to create a volume using a custom StorageClass with ReadWriteMany access:

```yaml
name: new-pvc
type: k8s-pvc
infra: k8s
size: 1200Gi
config:
  storage_class_name: fsx-sc
  access_mode: ReadWriteMany
```

- `config.storage_class_name` — Specifies a non-default Kubernetes StorageClass (e.g., `fsx-sc` for FSx, or a different VAST Data StorageClass). The StorageClass must already exist in the cluster.
- `config.access_mode` — Sets the PVC access mode. Common values:
| Access Mode | Description |
|---|---|
| ReadWriteOnce | Volume can be mounted as read-write by a single node (default) |
| ReadWriteMany | Volume can be mounted as read-write by multiple nodes simultaneously |
| ReadOnlyMany | Volume can be mounted as read-only by multiple nodes simultaneously |
Use ReadWriteMany when multiple pods or nodes need concurrent write access to the same volume — for example, distributed training jobs or shared scratch space.
### Step 26: Create the Volume

Apply the volume definition with `sky volumes apply`:

```bash
sky volumes apply vol.yaml
```

Expected output:

```
NAME                     STATUS  VOLUME                                     CAPACITY  ACCESS MODES  STORAGECLASS         AGE
new-pvc-d5427ffd-5b9290  Bound   pvc-8769b777-dcea-483d-a7da-3a61e90656ed  100Gi     RWO           vastdata-filesystem  5s
```

Key fields to verify:

| Field | Expected Value | Description |
|---|---|---|
| STATUS | Bound | PVC is bound to a provisioned volume |
| CAPACITY | 100Gi | Matches the requested size |
| ACCESS MODES | RWO | ReadWriteOnce — mountable by a single node |
| STORAGECLASS | vastdata-filesystem | Backed by the VAST Data CSI driver |
### Step 28: List Volumes in SkyPilot

Verify the volume is tracked by SkyPilot:

```bash
sky volumes ls
```

Expected output:

```
Kubernetes PVCs:
NAME     TYPE     INFRA                                   SIZE   USER      WORKSPACE  AGE  STATUS  LAST_USE  USED_BY  IS_EPHEMERAL
new-pvc  k8s-pvc  Kubernetes/kubernetes-admin@kubernetes  100Gi  vastdata  default    15s  READY   -         -        False
```

The volume status is READY, meaning it can be mounted by tasks.
### Step 29: Create a Task YAML with Both a Volume and an S3 Mount

Volumes can be used alongside VastData S3 mounts. Create a YAML file (e.g., `sky.yaml`) that mounts both:

```yaml
file_mounts:
  /data:
    source: vastdata://skypilot
    mode: MOUNT

volumes:
  /pvc: new-pvc

resources:
  cloud: Kubernetes
  cpus: 2

run: |
  ls /data
  ls /pvc
```

- `file_mounts` — Mounts the VastData S3 bucket `skypilot` at `/data` using FUSE (as in Test 1).
- `volumes` — Mounts the PVC `new-pvc` at `/pvc` inside the pod, backed by the VAST Data CSI driver.

Both mounts are available simultaneously, allowing your workload to read datasets from S3 and write outputs to the persistent volume.
### Step 30: Launch the Task

```bash
sky launch sky.yaml
```

Expected output:

```
Considered resources (1 node):
--------------------------------------------------------------------------------------------------
 INFRA                                    INSTANCE   vCPUs   Mem(GB)   GPUS   COST ($)   CHOSEN
--------------------------------------------------------------------------------------------------
 Kubernetes (kubernetes-...@kubernetes)   -          2       2         -      0.00          ✔
--------------------------------------------------------------------------------------------------
Launching a new cluster 'sky-c84e-vastdata'. Proceed? [Y/n]: y
⚙︎ Launching on Kubernetes.
└── Pod is up.
✓ Cluster launched: sky-c84e-vastdata.
⚙︎ Syncing files.
  Mounting (to 1 node): vastdata://skypilot -> /data
✓ Storage mounted.
⚙︎ Job submitted, ID: 1
└── Job started. Streaming logs...
✓ Job finished (status: SUCCEEDED).
```

The pod has both `/data` (S3 FUSE mount) and `/pvc` (PVC volume mount) available.
### Step 31: SSH and Inspect Both Mounts

SSH into the cluster:

```bash
ssh sky-c84e-vastdata
```

Verify the S3 mount:

```bash
ls /data/
```

Verify the PVC mount:

```bash
ls /pvc/
df -h /pvc
```

Expected output:

```
Filesystem      Size  Used Avail Use% Mounted on
/dev/vast0      100G     0  100G   0% /pvc
```

The PVC is a standard filesystem mount — no FUSE process involved. It provides native filesystem performance and full POSIX compatibility.

### Step 32: Clean Up

Tear down the cluster:

```bash
sky down sky-c84e-vastdata
```

The PVC persists independently of the cluster. To delete it:

```bash
sky volumes delete new-pvc
```

## Mount Mode Comparison
| Aspect | MOUNT | MOUNT_CACHED | COPY |
|---|---|---|---|
| Backend | FUSE (goofys-based) | rclone with VFS cache | rclone sync (one-time download) |
| Data transfer | On-demand per read | On-demand + local cache | Full download at launch |
| Read performance | Network-bound (every read) | Fast for repeated reads (cached) | Native local disk speed |
| Write support | Limited | Full read-write with write-back | Local only (not synced back) |
| Local disk usage | Minimal | Up to 10 GB cache (configurable) | Full dataset size |
| Live remote connection | Yes (FUSE process) | Yes (rclone process) | No |
| Best for | Streaming large files, read-once workloads | Iterative reads, training data, read-write workloads | Small datasets, max I/O performance, offline access |
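The comparison can be condensed into a rough decision rule. The helper below is an illustrative heuristic, not part of SkyPilot:

```python
def choose_mode(dataset_gib: float, local_disk_gib: float,
                repeated_reads: bool, needs_writeback: bool) -> str:
    """Map workload traits onto a file_mounts mode (illustrative only)."""
    if needs_writeback:
        return "MOUNT_CACHED"   # only mode that syncs writes back to the bucket
    if dataset_gib <= local_disk_gib:
        return "COPY"           # dataset fits on disk: fastest local I/O
    if repeated_reads:
        return "MOUNT_CACHED"   # hot files get cached locally
    return "MOUNT"              # stream large data on demand
```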
## Summary

| Step | Action | Result |
|---|---|---|
| 1 | `sky api start --host 0.0.0.0` | API server + dashboard running on port 46580 |
| 2 | `sky check kubernetes` | Kubernetes enabled for compute |
| 3 | `sky check vastdata` | Shows what VastData config is missing |
| 4 | `pip install boto3` | S3 client library installed |
| 5 | `aws configure --profile vastdata` | Access credentials stored |
| 6 | `aws configure set endpoint_url ...` | S3 endpoint configured |
| 7 | `sky check vastdata` | VastData enabled for storage |
| 8 | Create `test_vast_mount.yaml` | Task YAML with VastData FUSE mount |
| 9 | `sky launch test_vast_mount.yaml` | Cluster launched, bucket mounted, job succeeded |
| 10 | `ssh sky-19aa-vastdata` | Interactive access to inspect the FUSE mount |
| 11 | `sky down sky-19aa-vastdata` | MOUNT cluster torn down |
| 12 | Create `test_vast_cache.yaml` | Task YAML with VastData cached mount |
| 13 | `sky launch test_vast_cache.yaml` | Cluster launched, cached mount active, job succeeded |
| 14 | `ssh sky-7cb4-vastdata` | Interactive access to inspect rclone VFS cache |
| 15 | `sky down sky-7cb4-vastdata` | MOUNT_CACHED cluster torn down |
| 16 | Create `test_vast_copy.yaml` | Task YAML with VastData copy mode |
| 17 | `sky launch test_vast_copy.yaml` | Cluster launched, files synced, job succeeded |
| 18 | `ssh sky-f053-vastdata` | Interactive access — no FUSE mount, plain local files |
| 19 | `sky down sky-f053-vastdata` | COPY cluster torn down |
| 20 | Create `test_vast_create_bucket.yaml` | Task YAML that auto-creates a new VastData bucket |
| 21 | `sky launch test_vast_create_bucket.yaml` | Bucket created, cluster launched, mounted |
| 22 | `sky storage ls` | New bucket visible in SkyPilot storage list |
| 23 | `ssh sky-405f-vastdata` | Bucket mounted as FUSE filesystem at /data |
| 24 | `sky down sky-405f-vastdata` | Cluster torn down, optionally delete bucket |