Setting Up SkyPilot with Kubernetes and VAST Data


Intro

This guide walks through starting the SkyPilot API server, verifying Kubernetes compute credentials, and configuring VAST Data as a storage backend.

Prerequisites

  • SkyPilot installed

  • A running Kubernetes cluster with a valid kubeconfig

  • VastData S3 endpoint URL and access credentials

  • Python 3 with pip

  • The AWS CLI (used in Steps 5 and 6 to write the VastData credentials and config files)

Step 1: Start the SkyPilot API Server

Start the SkyPilot API server, binding it to all network interfaces so it is accessible remotely:

sky api start --host 0.0.0.0

Expected output:

✓ SkyPilot API server started.
├── SkyPilot API server and dashboard: http://0.0.0.0:46580
└── View API server logs at: ~/.sky/api_server/server.log

The API server and its web dashboard are now available on port 46580. If running from source, you can rebuild the dashboard with:

npm --prefix sky/dashboard install && npm --prefix sky/dashboard run build

Step 2: Verify Kubernetes Credentials

Confirm that SkyPilot can use your Kubernetes cluster for compute:

sky check kubernetes

Expected output:

Checking credentials to enable infra for SkyPilot.
  Kubernetes: enabled [compute]
    Allowed contexts:
    └── kubernetes-admin@kubernetes: enabled.

Step 3: Check VastData Credentials (Initial State)

Run the VastData credential check to see what configuration is needed:

sky check vastdata

If VastData is not yet configured, the output will show it as disabled and provide setup instructions:

Checking credentials to enable infra for SkyPilot.
  VastData: disabled
    Reason [storage]: [vastdata] profile is not set in ~/.vastdata/vastdata.credentials. Additionally, [vastdata] profile is not set in ~/.vastdata/vastdata.config. Run the following commands:
      $ pip install boto3
      $ AWS_SHARED_CREDENTIALS_FILE=~/.vastdata/vastdata.credentials aws configure --profile vastdata
      $ AWS_CONFIG_FILE=~/.vastdata/vastdata.config aws configure set endpoint_url <ENDPOINT_URL> --profile vastdata

Follow the steps below to configure it.

Step 4: Install boto3

VastData storage integration requires boto3 (the AWS SDK for Python), which is used to communicate with the S3-compatible endpoint:

pip install boto3

Step 5: Configure VastData Access Credentials

Use the AWS CLI to write your VastData S3 access key and secret key into a dedicated credentials file (~/.vastdata/vastdata.credentials):

AWS_SHARED_CREDENTIALS_FILE=~/.vastdata/vastdata.credentials \
  aws configure --profile vastdata

You will be prompted for:

Prompt                  Value
AWS Access Key ID       Your VastData access key
AWS Secret Access Key   Your VastData secret key
Default region name     (leave blank — press Enter)
Default output format   (leave blank — press Enter)

This creates the file ~/.vastdata/vastdata.credentials with a [vastdata] profile.

Step 6: Configure the VastData S3 Endpoint URL

Set the VastData S3-compatible endpoint URL in the config file (~/.vastdata/vastdata.config):

AWS_CONFIG_FILE=~/.vastdata/vastdata.config \
  aws configure set endpoint_url <ENDPOINT_URL> --profile vastdata

Replace <ENDPOINT_URL> with your VastData S3 endpoint (e.g., http://172.27.115.1).
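Steps 5 and 6 should leave two small INI-style files behind. The Python below is a rough sanity check, not part of SkyPilot: it parses text in that format with the standard-library configparser and lists anything that looks incomplete. It assumes the usual AWS CLI layout, where the credentials file uses a [vastdata] section and the config file uses [profile vastdata]. The function name and the sample values are made up for illustration.

```python
import configparser

PROFILE = "vastdata"  # profile name used throughout this guide


def profile_problems(credentials_text: str, config_text: str) -> list:
    """Return a list of problems found; an empty list means both files look complete."""
    problems = []

    creds = configparser.ConfigParser(interpolation=None)
    creds.read_string(credentials_text)
    if not creds.has_section(PROFILE):
        problems.append(f"[{PROFILE}] missing from credentials file")
    else:
        for key in ("aws_access_key_id", "aws_secret_access_key"):
            if not creds.get(PROFILE, key, fallback="").strip():
                problems.append(f"{key} missing from credentials file")

    config = configparser.ConfigParser(interpolation=None)
    config.read_string(config_text)
    # Assumption: the AWS CLI writes named profiles as "[profile NAME]" in
    # config files; a bare "[NAME]" section is accepted here as well.
    section = next(
        (s for s in (f"profile {PROFILE}", PROFILE) if config.has_section(s)),
        None,
    )
    if section is None:
        problems.append(f"[profile {PROFILE}] missing from config file")
    elif not config.get(section, "endpoint_url", fallback="").strip():
        problems.append("endpoint_url missing from config file")
    return problems


# Placeholder contents, shaped like the files written in Steps 5 and 6.
SAMPLE_CREDENTIALS = """\
[vastdata]
aws_access_key_id = EXAMPLEKEY
aws_secret_access_key = examplesecret
"""
SAMPLE_CONFIG = """\
[profile vastdata]
endpoint_url = http://172.27.115.1
"""

print(profile_problems(SAMPLE_CREDENTIALS, SAMPLE_CONFIG))
```

To check the real files, read them with pathlib.Path("~/.vastdata/vastdata.credentials").expanduser().read_text() (and likewise for vastdata.config) and pass the two strings in.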

Step 7: Verify VastData Is Enabled

Re-run the credential check to confirm VastData storage is now configured:

sky check vastdata

Expected output:

Checking credentials to enable infra for SkyPilot.
  VastData: enabled [storage]

🎉 Enabled infra 🎉
  VastData [storage]

VastData is now registered as a storage backend in SkyPilot. You can now use it for file mounts and managed storage in your task YAML files.
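If you want confirmation, independent of sky check, that the keys and endpoint actually work, a short boto3 script can list the buckets directly. This is only a sketch: the helper names are hypothetical, the endpoint is passed explicitly rather than read from the config file, and list_bucket_names needs network access to your VastData endpoint, so it is defined here but not called.

```python
import os
from pathlib import Path


def vastdata_env(home: str = "~") -> dict:
    """Environment variables pointing boto3 at the dedicated VastData files."""
    base = Path(home).expanduser() / ".vastdata"
    return {
        "AWS_SHARED_CREDENTIALS_FILE": str(base / "vastdata.credentials"),
        "AWS_CONFIG_FILE": str(base / "vastdata.config"),
    }


def list_bucket_names(endpoint_url: str) -> list:
    """List bucket names using the [vastdata] profile (requires network access)."""
    import boto3  # imported lazily so vastdata_env() works without boto3

    # The variables must be set before the session is created.
    os.environ.update(vastdata_env())
    session = boto3.Session(profile_name="vastdata")
    s3 = session.client("s3", endpoint_url=endpoint_url)
    return [bucket["Name"] for bucket in s3.list_buckets()["Buckets"]]


# Example call (not executed here):
#   list_bucket_names("http://172.27.115.1")
print(vastdata_env())
```

If the call returns a list of names without raising an error, the same credentials should work for SkyPilot's storage operations.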

Test 1: Mount Mode (MOUNT)

Step 8: Create a Task YAML with a VastData Mount

Create a YAML file (e.g., test_vast_mount.yaml) that mounts a VastData bucket and runs a command against it:

file_mounts:
  /data:
    source: vastdata://skypilot
    mode: MOUNT

resources:
  cloud: Kubernetes
  cpus: 2

run: |
  ls /data

  • source: vastdata://skypilot — Mounts the VastData bucket named skypilot using the FUSE-based mounter.

  • mode: MOUNT — The bucket is mounted as a live filesystem (as opposed to COPY, which downloads files at setup time).

  • resources.cloud: Kubernetes — The task will run on the Kubernetes cluster verified in Step 2.
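The later tests in this guide reuse this YAML and change only the mode value, so it can be handy to generate the file from a small template. The sketch below does that with plain string formatting. The function name and its defaults are illustrative and not a SkyPilot API. The list of accepted modes covers only the three used in this guide.

```python
VALID_MODES = ("MOUNT", "MOUNT_CACHED", "COPY")  # the modes exercised in this guide


def render_task_yaml(bucket: str, mode: str = "MOUNT", mount_path: str = "/data") -> str:
    """Return task YAML text that mounts vastdata://<bucket> at mount_path."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return (
        "file_mounts:\n"
        f"  {mount_path}:\n"
        f"    source: vastdata://{bucket}\n"
        f"    mode: {mode}\n"
        "\n"
        "resources:\n"
        "  cloud: Kubernetes\n"
        "  cpus: 2\n"
        "\n"
        "run: |\n"
        f"  ls {mount_path}\n"
    )


print(render_task_yaml("skypilot"))
```

Writing the returned text to test_vast_mount.yaml reproduces the file shown in this step.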

Step 9: Launch the Task

Launch the task with sky launch:

sky launch test_vast_mount.yaml

SkyPilot will display the chosen resources and ask for confirmation:

Considered resources (1 node):
--------------------------------------------------------------------------------------------------
 INFRA                                    INSTANCE   vCPUs   Mem(GB)   GPUS   COST ($)   CHOSEN
--------------------------------------------------------------------------------------------------
 Kubernetes (kubernetes-...@kubernetes)   -          2       2         -      0.00          ✔
--------------------------------------------------------------------------------------------------
Launching a new cluster 'sky-19aa-vastdata'. Proceed? [Y/n]:

Type y to proceed. SkyPilot will:

  1. Provision a Kubernetes pod with the requested resources.

  2. Mount the VastData bucket at /data using a FUSE filesystem.

  3. Run the ls /data command inside the pod.

Expected output:

✓ Cluster launched: sky-19aa-vastdata.
⚙︎ Syncing files.
  Mounting (to 1 node): vastdata://skypilot -> /data
✓ Storage mounted.
⚙︎ Job submitted, ID: 1
(task, pid=1862) aaa
(task, pid=1862) boto3-test.txt
(task, pid=1862) created_by_goofys
(task, pid=1862) created_by_rclone
(task, pid=1862) hosts
✓ Job finished (status: SUCCEEDED).

The ls /data command lists the contents of the VastData bucket, confirming the mount is working.

Step 10: SSH into the Cluster and Inspect the Mount

You can SSH into the running cluster to interactively explore the mounted storage:

ssh sky-19aa-vastdata

Once connected, verify the mount:

# List files in the mounted bucket
ls -la /data/

# Check the FUSE mount entry
mount | grep /data

Expected output:

skypilot on /data type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)

This confirms the VastData bucket is mounted as a read-write FUSE filesystem. You can also check disk space reported by the mount:

df | grep /data
skypilot    1099511627776    0    1099511627776   0% /data
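To run the mount check from a script instead of reading it by eye, the line printed by mount can be parsed. The following is a minimal sketch that assumes the "SOURCE on TARGET type FSTYPE (OPTIONS)" layout shown in this step. The function name is hypothetical.

```python
import re

# Matches lines such as: skypilot on /data type fuse (rw,nosuid,...)
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)


def parse_mount_line(line: str) -> dict:
    """Split one line of `mount` output into source, target, fstype and options."""
    match = MOUNT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized mount line: {line!r}")
    info = match.groupdict()
    info["options"] = info["options"].split(",")
    return info


SAMPLE = (
    "skypilot on /data type fuse "
    "(rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)"
)

info = parse_mount_line(SAMPLE)
is_rw_fuse = info["fstype"].startswith("fuse") and "rw" in info["options"]
print(info["source"], info["target"], is_rw_fuse)
```

For the sample line, the parsed source is the bucket name, the target is /data, and the read-write FUSE check passes.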

Step 11: Clean Up the MOUNT Cluster

sky down sky-19aa-vastdata

Test 2: Cached Mount Mode (MOUNT_CACHED)

The MOUNT_CACHED mode uses rclone with VFS caching. Files are cached locally on disk, giving faster repeated reads and full read-write support with write-back to the remote bucket. This is ideal for workloads that read the same files multiple times or need to write results back to the bucket.

Step 12: Create a Cached Mount Task YAML

Create a YAML file (e.g., test_vast_cache.yaml) that uses MOUNT_CACHED instead of MOUNT:

file_mounts:
  /data:
    source: vastdata://skypilot
    mode: MOUNT_CACHED

resources:
  cloud: Kubernetes
  cpus: 2

run: |
  ls /data

  • mode: MOUNT_CACHED — The bucket is mounted via rclone with a local VFS cache. Reads are cached on the node's local disk (up to 10 GB by default), and writes are buffered and flushed back to the remote bucket.

Step 13: Launch the Cached Mount Task

sky launch test_vast_cache.yaml

SkyPilot will display the chosen resources and ask for confirmation:

Considered resources (1 node):
--------------------------------------------------------------------------------------------------
 INFRA                                    INSTANCE   vCPUs   Mem(GB)   GPUS   COST ($)   CHOSEN
--------------------------------------------------------------------------------------------------
 Kubernetes (kubernetes-...@kubernetes)   -          2       2         -      0.00          ✔
--------------------------------------------------------------------------------------------------
Launching a new cluster 'sky-7cb4-vastdata'. Proceed? [Y/n]:

Type y to proceed. Notice the log line says "Mounting cached mode" instead of just "Mounting":

⚙︎ Syncing files.
  Mounting cached mode (to 1 node): vastdata://skypilot -> /data
✓ Storage mounted.

After the job runs, you will also see a cache upload confirmation:

(task, pid=1849) aaa
(task, pid=1849) boto3-test.txt
(task, pid=1849) created_by_goofys
(task, pid=1849) created_by_rclone
(task, pid=1849) hosts
(task, pid=1849) skypilot: cached mount upload complete (took 11s)
✓ Job finished (status: SUCCEEDED).

The line skypilot: cached mount upload complete confirms that any locally cached writes were flushed back to the VastData bucket.

Step 14: SSH and Inspect the Cached Mount

SSH into the cluster:

ssh sky-7cb4-vastdata

Verify the files are visible:

ls /data/
aaa  boto3-test.txt  created_by_goofys  created_by_rclone  hosts

Inspect the rclone VFS process to see the caching parameters:

ps -aef | grep vfs

Expected output:

sky  1726  1  0 09:52 ?  00:00:00 /usr/bin/rclone mount sky-vastdata-skypilot:skypilot /data \
  --daemon --daemon-wait 10 \
  --log-file /home/sky/.sky/rclone_log/4caa791091d21d23e63637080226f370.log \
  --log-level INFO --allow-other \
  --vfs-cache-mode full \
  --dir-cache-time 10s \
  --vfs-cache-poll-interval 10s \
  --cache-dir /home/sky/.cache/rclone/4caa791091d21d23e63637080226f370 \
  --vfs-fast-fingerprint \
  --vfs-cache-max-size 10G \
  --vfs-write-back 1s

Key rclone VFS flags to note:

Flag                        Description
--vfs-cache-mode full       Full read-write caching — all reads and writes go through the local cache
--vfs-cache-max-size 10G    Maximum local cache size before eviction
--vfs-write-back 1s         Writes are flushed to the remote bucket after 1 second of inactivity
--dir-cache-time 10s        Directory listings are cached for 10 seconds
--vfs-fast-fingerprint      Skips file attributes that are slow to fetch (on S3-type backends, the modification time) for faster cache validation
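To compare these settings across clusters, the rclone command line can be turned into a dictionary. The parser below is a small sketch using the standard-library shlex. The function name is hypothetical. It relies on a simple heuristic: a "--flag" followed by a token that does not start with "--" is treated as taking that token as its value, which holds for the command shown in this step but not for every possible command line.

```python
import shlex


def parse_rclone_flags(command: str) -> dict:
    """Map each --flag to its value, or to True for flags given without a value."""
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            if has_value:
                flags[token] = tokens[i + 1]
                i += 2
                continue
            flags[token] = True
        i += 1
    return flags


# Command taken from the ps output in this step (log and cache paths omitted).
COMMAND = (
    "/usr/bin/rclone mount sky-vastdata-skypilot:skypilot /data "
    "--daemon --daemon-wait 10 --log-level INFO --allow-other "
    "--vfs-cache-mode full --dir-cache-time 10s --vfs-cache-poll-interval 10s "
    "--vfs-fast-fingerprint --vfs-cache-max-size 10G --vfs-write-back 1s"
)

flags = parse_rclone_flags(COMMAND)
for name in ("--vfs-cache-mode", "--vfs-cache-max-size", "--vfs-write-back"):
    print(name, flags[name])
```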

Step 15: Clean Up the MOUNT_CACHED Cluster

sky down sky-7cb4-vastdata

Test 3: Copy Mode (COPY)

The COPY mode downloads the entire bucket contents to the local filesystem at launch time. The files are plain local files — no FUSE mount, no background process. This is the simplest and most compatible mode, ideal when your workload needs fast local I/O and the dataset fits on disk.

Step 16: Create a Copy Mode Task YAML

Create a YAML file (e.g., test_vast_copy.yaml) that uses COPY mode:

file_mounts:
  /data:
    source: vastdata://skypilot
    mode: COPY

resources:
  cloud: Kubernetes
  cpus: 2

run: |
  ls /data

  • mode: COPY — The bucket contents are downloaded to /data during the file sync phase. There is no live connection to the remote bucket after the copy completes.

Step 17: Launch the Copy Mode Task

sky launch test_vast_copy.yaml

Type y to confirm. Notice the log line says "Syncing" rather than "Mounting":

⚙︎ Syncing files.
  Syncing (to 1 node): vastdata://skypilot -> /data
✓ Synced file_mounts.

Expected job output:

(task, pid=1852) aaa
(task, pid=1852) boto3-test.txt
(task, pid=1852) created_by_goofys
(task, pid=1852) created_by_rclone
(task, pid=1852) hosts
✓ Job finished (status: SUCCEEDED).

Step 18: SSH and Inspect the Copy

SSH into the cluster:

ssh sky-f053-vastdata

Verify the files are present:

ls /data/
aaa  boto3-test.txt  created_by_goofys  created_by_rclone  hosts

Confirm there is no FUSE mount — the files are regular local files on the pod's filesystem:

df | grep /data

This returns no output, confirming /data is not a separate mount point. The files were copied directly into the pod's local filesystem during setup.
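The same check can be scripted. As a sketch, Python's os.path.ismount reports whether a directory is a mount point, so it should return False for a directory filled by COPY and True for one mounted with MOUNT or MOUNT_CACHED. The helper name is illustrative, and the demonstration uses a temporary directory as a stand-in for /data so that it runs anywhere.

```python
import os
import tempfile


def describe_path(path: str) -> str:
    """Say whether path is a mount point or an ordinary local directory."""
    if not os.path.isdir(path):
        return f"{path}: not a directory"
    if os.path.ismount(path):
        return f"{path}: a separate mount point"
    return f"{path}: a plain directory on the local filesystem"


# Stand-in for a COPY-mode /data: an ordinary directory created on local disk.
with tempfile.TemporaryDirectory() as tmp:
    copied = os.path.join(tmp, "data")
    os.mkdir(copied)
    print(describe_path(copied))
```

On the pod itself, call describe_path("/data") instead of using the temporary directory.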

Step 19: Clean Up the COPY Cluster

sky down sky-f053-vastdata

Test 4: Auto-Create a New Bucket

SkyPilot can automatically create a new VastData bucket and mount it. This is useful when your task needs a fresh, empty storage location — for example, to write output data or checkpoints.

Step 20: Create a Bucket-Creation Task YAML

Create a YAML file (e.g., test_vast_create_bucket.yaml) that instructs SkyPilot to create a new bucket and mount it:

file_mounts:
  /data:
    name: skypilotnew
    source: ~
    store: vastdata

resources:
  cloud: Kubernetes
  cpus: 2

run: |
  ls /data

  • name: skypilotnew — The name of the new bucket to create on VastData.

  • source: ~ — In YAML, a bare ~ is the null value, not the home directory, so no local source is uploaded and SkyPilot creates an empty bucket.

  • store: vastdata — Tells SkyPilot to create the bucket on VastData (rather than AWS S3, GCS, etc.).

Step 21: Launch the Task

sky launch test_vast_create_bucket.yaml

Type y to confirm. SkyPilot will create the bucket before launching the cluster:

  Created S3 bucket 'skypilotnew' in auto
⚙︎ Launching on Kubernetes.
└── Pod is up.
✓ Cluster launched: sky-405f-vastdata.
⚙︎ Syncing files.
  Mounting (to 1 node): skypilotnew -> /data
✓ Storage mounted.
✓ Job finished (status: SUCCEEDED).

The bucket is created on the VastData S3 endpoint and then mounted into the pod at /data.

Step 22: Verify with sky storage ls

List all SkyPilot-managed storage to confirm the new bucket exists:

sky storage ls

Expected output:

NAME         UPDATED      STORE     COMMAND                       STATUS
skypilotnew  52 secs ago  VASTDATA  sky launch test_vast_crea...  READY

The bucket is tracked by SkyPilot and can be reused, mounted by other tasks, or deleted with sky storage delete skypilotnew.
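If you need this listing in a script, the table can be split into records. The parser below is a sketch that assumes the column layout shown in this step: cells separated by at least two spaces, with no empty cells. The function name is hypothetical and the sample text is copied from the expected output.

```python
import re

COLUMN_GAP = re.compile(r"\s{2,}")  # columns are separated by two or more spaces


def parse_storage_ls(output: str) -> list:
    """Turn `sky storage ls` text into a list of {column name: cell} dictionaries."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = COLUMN_GAP.split(lines[0])
    return [dict(zip(header, COLUMN_GAP.split(line))) for line in lines[1:]]


SAMPLE_OUTPUT = """\
NAME         UPDATED      STORE     COMMAND                       STATUS
skypilotnew  52 secs ago  VASTDATA  sky launch test_vast_crea...  READY
"""

rows = parse_storage_ls(SAMPLE_OUTPUT)
print(rows[0]["NAME"], rows[0]["STORE"], rows[0]["STATUS"])
```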

Step 23: SSH and Inspect the New Bucket Mount

SSH into the cluster:

ssh sky-405f-vastdata

Verify the bucket is mounted as a FUSE filesystem:

df | grep /data

Expected output:

skypilotnew    1099511627776    0    1099511627776   0% /data

The newly created bucket is mounted and ready for use. Your task's run commands can write data to /data, and it will be stored in the VastData bucket skypilotnew.

Step 24: Clean Up

Tear down the cluster:

sky down sky-405f-vastdata

Optionally, delete the bucket if it is no longer needed:

sky storage delete skypilotnew

Mount Mode Comparison

Aspect                  MOUNT                                       MOUNT_CACHED                                          COPY
Backend                 FUSE (goofys-based)                         rclone with VFS cache                                 rclone sync (one-time download)
Data transfer           On-demand per read                          On-demand + local cache                               Full download at launch
Read performance        Network-bound (every read)                  Fast for repeated reads (cached)                      Native local disk speed
Write support           Limited                                     Full read-write with write-back                       Local only (not synced back)
Local disk usage        Minimal                                     Up to 10 GB cache (configurable)                      Full dataset size
Live remote connection  Yes (FUSE process)                          Yes (rclone process)                                  No
Best for                Streaming large files, read-once workloads  Iterative reads, training data, read-write workloads  Small datasets, max I/O performance, offline access
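The comparison above can be condensed into a rough rule of thumb. The function below is one possible reading of the "Write support" and "Best for" rows, written as a sketch. It is not a SkyPilot feature, and the parameter names and the order of the checks are assumptions made for illustration.

```python
def suggest_mode(fits_on_local_disk: bool, needs_write_back: bool, repeated_reads: bool) -> str:
    """Suggest a mount mode from three coarse workload traits."""
    if needs_write_back:
        # COPY keeps writes local only, and MOUNT write support is limited.
        return "MOUNT_CACHED"
    if fits_on_local_disk:
        # A one-time download gives native local disk speed afterwards.
        return "COPY"
    if repeated_reads:
        # The local cache avoids fetching the same data over the network again.
        return "MOUNT_CACHED"
    # Large, read-once data: stream it on demand.
    return "MOUNT"


print(suggest_mode(fits_on_local_disk=False, needs_write_back=False, repeated_reads=False))
```

Treat the result as a starting point: cache size, dataset size, and network speed for a particular cluster can shift the choice.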

Summary

Step  Action                                   Result
1     sky api start --host 0.0.0.0             API server + dashboard running on port 46580
2     sky check kubernetes                     Kubernetes enabled for compute
3     sky check vastdata                       Shows what VastData config is missing
4     pip install boto3                        S3 client library installed
5     aws configure --profile vastdata         Access credentials stored
6     aws configure set endpoint_url ...       S3 endpoint configured
7     sky check vastdata                       VastData enabled for storage
8     Create test_vast_mount.yaml              Task YAML with VastData FUSE mount
9     sky launch test_vast_mount.yaml          Cluster launched, bucket mounted, job succeeded
10    ssh sky-19aa-vastdata                    Interactive access to inspect the FUSE mount
11    sky down sky-19aa-vastdata               MOUNT cluster torn down
12    Create test_vast_cache.yaml              Task YAML with VastData cached mount
13    sky launch test_vast_cache.yaml          Cluster launched, cached mount active, job succeeded
14    ssh sky-7cb4-vastdata                    Interactive access to inspect rclone VFS cache
15    sky down sky-7cb4-vastdata               MOUNT_CACHED cluster torn down
16    Create test_vast_copy.yaml               Task YAML with VastData copy mode
17    sky launch test_vast_copy.yaml           Cluster launched, files synced, job succeeded
18    ssh sky-f053-vastdata                    Interactive access — no FUSE mount, plain local files
19    sky down sky-f053-vastdata               COPY cluster torn down
20    Create test_vast_create_bucket.yaml      Task YAML that auto-creates a new VastData bucket
21    sky launch test_vast_create_bucket.yaml  Bucket created, cluster launched, mounted
22    sky storage ls                           New bucket visible in SkyPilot storage list
23    ssh sky-405f-vastdata                    Bucket mounted as FUSE filesystem at /data
24    sky down / sky storage delete            Cluster torn down, optionally delete bucket