Options for Getting Disk Usage on VAST Data


Walking a directory tree with du issues one small metadata operation at a time. For this kind of workload, a local file system will usually outperform a shared file system on the network.
There are a few strategies for gathering capacity information:

  1. Use a parallel du implementation such as dua (https://github.com/Byron/dua-cli).

    1. dua (Disk Usage Analyzer) is a tool to conveniently learn about the disk space used by a given directory. It's parallel by default and will max out your SSD, providing relevant information as fast as possible.

  2. Use block storage on VAST Data. The block protocol has the same latency as NFS, but a local file system sitting on a remote block device allows the client to cache aggressively.

  3. Get the information from VAST Data, which knows the data size per directory. This solution is the fastest, and it scales to billions of files.
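To illustrate why the first strategy helps on a network file system, here is a minimal sketch of a parallel directory walk. It is not how dua is implemented; the names `scan_dir` and `parallel_du` and the worker count are illustrative assumptions. It sums apparent file sizes (`st_size`), not allocated blocks, and keeps many metadata requests in flight at once instead of issuing them one at a time.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scan_dir(path):
    """Return (bytes of files directly in path, list of subdirectory paths)."""
    total, subdirs = 0, []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    except OSError:
        pass  # unreadable directory or entry: keep whatever was counted so far
    return total, subdirs


def parallel_du(root, workers=32):
    """Sum apparent file sizes under root, scanning each tree level in parallel."""
    total = 0
    pending = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            next_level = []
            # Every directory on the current level is scanned concurrently.
            for size, subdirs in pool.map(scan_dir, pending):
                total += size
                next_level.extend(subdirs)
            pending = next_level
    return total


if __name__ == "__main__":
    print(parallel_du("."))
```

Threads are enough here because the work is dominated by waiting on file system round trips rather than by CPU time.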

Here's how to get capacity information from VMS:

$ pip install vastpy
$ export VMS_USER=admin VMS_PASSWORD=123456 VMS_ADDRESS=vast-file-server-vms-kfs2
$ vastpy-cli get --json capacity path=/bonzo/users/alonho
{
  "details": [
    [
      "/bonzo/users/alonho",
      {
        "data": [
          164714614744,
          44616196110,
          662430925429
        ],
        "parent": "/bonzo/users",
        "percent": 100,
        "average_atime": "2024-04-23 13:17"
      }
    ]
  ],
  "keys": [
    "usable",
    "unique",
    "logical"
  ],
  "time": "2025-03-13 06:42:28",
  "sort_key": "usable",
  "root_data": [
    971276700179058,
    796623499846425,
    2257076215609176
  ],
  "small_folders": []
}

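Each "data" array (and "root_data") follows the order given in "keys":

usable = physical space consumed
unique = how much space you reclaim by deleting this directory. Deduplication raises the ratio of logical to usable, but it also means less space is reclaimed if the same data exists in other directories.
logical = size of the data as written, before data reduction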
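As a rough sketch of how this response could be consumed in a script, the snippet below pairs each path's "data" array with "keys" and derives a logical-to-usable ratio. The helper names `capacity_by_path` and `data_reduction_ratio` are hypothetical, the JSON is a trimmed copy of the output above, and treating the values as bytes is an assumption.

```python
import json

# Trimmed copy of the response shown above.
response = json.loads("""
{
  "details": [
    ["/bonzo/users/alonho",
     {"data": [164714614744, 44616196110, 662430925429],
      "parent": "/bonzo/users",
      "percent": 100}]
  ],
  "keys": ["usable", "unique", "logical"]
}
""")


def capacity_by_path(resp):
    """Map each path to a {key: value} dict by zipping "keys" with "data"."""
    return {path: dict(zip(resp["keys"], info["data"]))
            for path, info in resp["details"]}


def data_reduction_ratio(sizes):
    """Ratio of logical size to physical (usable) space consumed."""
    return sizes["logical"] / sizes["usable"]


for path, sizes in capacity_by_path(response).items():
    # Assumption: the values are byte counts.
    usable_gb = sizes["usable"] / 1e9
    print(f"{path}: {usable_gb:.1f} GB usable, "
          f"ratio {data_reduction_ratio(sizes):.2f}:1")
```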

If you also want file counts, you need a quota:

$ vastpy-cli get quotas fields=path,used_capacity_tb,used_inodes,soft_limit,hard_limit,soft_limit_inodes,hard_limit_inodes
path         |soft_limit    |hard_limit     |soft_limit_inodes |hard_limit_inodes |used_inodes |used_capacity_tb
-------------+--------------+---------------+------------------+------------------+------------+-----------------+
/alon/foo    |5000          |10000          |None              |None              |1           |0.0
/            |None          |None           |None              |None              |1           |0.0
/            |None          |None           |None              |None              |1           |0.0
/alontest    |100000000000  |120000000000   |10000             |20000             |3           |0.041
/bla_roy     |None          |None           |None              |None              |1           |0.0
/checkpoints |90000000000   |100000000000   |None              |None              |1           |0.0
/CS-HomeDirs |9000000000000 |10000000000000 |None              |None              |496453      |0.251
/cs_vm_store |None          |2000000000000  |None              |None              |1088        |1.927
/datasets    |None          |10000          |None              |None              |1           |0.0
/datasets    |None          |10000          |None              |None              |1           |0.0
/projects/db |None          |10000          |None              |None              |1           |0.0
/fs2         |None          |20000000       |None              |None              |1           |0.0
/            |None          |None           |None              |None              |1272840819  |2219.625
/bonzo/roy   |None          |None           |None              |10000000          |181470      |0.006
$ vastpy-cli get quotas path=/alon/foo fields=path,used_capacity_tb,used_inodes,soft_limit,hard_limit,soft_limit_inodes,hard_limit_inodes
path      |soft_limit |hard_limit |soft_limit_inodes |hard_limit_inodes |used_inodes |used_capacity_tb
----------+-----------+-----------+------------------+------------------+------------+-----------------+
/alon/foo |5000       |10000      |None              |None              |1           |0.0
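For scripting, the table can be turned into records. The sketch below parses the pipe-delimited text exactly as printed above; `parse_quota_table` is a hypothetical helper, not part of vastpy, and the sample is a subset of the rows shown. The `--json` flag used earlier would likely be sturdier for automation; the text table is parsed here only because that is the output shown.

```python
def parse_quota_table(text):
    """Parse the pipe-delimited table printed by the CLI into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        if set(ln.strip()) <= set("-+"):
            continue  # skip the separator line under the header
        cells = [cell.strip() for cell in ln.split("|")]
        row = dict(zip(header, cells))
        # "None" in the table means that no limit is set.
        row = {k: (None if v == "None" else v) for k, v in row.items()}
        row["used_inodes"] = int(row["used_inodes"])
        row["used_capacity_tb"] = float(row["used_capacity_tb"])
        rows.append(row)
    return rows


sample = """\
path         |soft_limit    |hard_limit     |soft_limit_inodes |hard_limit_inodes |used_inodes |used_capacity_tb
-------------+--------------+---------------+------------------+------------------+------------+-----------------+
/alontest    |100000000000  |120000000000   |10000             |20000             |3           |0.041
/CS-HomeDirs |9000000000000 |10000000000000 |None              |None              |496453      |0.251
/cs_vm_store |None          |2000000000000  |None              |None              |1088        |1.927
"""

quotas = parse_quota_table(sample)

# Find the quota path holding the most files.
busiest = max(quotas, key=lambda q: q["used_inodes"])
print(busiest["path"], busiest["used_inodes"])
```

This simple split assumes that paths do not themselves contain a `|` character.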

You can create a read-only user in VMS to share with end users if needed.

dua-cli website: https://lib.rs/crates/dua-cli