VAST Probe Requirements

Audience

This guide describes the Probe's hardware and software requirements. It is intended for customers who run the Probe on their own infrastructure. To learn how to quickly get the Probe running, follow the instructions in VAST Probe Quickstart.

Hardware Minimum Requirements

Actual hardware requirements depend on the amount of data to be scanned. Examples of how to scope hardware based on dataset size are provided at the end of this page.

  • 16 or more CPU cores on Intel Broadwell-compatible or later CPUs.

    • The Probe requires CPU instructions that are not available on older CPUs.

    • The Probe can run as a virtual machine on Intel-based hardware when the virtual cluster's minimum vMotion compatibility is Intel Broadwell or later.

    • The Probe has not been evaluated on AMD CPUs.

  • 128 GB RAM or higher 

    • The Probe consumes almost 100 GB of RAM upon launch.

    • The more RAM, the better the Probe will perform and the more data it can scan.

  • 10 GbE Networking or higher

  • 50 GB SSD-backed local storage or higher (NVMe or FC/iSCSI LUNs).

    • This local SSD capacity is needed for the database the Probe builds and for logging.

    • The local SSD capacity should be at least 0.6% of the data to be scanned (see the sizing sketch after this list).

    • The disk storage must deliver very high sustained IOPS.

    • The larger the local SSD allocated, the more data can be scanned.

    • Local SSD filesystem should be ext4 or XFS.
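
As a rough illustration of the 0.6% sizing rule above, the sketch below estimates the local SSD capacity needed for a given amount of data to scan. The function name is illustrative and not part of the Probe's interface.

# Illustrative sketch of the 0.6% local SSD sizing rule.
# 0.6% of 1 TB of data is 6 GB of local SSD.
def required_ssd_gb(dataset_tb):
    """Local SSD capacity (GB) needed to scan dataset_tb TB of data, per the 0.6% rule."""
    return dataset_tb * 6

print(required_ssd_gb(15))   # 90 GB, matching the 15 TB example later on this page
print(required_ssd_gb(100))  # 600 GB, matching the 100 TB example later on this page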

Operating System Minimum Requirements

 We've tested the following, but most modern Linux distributions should be fine:

  • Ubuntu 18.04, 20.04

  • CentOS/RHEL 7.4+

  • Rocky/RHEL 8.3+ 

Software Requirements

  • Docker 17.05+

  • python3 (for launching the Probe)

  • screen (for running the Probe in the background)

  • wget (for downloading the Probe image) 

Filesystem Requirements (For Probing For Data Reduction)

Be aware that if the filesystem has atime enabled, any scanning, even while mounted as read-only, will update the files' atime timestamps.

  • NFS: The Probe host needs only root-squash, read-only access to the export

    • For faster scanning, use an operating system that has nconnect support:

      • Ubuntu 20.04+

      • RHEL/Rocky 8.4+

  • Lustre: The Probe host and container must be able to read as a root user

  • GPFS: The Probe host and container must be able to read as a root user

  • SMB: The share should be mounted on the Probe host with a user in the BUILTIN\Backup Operators group to avoid file access issues. 

  • S3/Object: We have tested internally with Goofys as a way to present object storage as a filesystem

    • It is not recommended to scan anything in AWS Glacier or equivalent

Hardware Requirement Examples

Example A: You have a server with 768GB of RAM:

  • 154 GB (roughly 20%) is reserved for the Operating System, leaving 614 GB of RAM.

  • There are 100 million files to scan, whose filenames will occupy ~5 GB of RAM, leaving 609 GB of RAM.

    • The Probe uses roughly 50 bytes of RAM per filename (100 million × 50 bytes ≈ 5 GB).

  • This leaves 609GB of RAM available for the RAM index.

    • --ram-index-size-gb 609
    • This can scan up to 99TB of data using only RAM, with no significant local SSD space required.

      • This calculation is based on a 0.6% rule to accommodate similarity and deduplication hashes.

  • Using a disk index, you can scan far more data, and the file count could exceed 10 billion with a 500GB file name cache.

Example B: You have a server with 128GB of RAM and a Local SSD:

  • 26 GB (roughly 20%) is reserved for the Operating System, leaving 102 GB of RAM.

  • There are 100 million files to scan, whose filenames will occupy ~5 GB of RAM, leaving 97 GB of RAM.

    • The Probe uses roughly 50 bytes of RAM per filename (100 million × 50 bytes ≈ 5 GB).

  • This leaves 97GB of RAM available for the RAM index

    • --ram-index-size-gb 97
    • This can scan up to 15TB of data using only RAM, with no significant local SSD space required.

      • This calculation is based on a 0.6% rule to accommodate similarity and deduplication hashes.

  • Using a disk index, you can scan far more data, and the file count could be as high as 2 billion, with a 100GB file name cache.

    • 15TB of data requires 90GB of local SSD disk.

    • 100TB of data requires 600GB of local SSD disk.
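
As a worked illustration of the arithmetic in Examples A and B, this sketch estimates the usable RAM index size and the amount of data it can cover. The 20% operating-system reservation, ~50 bytes per filename, and 0.6% rule come from this page; the function name is illustrative and not part of the Probe's interface.

def ram_index_plan(total_ram_gb, n_files):
    """Estimate the usable RAM index (GB) and the data it can cover (TB)."""
    usable_gb = total_ram_gb * 0.8           # reserve ~20% of RAM for the OS
    filename_gb = n_files * 50 / 1e9         # ~50 bytes of RAM per filename
    ram_index_gb = usable_gb - filename_gb   # value to pass as --ram-index-size-gb
    scannable_tb = ram_index_gb / 6          # 0.6% rule: 6 GB of index per TB of data
    return round(ram_index_gb), round(scannable_tb)

print(ram_index_plan(768, 100_000_000))  # Example A: (609, 102); the page quotes a conservative 99 TB
print(ram_index_plan(128, 100_000_000))  # Example B: (97, 16); the page quotes a conservative 15 TB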

Algorithm Specification 

Here's pseudo-code that explains how these calculations are done:


# avail_b: total host RAM in bytes; n_files: number of files to be scanned
# index_size: index capacity needed for the data to be scanned (computed elsewhere)
# args: command-line options, e.g. --ram-index-size-gb and --disk-index-size-gb (0 if unset)
GB = 10**9  # assumed decimal gigabytes, consistent with the figures on this page

# Reserve ~20% of RAM for the OS and ~50 bytes of RAM per filename.
available_ram_bytes = (avail_b * 0.8) - (n_files * 50)
ram_index_size = args.ram_index_size_gb * GB
disk_index_size = args.disk_index_size_gb * GB

# If no disk index was requested, prefer a RAM index when the full index
# fits in available RAM; otherwise fall back to a disk index.
if disk_index_size == 0:
    if ram_index_size == 0 and available_ram_bytes > index_size:
        ram_index_size = index_size
    if ram_index_size == 0:
        disk_index_size = index_size

# Enforce a 1 GB minimum on whichever index is in use.
if 0 < ram_index_size < GB:
    ram_index_size = GB
if 0 < disk_index_size < GB:
    disk_index_size = GB
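
To see the index-selection logic end to end, here is a minimal, self-contained sketch that wraps the pseudo-code above in a function and runs it with numbers similar to Examples A and B. The function name, its arguments, and the decimal GB constant are assumptions made for illustration, not the Probe's actual interface.

GB = 10**9  # assumed decimal gigabytes

def choose_index(avail_b, n_files, index_size, ram_index_size_gb=0, disk_index_size_gb=0):
    """Pick RAM and/or disk index sizes; an illustrative wrapper around the pseudo-code above."""
    available_ram_bytes = (avail_b * 0.8) - (n_files * 50)
    ram_index_size = ram_index_size_gb * GB
    disk_index_size = disk_index_size_gb * GB
    if disk_index_size == 0:
        # Prefer a RAM index when the full index fits in available RAM,
        # otherwise fall back to a disk index on local SSD.
        if ram_index_size == 0 and available_ram_bytes > index_size:
            ram_index_size = index_size
        if ram_index_size == 0:
            disk_index_size = index_size
    # Enforce a 1 GB minimum on whichever index is in use.
    if 0 < ram_index_size < GB:
        ram_index_size = GB
    if 0 < disk_index_size < GB:
        disk_index_size = GB
    return ram_index_size, disk_index_size

# Like Example A: 768 GB of RAM scanning 99 TB of data (0.6% index = 594 GB) fits in RAM.
ram, disk = choose_index(768 * GB, 100_000_000, 594 * GB)
print(ram // GB, disk // GB)  # 594 0

# Like Example B: 128 GB of RAM scanning 100 TB of data (0.6% index = 600 GB) falls back to local SSD.
ram, disk = choose_index(128 * GB, 100_000_000, 600 * GB)
print(ram // GB, disk // GB)  # 0 600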