VAST Probe Quickstart


Prerequisites

  • Linux OS (Ubuntu 20/CentOS 8 Recommended)

    • Ubuntu 18 & CentOS 7 Minimum Required

  • python3

  • Docker

  • screen

  • wget

Sizing The Probe Hardware or Virtual Machine

Review VAST Probe Requirements

  • RAM is reserved for the operating system and the probe runtime image.

    • The probe launcher automatically sets aside 20% of RAM for this.

  • RAM is also used for the file name cache.

    • The file name cache uses 50 bytes per file scanned.

  • RAM or SSD-backed local disk is used for a hash database.

    • The hash database is where the probe stores the hashes of scanned blocks.

    • The hash database should be at least 0.6% of the data size.

    • If there is sufficient RAM, the probe will use RAM for the hash database.

    • If there is insufficient RAM, the probe will use an SSD-backed local disk for the hash database.

  • Any remaining unallocated RAM will be used for a read cache.
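The sizing rules above can be turned into a rough estimate. The following is a minimal sketch, not a VAST tool: `estimate_probe_memory` is a hypothetical helper, it uses decimal MB throughout, and it assumes the hash database stays in RAM only if it fits next to the file name cache in the 80% of RAM the launcher does not reserve.

```shell
#!/bin/sh
# Hypothetical sizing helper based on the guidelines above (not part of the probe bundle).
# Arguments: data size in GB, number of files, installed RAM in GB. Units are decimal MB.
estimate_probe_memory() {
  data_gb="$1"; files="$2"; ram_gb="$3"
  # 20% of RAM is set aside for the OS and the probe runtime image
  usable_mb=$(( ram_gb * 1000 * 80 / 100 ))
  # File name cache: 50 bytes per file scanned, rounded up to whole MB
  name_cache_mb=$(( (files * 50 + 999999) / 1000000 ))
  # Hash database: at least 0.6% of the data size (0.6% of 1 GB is 6 MB)
  hash_db_mb=$(( data_gb * 6 ))
  echo "file_name_cache_mb=$name_cache_mb"
  echo "hash_db_mb=$hash_db_mb"
  # Assumption: the hash DB is kept in RAM only if it fits beside the file name cache
  if [ $(( name_cache_mb + hash_db_mb )) -le "$usable_mb" ]; then
    echo "hash_db_location=RAM"
    echo "read_cache_mb=$(( usable_mb - name_cache_mb - hash_db_mb ))"
  else
    echo "hash_db_location=SSD"
    rest=$(( usable_mb - name_cache_mb ))
    [ "$rest" -gt 0 ] || rest=0
    echo "read_cache_mb=$rest"
  fi
}
```

For example, `estimate_probe_memory 15000 5000000 128` (15 TB, 5,000,000 files, 128 GB RAM) reports a 250 MB file name cache and a 90,000 MB hash database that fits in RAM, leaving roughly 12,150 MB for the read cache.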

Download Probe Bundle

To download the probe, follow the instructions in Downloading the VAST Probe.

If you do not have access to those instructions, contact your VAST representative for download instructions.

Expand & Verify Download

Now that you've downloaded the probe, you'll need to untar it and verify the download.

# Set this to the build number in the name of the bundle you downloaded
export PROBE_BUILD=935553
# Extract the bundle into the current directory
tar -xzf ${PROBE_BUILD}.probe.bundle.tar.gz
# List the extracted files
ls -l

Note: the screenshot may not show current build numbers.

[Screenshot: a terminal session that exports PROBE_BUILD=513711, extracts the matching .tar.gz bundle, and lists the extracted files, including probe.bundle.tar.gz, probe.image.gz, and probe_launcher.py.]
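Listing the files only shows that extraction ran. A slightly stronger check is sketched below. `verify_probe_bundle` is a hypothetical helper; it assumes the file names used by the launcher command later in this guide and that the image is a standard gzip file, as its `.gz` extension suggests.

```shell
#!/bin/sh
# Hypothetical check: confirm the files the launcher needs were extracted intact.
# Run it in the directory where the bundle was extracted.
verify_probe_bundle() {
  build="$1"
  rc=0
  # Both files are referenced by the probe_launcher.py command line
  for f in "${build}.probe.image.gz" probe_launcher.py; do
    [ -s "$f" ] || { echo "missing or empty: $f" >&2; rc=1; }
  done
  # gzip -t tests the compressed image's integrity without extracting it
  if [ -s "${build}.probe.image.gz" ] && ! gzip -t "${build}.probe.image.gz" 2>/dev/null; then
    echo "corrupt archive: ${build}.probe.image.gz" >&2
    rc=1
  fi
  [ "$rc" -eq 0 ] && echo "bundle for build ${build} looks complete"
  return $rc
}
```

Usage: `verify_probe_bundle "$PROBE_BUILD"`. A nonzero exit status means the download should be repeated before going further.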

Mount Filesystems Selected to Be Probed

Validated Filesystems Include, But Are Not Limited To:

  • NFS

  • Lustre

  • GPFS

  • S3 with goofys

  • CIFS/SMB

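For the most accurate results, do not use root-squash. Root squashing maps root to an unprivileged user, which can prevent the probe (run with sudo) from reading some files.

Mounting the filesystem read-only is recommended.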
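To confirm that a filesystem really is mounted read-only before probing it, you can inspect its mount options. The sketch below is a hypothetical helper (`is_mounted_read_only` is not part of the bundle); the NFS server name in the comment is a placeholder, and the optional second argument exists only so the check can be pointed at a file other than `/proc/mounts`.

```shell
#!/bin/sh
# Example read-only NFS mount ("filer.example.com:/export" is a placeholder):
#   sudo mount -t nfs -o ro filer.example.com:/export /mnt/filesystem_to_be_probed

# Hypothetical helper: succeed only if MOUNTPOINT carries the "ro" mount option.
is_mounted_read_only() {
  mountpoint="$1"
  mounts_file="${2:-/proc/mounts}"
  # /proc/mounts fields: device mountpoint fstype options dump pass
  # Options are compared exactly, so "errors=remount-ro" does not count as "ro"
  awk -v mp="$mountpoint" '
    $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "ro") found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$mounts_file"
}
```

Usage: `is_mounted_read_only /mnt/filesystem_to_be_probed && echo "mounted read-only"`. Mount points containing spaces appear escaped in `/proc/mounts`, so this sketch does not handle them.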

Create Probe Directories

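Replace /mnt/ below with the mount point of the SSD-backed local disk that the probe will use for its hash database and logging directories.

# Hash database directory (passed to the probe as --metadata-dir)
sudo mkdir -p /mnt/probe/db
# Output and log directory (passed to the probe as --output-dir)
sudo mkdir -p /mnt/probe/out
# Make both directories writable by the probe
sudo chmod -Rf 777 /mnt/probe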
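If the hash database does not fit in RAM, it lands on this disk, so the disk needs room for at least 0.6% of the data size. The sketch below is a hypothetical pre-flight check (`check_db_space` is not part of the bundle) that compares that requirement against the space `df` reports.

```shell
#!/bin/sh
# Hypothetical check: does DIR have room for a hash database of at least 0.6%
# of the data set? The data size is in decimal GB, like --data-size-gb.
check_db_space() {
  dir="$1"; data_size_gb="$2"
  # 0.6% of 1 GB is 6,000,000 bytes; convert to 1 KiB blocks, rounding up
  required_kb=$(( (data_size_gb * 6000000 + 1023) / 1024 ))
  # df -P prints one line per filesystem; field 4 is available space in 1 KiB blocks
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo "required_kb=$required_kb available_kb=$avail_kb"
  [ "$avail_kb" -ge "$required_kb" ]
}
```

Usage: `check_db_space /mnt/probe/db 10000` checks room for a 10,000 GB data set. Passing this check is only a floor; it says nothing about the read performance of the disk.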

Size of the Data Set

  • The input to the probe is a defined directory (--input-dir)

  • The probe will automatically query the input filesystem about space consumed and file count (inodes), and use that in its calculations

  • Depending on the mount method and the underlying storage, this query can often return inaccurate results

  • It's highly recommended to define manual estimates for space consumed (--data-size-gb) and file count (--number-of-files)

  • These estimates do not have to be exact; round up to a reasonable value
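If you have no better source for these numbers, a full directory walk can produce them. The sketch below is a hypothetical helper (`estimate_dataset` is not part of the bundle); it counts regular files only, rounds the size up to whole decimal GB, and can take a long time on large filesystems because it visits every entry.

```shell
#!/bin/sh
# Hypothetical helper: estimate --data-size-gb and --number-of-files for a directory.
# Walks the whole tree, so expect it to be slow on large filesystems.
estimate_dataset() {
  dir="$1"
  # Space in use, in 1 KiB blocks
  used_kb=$(du -sk "$dir" | awk '{ print $1 }')
  # Regular files only (an assumption; directories are not counted)
  files=$(( $(find "$dir" -type f | wc -l) ))
  # Convert to decimal GB, rounding up, and report at least 1 GB
  size_gb=$(( (used_kb * 1024 + 999999999) / 1000000000 ))
  [ "$size_gb" -ge 1 ] || size_gb=1
  echo "INPUT_SIZE_GB=$size_gb"
  echo "QTY_FILES=$files"
}
```

Usage: `eval "$(estimate_dataset /mnt/filesystem_to_be_probed/sub_directory)"` sets both variables in the current shell; round them up further if the data set is still growing.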

Running The Probe

The probe runs as a foreground application, so if your session closes for any reason, the probe stops. Run the probe inside a screen session to avoid this.
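The screen workflow comes down to three commands: create a named session, detach from it, and reattach later. The wrapper below is a hypothetical convenience (`probe_session` and the session name `vast_probe` are arbitrary choices for this sketch); the underlying `screen` options are standard.

```shell
#!/bin/sh
# Hypothetical wrapper around GNU screen so the probe survives a dropped SSH session.
# "vast_probe" is an arbitrary session name chosen for this example.
PROBE_SESSION="${PROBE_SESSION:-vast_probe}"

probe_session() {
  case "$1" in
    start)  screen -S "$PROBE_SESSION" ;;  # create and attach; run the probe command inside
    attach) screen -r "$PROBE_SESSION" ;;  # reattach after disconnecting
    list)   screen -ls ;;                  # show existing sessions
    *)      echo "usage: probe_session start|attach|list" >&2; return 2 ;;
  esac
}
# Inside the session, press Ctrl-a then d to detach; the probe keeps running.
```

Typical flow: `probe_session start`, run the `probe_launcher.py` command, detach with Ctrl-a d, and later `probe_session attach` to check progress.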

Here is an example command line. Edit the variables for your environment:

NOTE: Use underscores instead of spaces in COMPANY_NAME and WORKLOAD.

export DB_DIR=/mnt/probe/db
export OUTPUT_DIR=/mnt/probe/out
export INPUT_DIR=/mnt/filesystem_to_be_probed/sub_directory
export INPUT_SIZE_GB=10000
export QTY_FILES=1000000
export COMPANY_NAME=Your_Amazing_Company
export WORKLOAD=Describe_Your_Workload
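Because the probe can take several minutes to start, it helps to catch typos in these variables first. The sketch below is a hypothetical pre-flight check (`check_probe_vars` is not part of the bundle) that enforces the rules stated in this guide: every value set, whole-number size estimates, no spaces in the names, and an existing input directory.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the variables above (not part of the probe bundle).
check_probe_vars() {
  rc=0
  # Every variable used on the launcher command line must be non-empty
  for pair in "DB_DIR=$DB_DIR" "OUTPUT_DIR=$OUTPUT_DIR" "INPUT_DIR=$INPUT_DIR" \
              "INPUT_SIZE_GB=$INPUT_SIZE_GB" "QTY_FILES=$QTY_FILES" \
              "COMPANY_NAME=$COMPANY_NAME" "WORKLOAD=$WORKLOAD"; do
    case "$pair" in
      *=) echo "${pair%=} is not set" >&2; rc=1 ;;
    esac
  done
  # Size estimates must be whole numbers (no units, commas, or decimals)
  for num in "$INPUT_SIZE_GB" "$QTY_FILES"; do
    case "$num" in
      ''|*[!0-9]*) echo "not a whole number: '$num'" >&2; rc=1 ;;
    esac
  done
  # The names are joined into --customer-name, so spaces are not allowed
  case "$COMPANY_NAME$WORKLOAD" in
    *[[:space:]]*) echo "use underscores instead of spaces in COMPANY_NAME and WORKLOAD" >&2; rc=1 ;;
  esac
  [ -d "$INPUT_DIR" ] || { echo "INPUT_DIR is not a directory: $INPUT_DIR" >&2; rc=1; }
  return $rc
}
```

Usage: run `check_probe_vars && echo "variables look OK"` in the same shell where the variables were exported, before starting the probe.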

Start the probe (it may take up to five minutes before output appears):

sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir $INPUT_DIR \
--metadata-dir $DB_DIR \
--output-dir $OUTPUT_DIR \
--data-size-gb $INPUT_SIZE_GB \
--number-of-files $QTY_FILES \
--customer-name ${COMPANY_NAME}---${WORKLOAD}

Example One: Small Data Sets

To probe the directory interesting_data, with 15 TB in use and 5,000,000 files, at the company ACME, the command would be:

sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir /mnt/acme_filer/interesting_data \
--metadata-dir /mnt/data/probe/db \
--output-dir /mnt/data/probe/out \
--data-size-gb 15000 \
--number-of-files 5000000 \
--customer-name ACME---Interesting_Data

Example Two: Larger Data Sets

To probe the directory fascinating_data, with 60 TB in use and 750,000,000 files, at the company FOO, using defined parameters for RAM and SSD-backed local disk, the command would be:

sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir /mnt/foo_filer/fascinating_data \
--metadata-dir /mnt/data/probe/db \
--output-dir /mnt/data/probe/out \
--data-size-gb 60000 \
--number-of-files 750000000 \
--customer-name FOO---Fascinating_Data

Example Three: Performance Throttling

To probe the directory riveting_data, with 250 TB in use and 1,250,000,000 files, at the company Initech, using defined parameters for RAM and SSD-backed local disk but with a lower performance impact on the filesystem, the command would be:

sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir /mnt/initech_filer/riveting_data \
--metadata-dir /mnt/data/probe/db \
--output-dir /mnt/data/probe/out \
--data-size-gb 250000 \
--number-of-files 1250000000 \
--number-of-threads 4 \
--customer-name Initech---Riveting_Data

Note the --number-of-threads flag. By default, the probe uses all CPU cores in the system; setting a lower thread count throttles the probe and reduces its potential impact on the scanned filesystem.
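Rather than hard-coding a thread count, you can derive it from the cores actually available. The sketch below is a hypothetical helper (`threads_for_fraction` is not part of the bundle); it uses `nproc` from GNU coreutils, which reports the processors available to the current process.

```shell
#!/bin/sh
# Hypothetical helper: choose a --number-of-threads value as a percentage of the
# available CPU cores (the probe's default is to use all of them).
threads_for_fraction() {
  pct="$1"                    # e.g. 25 for roughly a quarter of the cores
  cores=$(nproc)
  n=$(( cores * pct / 100 ))
  [ "$n" -ge 1 ] || n=1       # always use at least one thread
  echo "$n"
}
```

Usage: add `--number-of-threads "$(threads_for_fraction 25)" \` to the launcher command to use about a quarter of the cores.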

Other Probe Flags

While the probe is running and after completion, telemetry logs are automatically uploaded to VAST. To prevent this, add the following flag:

--dont-send-logs \

If you wish to send file names with the default telemetry logs, add the following flag:

--send-logs-with-file-names \

Probing filesystems that contain snapshots can cause recursion issues and inaccurate results. For this reason, the probe ignores directories named .snapshot by default. If your filesystem uses a different naming convention, use the --regexp-filter flag. If, for some reason, you want the probe to read the .snapshot directories, specify false rather than true for --filter-snapshots.

--filter-snapshots \    (this is the default)
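To decide whether --regexp-filter is needed, it can help to look for snapshot directories under the input directory first. The sketch below is a hypothetical helper (`find_snapshot_dirs` is not part of the bundle); the names it searches for (.snapshot, .snapshots, .zfs) are assumptions covering common conventions. Some filesystems hide snapshot directories from directory listings, so an empty result does not prove there are none.

```shell
#!/bin/sh
# Hypothetical helper: list directories near the top of a tree whose names match
# common snapshot conventions. The name list is an assumption; extend it as needed.
find_snapshot_dirs() {
  # $2 is an optional search depth (default 3); -prune stops descent into matches
  find "$1" -maxdepth "${2:-3}" -type d \
    \( -name '.snapshot' -o -name '.snapshots' -o -name '.zfs' \) -print -prune
}
```

Usage: `find_snapshot_dirs "$INPUT_DIR"`. Any name other than .snapshot in the output is a candidate for --regexp-filter.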

Adaptive chunking was introduced with VAST 4.3 and this latest ( ) probe version. Under most circumstances, the probe should be run with adaptive chunking. However, you can disable that feature by specifying this flag:

--disable-adaptive-chunking \

Understanding the Results

Once started, the probe displays the current projected data reduction. When it completes, the probe displays its final output, which is described further in Understanding VAST Probe Output.

Re-Running The Probe

The hash database must be empty before running the probe again:

sudo rm -r /mnt/probe/db/*
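The `rm` command above does not remove hidden entries and does nothing to guard against a mistyped path. The sketch below is a hypothetical, more defensive alternative (`reset_hash_db` is not part of the bundle) that refuses an empty or root path, removes everything inside the directory while keeping the directory itself, and then confirms it is empty.

```shell
#!/bin/sh
# Hypothetical helper: empty the hash database directory before re-running the probe.
reset_hash_db() {
  db_dir="$1"
  case "$db_dir" in
    ''|/) echo "refusing to clear '$db_dir'" >&2; return 1 ;;
  esac
  [ -d "$db_dir" ] || { echo "not a directory: $db_dir" >&2; return 1; }
  # -mindepth 1 keeps the directory itself; -delete removes entries depth-first,
  # including hidden files that a shell glob such as * would skip
  find "$db_dir" -mindepth 1 -delete
  # Succeed only if nothing is left
  [ -z "$(ls -A "$db_dir")" ]
}
```

Files written by a probe started with sudo may be owned by root, so run `reset_hash_db /mnt/probe/db` from a root shell (for example, one opened with `sudo -i`).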

Troubleshooting

Refer to VAST Probe Troubleshooting, and contact your VAST System Engineer for assistance.