VAST Data Drives Tool

Overview


The VAST Data Drives Management Power Tool enables safe, controlled firmware upgrades and power cycling of individual drives in a VAST Data cluster.
Because it operates on one drive at a time and continuously monitors system health, it helps keep the cluster stable during maintenance or fault-isolation tasks.

Procedure

Download the vast_drives_util.py script to a VAST CNode and place it in the /vast/data directory.

vast_drives_util.py

Required

  • Start a screen session before running the tool, so that a dropped SSH connection does not interrupt the run.

  • Copy the script to a VAST CNode and place it in /vast/data.

  • The tool must be started within a VAST container, for example through the following wrapper script:

/vast/data/bashdocker.sh

Basic Usage

```bash
python vast_drives_util.py
```

This will:

  1. Map all drives in the system.

  2. Verify system health.

  3. Power cycle drives one by one.

  4. Wait for RAID to become healthy after each cycle.

  5. Log all activities to /vast/log/ssd_cycle.log.

Command Line Options

usage: vast_drives_util.py [-h] [--log-file LOG_FILE] [--wait-time WAIT_TIME]
                           [--verbose] [--force] [--reset] [--resume]
                           [--dry-run] [--save-mapping FILE]
                           [--load-mapping FILE] [--list] [--drives GUID_LIST]
                           [--skip-failed-drives GUID_LIST] [--yes]
                           [--skip-denylist-check] [--fw-upgrade DRIVE_MODEL]
                           [--fw-file FIRMWARE_FILE]
                           [--target-fw-version TARGET_FW_VERSION] [--nvrams]
                           [--upgrade-only] [--upload-reports]

VastData Drive Power Cycling And FW Upgrade Tool

optional arguments:
  -h, --help            show this help message and exit
  --log-file LOG_FILE, --log LOG_FILE
                        Path to log file
  --wait-time WAIT_TIME
                        Wait time in seconds after powering on drives
                        (default: 30)
  --verbose             Enable verbose logging
  --force               Continue on errors
  --reset               Reset state and start from scratch
  --resume              Resume from last state (default behavior if state
                        exists)
  --dry-run             Run without making any actual changes (simulates
                        actions)
  --save-mapping FILE   Save drive mapping to specified file
  --load-mapping FILE   Load drive mapping from specified file
  --list                List all drives and exit
  --drives GUID_LIST    Specify drives to cycle (comma-separated GUIDs or
                        indices)
  --skip-failed-drives GUID_LIST
                        Specify drives to skip even if in failed state (comma-
                        separated GUIDs)
  --yes, -y             Assume yes for all prompts
  --skip-denylist-check
                        Skip all denylist checks
  --fw-upgrade DRIVE_MODEL
                        Enable firmware upgrade for drives matching this model
                        (e.g., "SSDPFWNV153TZ")
  --fw-file FIRMWARE_FILE
                        Path to firmware file for upgrade
  --target-fw-version TARGET_FW_VERSION
                        Target FW Version being upgraded to
  --nvrams              Perform actions on NVRAMs instead of SSDs
  --upgrade-only        Only run FW Upgrade, don't run cycle
  --upload-reports      Upload Drive reports to s3 when done
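
The help text above has the shape of standard Python `argparse` output. As a rough illustration only (this is not the tool's source), a subset of these options could be declared as shown below. The names `build_parser` and `parse_drive_list` are hypothetical, and treating purely numeric entries in `--drives` as indices is an assumption.

```python
import argparse


def parse_drive_list(value):
    """Split a comma-separated list into GUID strings and integer indices."""
    items = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        # Assumption: purely numeric tokens are drive indices, the rest are GUIDs.
        items.append(int(token) if token.isdigit() else token)
    return items


def build_parser():
    """Declare a subset of the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="vast_drives_util.py",
        description="VastData Drive Power Cycling And FW Upgrade Tool",
    )
    # --log is an alias of --log-file; both store into args.log_file.
    parser.add_argument("--log-file", "--log", dest="log_file", help="Path to log file")
    parser.add_argument("--wait-time", type=int, default=30,
                        help="Wait time in seconds after powering on drives")
    parser.add_argument("--dry-run", action="store_true",
                        help="Run without making any actual changes")
    parser.add_argument("--drives", type=parse_drive_list, metavar="GUID_LIST",
                        help="Drives to cycle (comma-separated GUIDs or indices)")
    parser.add_argument("--yes", "-y", action="store_true",
                        help="Assume yes for all prompts")
    return parser


args = build_parser().parse_args(
    ["--log", "/tmp/run.log", "--drives", "0,3,abc-123", "--dry-run"]
)
print(args.log_file, args.wait_time, args.drives, args.dry_run)
```

The sketch shows why `--log` and `--log-file` are interchangeable and how an omitted `--wait-time` falls back to the documented default of 30 seconds.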

### Examples

**Standard run with default settings:**
```bash
python vast_drives_util.py
```

**Run with verbose logging to a custom log file:**
```bash
python vast_drives_util.py --verbose --log /path/to/custom_log.log
```

**Continue despite errors:**
```bash
python vast_drives_util.py --force
```

**Reset progress and start over:**
```bash
python vast_drives_util.py --reset
```

**Explicitly resume from previous run:**
```bash
python vast_drives_util.py --resume
```

**Do a dry run without making actual changes:**
```bash
python vast_drives_util.py --dry-run
```

**Run a firmware upgrade:**
```bash
cd /vast/data
/vast/data/bashdocker.sh python3 vast_drives_util.py --fw-upgrade SSDPFWNV153TZ --fw-file /vast/data/ACV10203_WFEM01S0_signed.bin --verbose
```

How It Works

The tool follows a systematic approach to safely power cycle each drive.

  1. Initialization:

    • Connects to the VAST Data cluster leader.

    • Loads any previous state (for resuming interrupted operations).

    • Maps all drives to their physical locations.

  2. Pre-cycle Validation:

    • Verifies RAID health.

    • Checks for node or drive failures.

  3. For Each Drive:

    • Deactivates the drive using the VAST Data API.

    • Power cycles the drive via SSH to the appropriate DNode.

    • Reactivates the drive using the VAST Data API.

    • Verifies that no new failures occurred.

    • Waits for RAID to become healthy.

    • Updates progress and state.

  4. Monitoring:

    • Displays progress with completion percentage.

    • Shows estimated time remaining.

    • Logs all activities.
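
The sequence above can be sketched as a loop that handles one drive at a time and checks RAID health before moving on. This is a simplified illustration under stated assumptions, not the tool's implementation: `FakeCluster`, `wait_for_raid`, and `cycle_all` are hypothetical names, and the in-memory cluster stands in for the real API and SSH calls.

```python
import time


class FakeCluster:
    """In-memory stand-in for the cluster; every method here is hypothetical."""

    def __init__(self, drives):
        self.active = {guid: True for guid in drives}
        self.health_polls_left = 0

    def deactivate(self, guid):
        self.active[guid] = False

    def power_cycle(self, guid):
        # The real tool does this over SSH to the DNode; here it only
        # pretends that RAID needs two health polls to recover.
        self.health_polls_left = 2

    def activate(self, guid):
        self.active[guid] = True

    def raid_healthy(self):
        if self.health_polls_left > 0:
            self.health_polls_left -= 1
            return False
        return all(self.active.values())


def wait_for_raid(cluster, poll_interval=0.0, timeout=5.0):
    """Poll until RAID reports healthy, or raise after the timeout."""
    deadline = time.monotonic() + timeout
    while not cluster.raid_healthy():
        if time.monotonic() > deadline:
            raise TimeoutError("RAID did not become healthy in time")
        time.sleep(poll_interval)


def cycle_all(cluster, drives):
    """Cycle drives strictly one at a time, gating each step on RAID health."""
    done = []
    wait_for_raid(cluster)  # pre-cycle validation
    for index, guid in enumerate(drives, start=1):
        cluster.deactivate(guid)
        cluster.power_cycle(guid)
        cluster.activate(guid)
        wait_for_raid(cluster)  # do not touch the next drive until healthy
        done.append(guid)
        print(f"{guid}: done ({100 * index // len(drives)}% complete)")
    return done


drives = ["drive-a", "drive-b", "drive-c"]
completed = cycle_all(FakeCluster(drives), drives)
```

Gating each iteration on `wait_for_raid` is what keeps at most one drive out of service at any moment, which is the property the procedure relies on.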

State Management

The tool maintains its state in a JSON file at /vast/log/ssd_cycle_state.json. This allows it to be safely stopped and resumed from the same point.

To start fresh, use the --reset flag to clear the state file.

By default, the tool will resume from the last saved state if it exists. You can also explicitly request to resume using the --resume flag, which will check if a state file exists before proceeding.
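
A minimal sketch of this reset and resume behavior follows. The layout of the state file is not documented here, so the `completed` list is an assumed schema, and `load_state`, `save_state`, and `run` are hypothetical helpers. A temporary directory is used instead of /vast/log.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved state, or a fresh one if no state file exists."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"completed": []}  # assumed schema, for illustration only


def save_state(path, state):
    """Write via a temp file and rename, so an interruption cannot truncate the state."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)


def run(path, drives, reset=False, stop_after=None):
    """Process drives not yet recorded as completed; stop_after simulates an interruption."""
    if reset and os.path.exists(path):
        os.remove(path)
    state = load_state(path)
    processed = []
    for guid in drives:
        if guid in state["completed"]:
            continue  # already handled in an earlier run
        if stop_after is not None and len(processed) >= stop_after:
            break
        # ... the drive would be cycled here ...
        state["completed"].append(guid)
        save_state(path, state)  # persist progress after every drive
        processed.append(guid)
    return processed


state_path = os.path.join(tempfile.mkdtemp(), "ssd_cycle_state.json")
drives = ["drive-a", "drive-b", "drive-c"]
first = run(state_path, drives, stop_after=1)  # interrupted after one drive
second = run(state_path, drives)               # resumes with the remaining drives
third = run(state_path, drives, reset=True)    # starts over from scratch
```

Saving after every drive, not only at the end, is what makes an interrupted run resumable from the exact drive where it stopped.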

Logging

Detailed logs are written to /vast/log/drive_cycle.log by default. For troubleshooting, enable verbose logging with the --verbose flag.

Error Handling

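By default, the tool stops on any DNode, drive, denylist, or general error to prevent potential issues. To continue despite errors, use the --force flag. To skip the denylist checks, use --skip-denylist-check.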
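
The difference between stopping on the first error and continuing on errors can be illustrated with a short sketch. It assumes that continuing means recording the failure and moving on to the next drive, which the help text does not spell out. `DriveError`, `process_drives`, and `flaky_operation` are hypothetical names.

```python
class DriveError(Exception):
    """Hypothetical error raised when an operation on a drive fails."""


def process_drives(drives, operation, force=False):
    """Apply operation to each drive; stop at the first error unless force is set."""
    succeeded, failed = [], []
    for guid in drives:
        try:
            operation(guid)
        except DriveError as exc:
            failed.append((guid, str(exc)))
            if not force:
                break  # default: stop before touching any further drives
            continue   # forced: record the failure and move on
        succeeded.append(guid)
    return succeeded, failed


def flaky_operation(guid):
    """Simulated operation that fails for one specific drive."""
    if guid == "drive-b":
        raise DriveError("power cycle failed")


drives = ["drive-a", "drive-b", "drive-c"]
stopped = process_drives(drives, flaky_operation)
forced = process_drives(drives, flaky_operation, force=True)
print(stopped)
print(forced)
```

In the default case the third drive is never touched, which is the conservative behavior the tool favors. Forcing trades that safety for completeness.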