Overview
The VAST Data Drives Management Power Tool enables safe, controlled firmware upgrade and power cycling of individual drives within your VAST Data cluster.
By operating one drive at a time and continuously monitoring system health, it ensures cluster stability while performing maintenance or fault-isolation tasks.
Procedure
Download the vast_drives_util.py script to a VAST CNode and place it in the /vast/data directory.
Required
Start a screen session to avoid disconnection issues.
The tool must be started within a VAST container:
/vast/data/bashdocker.sh
Basic Usage
```bash
python vast_drives_util.py
```
This will:
Map all drives in the system.
Verify system health.
Power cycle drives one by one.
Wait for RAID to become healthy after each cycle.
Log all activities to /vast/log/ssd_cycle.log.
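The "wait for RAID to become healthy" step is, in essence, a poll-with-timeout loop. The following is a minimal sketch of that pattern, not the tool's actual implementation: `wait_until_healthy`, its parameters, and the `is_healthy` callable are hypothetical names introduced for illustration.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_healthy() until it returns True or timeout_s elapses.

    Hypothetical helper: the real health check and its timing are not
    documented here, so both are passed in as parameters.
    """
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise without real delays; the default values shown are assumptions, not the tool's settings.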
Command Line Options
usage: vast_drives_util.py [-h] [--log-file LOG_FILE] [--wait-time WAIT_TIME]
                           [--verbose] [--force] [--reset] [--resume]
                           [--dry-run] [--save-mapping FILE]
                           [--load-mapping FILE] [--list] [--drives GUID_LIST]
                           [--skip-failed-drives GUID_LIST] [--yes]
                           [--skip-denylist-check] [--fw-upgrade DRIVE_MODEL]
                           [--fw-file FIRMWARE_FILE]
                           [--target-fw-version TARGET_FW_VERSION] [--nvrams]
                           [--upgrade-only] [--upload-reports]

VastData Drive Power Cycling And FW Upgrade Tool

optional arguments:
  -h, --help            show this help message and exit
  --log-file LOG_FILE, --log LOG_FILE
                        Path to log file
  --wait-time WAIT_TIME
                        Wait time in seconds after powering on drives
                        (default: 30)
  --verbose             Enable verbose logging
  --force               Continue on errors
  --reset               Reset state and start from scratch
  --resume              Resume from last state (default behavior if state
                        exists)
  --dry-run             Run without making any actual changes (simulates
                        actions)
  --save-mapping FILE   Save drive mapping to specified file
  --load-mapping FILE   Load drive mapping from specified file
  --list                List all drives and exit
  --drives GUID_LIST    Specify drives to cycle (comma-separated GUIDs or
                        indices)
  --skip-failed-drives GUID_LIST
                        Specify drives to skip even if in failed state
                        (comma-separated GUIDs)
  --yes, -y             Assume yes for all prompts
  --skip-denylist-check
                        Skip all denylist checks
  --fw-upgrade DRIVE_MODEL
                        Enable firmware upgrade for drives matching this model
                        (e.g., "SSDPFWNV153TZ")
  --fw-file FIRMWARE_FILE
                        Path to firmware file for upgrade
  --target-fw-version TARGET_FW_VERSION
                        Target FW Version being upgraded to
  --nvrams              Perform actions on NVRAMs instead of SSDs
  --upgrade-only        Only run FW Upgrade, don't run cycle
  --upload-reports      Upload Drive reports to s3 when done
### Examples
**Standard run with default settings:**
```bash
python vast_drives_util.py
```
**Run with verbose logging to a custom log file:**
```bash
python vast_drives_util.py --verbose --log /path/to/custom_log.log
```
**Continue despite errors:**
```bash
python vast_drives_util.py --force
```
**Reset progress and start over:**
```bash
python vast_drives_util.py --reset
```
**Explicitly resume from previous run:**
```bash
python vast_drives_util.py --resume
```
**Do a dry run without making actual changes:**
```bash
python vast_drives_util.py --dry-run
```
**Run firmware upgrade:**
```bash
cd /vast/data;
/vast/data/bashdocker.sh python3 vast_drives_util.py --fw-upgrade SSDPFWNV153TZ --fw-file /vast/data/ACV10203_WFEM01S0_signed.bin --verbose
```
How It Works
The tool follows a systematic approach to safely power cycle each drive.
Initialization:
Connects to the VAST Data cluster leader.
Loads any previous state (for resuming interrupted operations).
Maps all drives to their physical locations.
Pre-cycle Validation:
Verifies RAID health.
Checks for node or drive failures.
For Each Drive:
Deactivates the drive using the VAST Data API.
Power cycles the drive via SSH to the appropriate DNode.
Reactivates the drive using the VAST Data API.
Verifies that no new failures occurred.
Waits for RAID to become healthy.
Updates progress and state.
Monitoring:
Displays progress with completion percentage.
Shows estimated time remaining.
Logs all activities.
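The per-drive sequence described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the tool's code: `cycle_drives` and every callable it receives are hypothetical placeholders for the API and SSH operations, and the separate "no new failures" check is folded into `wait_for_raid` for brevity.

```python
from typing import Callable, Iterable, List


def cycle_drives(
    drive_ids: Iterable[str],
    deactivate: Callable[[str], None],
    power_cycle: Callable[[str], None],
    activate: Callable[[str], None],
    wait_for_raid: Callable[[], bool],
    report: Callable[[str], None] = print,
) -> List[str]:
    """Process drives strictly one at a time; return the IDs completed."""
    drives = list(drive_ids)
    done: List[str] = []
    for index, drive in enumerate(drives, start=1):
        deactivate(drive)    # placeholder for the API deactivation call
        power_cycle(drive)   # placeholder for the SSH power cycle on the DNode
        activate(drive)      # placeholder for the API reactivation call
        if not wait_for_raid():
            # Never move on to the next drive while RAID is unhealthy.
            report(f"RAID not healthy after {drive}; stopping")
            break
        done.append(drive)
        percent = 100 * index // len(drives)
        report(f"{index}/{len(drives)} drives done ({percent}%)")
    return done
```

The key property is that the next drive is touched only after the previous one has been reactivated and the health check has passed.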
State Management
The tool maintains its state in a JSON file at /vast/log/ssd_cycle_state.json. This allows it to be safely stopped and resumed from the same point.
To start fresh, use the --reset flag to clear the state file.
By default, the tool resumes from the last saved state if one exists. To request this explicitly, use the --resume flag, which checks that a state file exists before proceeding.
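To illustrate how a JSON state file makes a run resumable, here is a minimal sketch. The actual schema of the state file is not documented here; the `completed` key and all function names below are assumptions made for the example.

```python
import json
import os
import tempfile
from typing import List


def load_completed(state_path: str) -> List[str]:
    """Return drive IDs recorded as completed, or [] if no state exists."""
    if not os.path.exists(state_path):
        return []
    with open(state_path, "r", encoding="utf-8") as fh:
        return list(json.load(fh).get("completed", []))


def save_completed(state_path: str, completed: List[str]) -> None:
    """Write state via a temp file so an interruption leaves no partial file."""
    directory = os.path.dirname(state_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"completed": completed}, fh)
    os.replace(tmp_path, state_path)  # atomic rename on the same filesystem


def pending_drives(all_drives: List[str], completed: List[str]) -> List[str]:
    """Drives still to process, in their original order."""
    seen = set(completed)
    return [d for d in all_drives if d not in seen]


def reset_state(state_path: str) -> None:
    """Remove the state file, similar in spirit to the --reset flag."""
    if os.path.exists(state_path):
        os.remove(state_path)
```

Saving after each completed drive means a stopped run loses at most the drive that was in progress.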
Logging
Detailed logs are written to /vast/log/drive_cycle.log by default. For troubleshooting, enable verbose logging with the --verbose flag.
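As a rough picture of what a verbose switch typically changes, the sketch below configures Python's standard `logging` module to write to a file at INFO level, or at DEBUG level when verbose is enabled. The function name, logger name, and message format are hypothetical and are not taken from the tool.

```python
import logging


def build_logger(log_file: str, verbose: bool = False) -> logging.Logger:
    """Create a file logger; DEBUG level when verbose, otherwise INFO."""
    logger = logging.getLogger("drive_cycle_sketch")  # hypothetical name
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    # Drop handlers from earlier calls so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # keep output in the file only
    return logger
```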
Error Handling
By default, the tool stops on any DNode, drive, denylist, or general error to prevent potential issues. To continue despite errors, use the --force flag.
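The difference between the default stop-on-error behavior and continuing with --force can be pictured with a small sketch. The names `run_with_policy` and `process` are hypothetical; how the tool actually classifies and reports errors is not covered here.

```python
from typing import Callable, Iterable, List, Tuple


def run_with_policy(
    drive_ids: Iterable[str],
    process: Callable[[str], None],
    force: bool = False,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run process() per drive; stop at the first error unless force is set."""
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for drive in drive_ids:
        try:
            process(drive)
        except Exception as exc:
            failed.append((drive, str(exc)))
            if not force:
                break      # default: stop to avoid compounding a problem
            continue       # force: record the failure and move on
        succeeded.append(drive)
    return succeeded, failed
```

Stopping by default is the conservative choice: remaining drives are left untouched until the cause of the error is understood.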