This procedure must be performed by or with the assistance of the VAST Customer Success team.
Note
For CERES DBoxes, the complete replacement procedure includes additional and variant steps and cautions that are not covered below.
DBox replacement is a VMS-enabled procedure for replacing a faulty DBox while the cluster continues to operate.
This DBox replacement procedure is suitable for the following situations:
A DBox has failed in a cluster that has DBox HA capability, so the cluster is still running.
A DBox is faulty but still running, for example because it has a failed slot. Because the DBox has not failed, the cluster is still running even if DBox HA is not enabled.
A replacement DBox ships with new DNodes but without SSDs or SCMs. During the procedure, the SSDs and SCMs are migrated from the faulty DBox to the new DBox.
Prerequisites
The procedure requires you to connect the new DBox to the switches before disconnecting the old DBox. Therefore, the cluster's network switches must have enough spare unused ports to accommodate an extra DBox.
Please consult your VAST Data sales engineer for help designating switch ports and ensuring that they are configured with the correct port designations for DNodes as required.
Similarly, you'll need rack space and PSUs in order to install the new DBox before physically removing the faulty DBox.
Required Equipment
Replacement DBox with rail mount kit and four C13/C14 power cables. All SSD slots on the DBox must be empty.
4 x 100Gb/s QSFP28 cables for connecting the new DBox to the cluster's switches.
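The spare-port prerequisite can be checked on paper before any hardware is moved. The following is a minimal sketch, not a VAST tool: the `Switch` class, the switch names, and the port counts are hypothetical, and it assumes the four QSFP28 cables are split evenly across two switches (two ports per switch). Confirm the actual port designations with your VAST Data sales engineer.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    """Hypothetical inventory record for one cluster switch."""
    name: str
    total_ports: int
    used_ports: int

    @property
    def spare_ports(self) -> int:
        return self.total_ports - self.used_ports


def ports_short(switches, ports_needed_per_switch):
    """Return {switch name: missing port count} for switches without enough spare ports."""
    return {
        s.name: ports_needed_per_switch - s.spare_ports
        for s in switches
        if s.spare_ports < ports_needed_per_switch
    }


# Illustrative numbers only; replace with the real port inventory.
switches = [Switch("sw-a", 32, 30), Switch("sw-b", 32, 31)]

# Assumption: 4 cables over 2 switches means 2 free ports are needed on each.
shortfall = ports_short(switches, ports_needed_per_switch=2)
print(shortfall if shortfall else "enough spare ports")
```

An empty result means every switch can accommodate the extra DBox while the old one is still connected.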
Step 1: Install and Add the Replacement DBox
Without removing the faulty DBox, rack-mount the new DBox and add it to the cluster. Follow the cluster expansion procedure to add the DBox, and make sure to select Empty box on the General Settings screen.
Step 2: Begin DBox Removal
On the DBoxes tab, right-click the faulty DBox that you want to replace and select Replace.
Click Yes to confirm your action.
Step 3: Migrate the SSDs to the New DBox
On the Clusters tab of the Infrastructure page, check that the cluster's Raid State is healthy.
Prepare to move SSDs from the old DBox into the new DBox. Plan to insert each SSD into the slot in the new DBox that has the same slot number as in the old DBox.
Migrate each SSD, one at a time, as follows:
Remove the SSD from the faulty DBox.
The SSD's state changes to Failed and the cluster's RAID state changes to Rebuild.
Insert the removed SSD into the target slot in the new DBox.
The SSD is activated automatically.
Verify that the cluster's RAID state has returned to healthy before proceeding with the next SSD.
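The one-at-a-time rule above can be expressed as a simple gating loop. This is a sketch under stated assumptions, not a VAST API: `raid_is_healthy` and `move_ssd` are hypothetical callables that you would supply (for example, a function that reads the RAID state you see in VMS, and a prompt that waits for the operator to physically move the drive). The timeout and polling interval are arbitrary illustrative values.

```python
import time


def wait_until(check, timeout_s, poll_s, sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while not check():
        if clock() >= deadline:
            raise TimeoutError("condition not met within timeout")
        sleep(poll_s)


def migrate_ssds(slots, raid_is_healthy, move_ssd,
                 timeout_s=3600.0, poll_s=5.0, sleep=time.sleep):
    """Move SSDs slot by slot, never starting a move while RAID is unhealthy.

    slots           -- slot numbers; each SSD goes to the same slot number
    raid_is_healthy -- hypothetical callable returning True when RAID is healthy
    move_ssd        -- hypothetical callable that blocks until the SSD in the
                       given slot has been moved to the new DBox
    """
    if not raid_is_healthy():
        raise RuntimeError("RAID state is not healthy; do not start migrating")
    moved = []
    for slot in slots:
        move_ssd(slot)  # RAID state is expected to change to Rebuild here
        wait_until(raid_is_healthy, timeout_s, poll_s, sleep=sleep)
        moved.append(slot)
    return moved
```

The important design point is that the health check sits inside the loop, so a second SSD is never pulled while the cluster is still rebuilding after the first.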
Step 4: Migrate the SCMs to the New DBox
Prepare to move SCMs from the old DBox into the new DBox. Plan to insert each SCM into the slot in the new DBox that has the same slot number as in the old DBox.
On the Clusters tab of the Infrastructure page, check that the cluster's SCM State is healthy.
For each SCM in turn:
In the SCMs tab, right-click the SCM and select Deactivate.
When the SCM is deactivated, remove the SCM from the faulty DBox.
Insert the SCM into the target slot in the new DBox. If the SCM is faulty, insert a replacement SCM into the planned slot instead.
Verify that the slot is active and the device is healthy.
Right-click the moved SCM and select Activate.
Verify that the cluster's SCM State is healthy before proceeding with the next SCM.
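Because each SCM goes through the same six actions in a fixed order, it can help to print a per-slot checklist before starting. The sketch below only generates text for the operator; it does not talk to VMS. The function name, the slot numbers, and the `faulty_slots` parameter are hypothetical.

```python
def scm_checklist(slots, faulty_slots=frozenset()):
    """Return the ordered operator steps for migrating the SCMs in `slots`.

    Slots listed in `faulty_slots` receive a replacement SCM instead of the
    device removed from the faulty DBox.
    """
    steps = []
    for slot in slots:
        device = "replacement SCM" if slot in faulty_slots else "same SCM"
        steps += [
            f"slot {slot}: deactivate the SCM in VMS",
            f"slot {slot}: remove the SCM from the faulty DBox",
            f"slot {slot}: insert the {device} into slot {slot} of the new DBox",
            f"slot {slot}: verify the slot is active and the device is healthy",
            f"slot {slot}: activate the SCM in VMS",
            f"slot {slot}: verify SCM State is healthy before continuing",
        ]
    return steps


# Illustrative slot numbers; slot 1 is assumed to hold a faulty SCM.
for line in scm_checklist([0, 1], faulty_slots={1}):
    print(line)
```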
Step 5: Remove the Faulty DBox
Verify that the faulty DBox is empty of devices.
Right-click the faulty DBox and select Conclude Replacement.
Click Yes to confirm the replacement.
Removing the DBox and its DNodes takes some time. You can monitor progress by watching the replace_dbox task on the Activities page.
Wait until the task is complete and then physically remove the faulty DBox. Ship it back to VAST Data.
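If you prefer to script the wait rather than watch the page, a generic polling loop is enough. This is a sketch only: `get_state` is a hypothetical callable that returns the current state of the replace_dbox task, and the state names `"running"`, `"completed"`, and `"failed"` are assumptions, not documented VMS values. The demo at the bottom feeds canned states so the code runs without a cluster.

```python
import time


def wait_for_task(get_state, done_states=("completed",), failed_states=("failed",),
                  timeout_s=7200.0, poll_s=30.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until the task finishes; return the final state.

    Raises RuntimeError if the task reports a failed state and TimeoutError
    if it does not finish within timeout_s seconds.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError(f"task ended in state {state!r}")
        if clock() >= deadline:
            raise TimeoutError("task did not complete within timeout")
        sleep(poll_s)


# Demo with canned states (assumed names) and no real sleeping.
canned = iter(["running", "running", "completed"])
print(wait_for_task(lambda: next(canned), sleep=lambda s: None))  # prints "completed"
```

Only after the loop returns a completed state should the faulty DBox be physically removed.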