Troubleshoot SCARCE Mode on VAST Data

OVERVIEW

The cluster reports one of three remaining_stripes_state values:

ABUNDANT - The healthy state: more than 15% of the stripes are available.
SCARCE - Fewer than 15% of the stripes are available.
HALT_WRITES - Fewer than 5% of the stripes are available. At this point, the cluster is read-only.
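As a quick illustration of how the thresholds above partition the range, here is a minimal sketch. The function name stripes_state_for_pct is hypothetical, and the handling of the exact boundary values (15% and 5%) is an assumption, since the thresholds above do not say which state applies at exactly those percentages.

```shell
# Hypothetical helper: map an available-stripe percentage to a state name.
# Assumption: exactly 15% is treated as ABUNDANT and exactly 5% as SCARCE;
# the thresholds in this article do not define the boundary cases.
stripes_state_for_pct() {
    awk -v pct="$1" 'BEGIN {
        if (pct + 0 < 5)       print "HALT_WRITES"
        else if (pct + 0 < 15) print "SCARCE"
        else                   print "ABUNDANT"
    }'
}
```

For example, stripes_state_for_pct 12.5 prints SCARCE.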

SCARCE Mode

remaining_stripes_state=RemainingStripesState.SCARCE

In this state, the cluster should defragment (defrag) stripes so that the available stripe count increases rapidly.

DETECTION

The cluster should raise an alarm when remaining_stripes_state changes (the alarm is also configured to send a call home message):

VastData-IT: Bonzo-01: [CRITICAL] Cluster Bonzo-01 md_usage_state changed from ABUNDANT to SCARCE - 2022-08-30 08:19:47 UTC

To check the current remaining_stripes_state manually:

On one of the CNodes in the cluster, run:

vtool system_status | grep remaining_stripes_state

Look for the remaining_stripes_state value:

remaining_stripes_state=RemainingStripesState.SCARCE

This means the cluster is in SCARCE mode and should prioritize defrag.
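If the check needs to be scripted, the state name can be pulled out of the status text. The following is a minimal sketch; the function name current_stripes_state is hypothetical, and it assumes the status line has exactly the format shown above.

```shell
# Hypothetical helper: read status text on stdin and print the bare state name
# (for example SCARCE). Assumes the line format
# remaining_stripes_state=RemainingStripesState.<STATE> shown in this article.
current_stripes_state() {
    sed -n 's/.*remaining_stripes_state=RemainingStripesState\.\([A-Z_][A-Z_]*\).*/\1/p' | head -n 1
}

# Example use on a CNode:
#   state=$(vtool system_status | current_stripes_state)
#   [ "$state" = "ABUNDANT" ] || echo "WARNING: remaining_stripes_state is $state"
```

The function prints nothing if no matching line is found, so an empty result should be treated as "unknown" rather than as a healthy state.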

ADDITIONAL TROUBLESHOOTING

The expected defrag rate is ~2 GB/s per CNode. If the stripe release rate is significantly slower than that (40%-50% slower), it's worth checking for an over-utilized silo.
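To make the rule of thumb concrete, here is a small worked calculation. It is only a sketch: the function name defrag_rate_check is hypothetical, and this article does not cover how the observed rate is measured, so the observed value is assumed to be a cluster-wide figure in GB/s obtained separately.

```shell
# Hypothetical helper: defrag_rate_check OBSERVED_GBPS CNODE_COUNT
# Flags the observed cluster-wide rate as SLOW when it is at least 40% below
# the expected ~2 GB/s per CNode.
defrag_rate_check() {
    awk -v observed="$1" -v cnodes="$2" 'BEGIN {
        expected  = 2 * cnodes        # ~2 GB/s per CNode
        threshold = expected * 0.6    # 40% slower than expected
        status = (observed + 0 <= threshold) ? "SLOW" : "OK"
        printf "%s expected=%.1f threshold=%.1f observed=%.1f\n",
               status, expected, threshold, observed
    }'
}
```

For an 8-CNode cluster the expected rate is about 16 GB/s, so anything at or below 9.6 GB/s falls in the "significantly slower" range.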

Some background: every stripe has 1024 sub-stripes, and the stripe remains marked for defrag until all of its sub-stripes have been defragged. Once they all have, the stripe is marked as done.

Here is how to see which shard is the busiest. The first command runs in the background, so wait for it to finish before running the second:

clush -g cnodes '/vast/data/bashdocker.sh hubble -fuf mark_done -fnf raid_maint -c HIGH -s "60 minutes ago" | grep raid_maintenance_type=0' > mark_done &
grep stripe_done=1 mark_done | grep -o "shard_id=[0-9]* " | sort | uniq -c | sort -nr | less

This Hubble query should give us, for each shard, a count of the stripes marked done in the last 60 minutes.

Here is an example:

107 shard_id=161
76 shard_id=38
55 shard_id=150
50 shard_id=24
18 shard_id=53
3 shard_id=295
2 shard_id=80
2 shard_id=227
2 shard_id=201
1 shard_id=510

Above, we can see that shard id 161 is working the hardest, followed by 38, 150, 24, and 53.
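To judge how uneven the distribution is, it can help to express each count as a share of the total. The sketch below assumes input in the "count shard_id=N" form shown above; the function name shard_share is hypothetical.

```shell
# Hypothetical helper: read "<count> shard_id=<id>" lines on stdin and print
# each shard's percentage of all counted events, preserving input order.
shard_share() {
    awk '
        { count[NR] = $1; label[NR] = $2; total += $1 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%.1f%% %s\n", 100 * count[i] / total, label[i]
        }
    '
}

# Example use:
#   grep stripe_done=1 mark_done | grep -o "shard_id=[0-9]* " \
#       | sort | uniq -c | sort -nr | shard_share
```

In the example above the counts add up to 316, so shard 161 alone accounts for roughly a third of the stripes marked done.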

We can list the shards using:

vtool list_shards ESTORE

The output should look like this:

cnode-8 700826e1-f31a-5f56-bcd9-60fe33531986 1 [421, 192, 406, 90]
cnode-7 34b56d34-2e57-5202-aefd-14691d795f45 1 [77, 46, 413, 439, 444]
cnode-3 e14b4aa5-f195-5c66-b949-a68338531b48 3 [129, 350, 161, 91, 61]
cnode-2 34bee4dd-f76e-5e87-9e42-5e7752bdddbb 0 [403, 299, 154, 315]
cnode-7 34b56d34-2e57-5202-aefd-14691d795f45 11 [390, 164, 255, 56, 125]
cnode-3 e14b4aa5-f195-5c66-b949-a68338531b48 11 [332, 63, 233, 475, 150]

To get the stripe groups, use the following:

VAST vastdata@cnode_1 userdata:$ vtool list_shards SG
cnode-6 94d0095e-8336-504d-a86d-8f6eba519f8a 3 [34, 44]
cnode-7 34b56d34-2e57-5202-aefd-14691d795f45 5 [63, 7]
cnode-7 34b56d34-2e57-5202-aefd-14691d795f45 6 [9, 4]
cnode-1 ef3717ad-ec50-5e1d-ad46-6e009e62e950 3 [22, 41]
cnode-6 94d0095e-8336-504d-a86d-8f6eba519f8a 4 [5, 53]
cnode-5 1d016ef0-e2b8-5da9-b7a1-aa30346fe826 5 [36, 14]
cnode-4 2763dbce-d196-5cce-99f3-2ba1fb47a126 3 [62, 59]
cnode-5 1d016ef0-e2b8-5da9-b7a1-aa30346fe826 4 [45, 40]
cnode-7 34b56d34-2e57-5202-aefd-14691d795f45 4 [25, 16]
cnode-2 34bee4dd-f76e-5e87-9e42-5e7752bdddbb 4 [29, 11]
cnode-3 e14b4aa5-f195-5c66-b949-a68338531b48 4 [39, 61]
cnode-6 94d0095e-8336-504d-a86d-8f6eba519f8a 5 [56, 31]
cnode-3 e14b4aa5-f195-5c66-b949-a68338531b48 5 [54, 48]
cnode-1 ef3717ad-ec50-5e1d-ad46-6e009e62e950 6 [10, 60]
cnode-1 ef3717ad-ec50-5e1d-ad46-6e009e62e950 4 [42, 26]
cnode-4 2763dbce-d196-5cce-99f3-2ba1fb47a126 6 [1, 6]
cnode-3 e14b4aa5-f195-5c66-b949-a68338531b48 6 [35, 28]
cnode-1 ef3717ad-ec50-5e1d-ad46-6e009e62e950 5 [38, 33]
cnode-2 34bee4dd-f76e-5e87-9e42-5e7752bdddbb 3 [50, 52]
cnode-8 700826e1-f31a-5f56-bcd9-60fe33531986 6 [30, 37]
cnode-3 e14b4aa5-f195-5c66-b949-a68338531b48 3 [32, 12]
cnode-7 34b56d34-2e57-5202-aefd-14691d795f45 3 [2, 23]
cnode-8 700826e1-f31a-5f56-bcd9-60fe33531986 3 [24, 8]
cnode-2 34bee4dd-f76e-5e87-9e42-5e7752bdddbb 6 [15, 58]
cnode-5 1d016ef0-e2b8-5da9-b7a1-aa30346fe826 6 [55, 13]
cnode-4 2763dbce-d196-5cce-99f3-2ba1fb47a126 5 [43, 57]
cnode-8 700826e1-f31a-5f56-bcd9-60fe33531986 5 [18, 46]
cnode-6 94d0095e-8336-504d-a86d-8f6eba519f8a 6 [47, 27]
cnode-2 34bee4dd-f76e-5e87-9e42-5e7752bdddbb 5 [20, 17]
cnode-5 1d016ef0-e2b8-5da9-b7a1-aa30346fe826 3 [21, 19]
cnode-8 700826e1-f31a-5f56-bcd9-60fe33531986 4 [49, 0]
cnode-4 2763dbce-d196-5cce-99f3-2ba1fb47a126 4 [51, 3]

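Cross-referencing the busiest shard IDs against these lists can give us an idea of which CNode is doing the most work. In the example above, shards 161 and 150 both appear in the ESTORE output under cnode-3.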
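The cross-referencing can be scripted. The following is a minimal sketch, not an official tool: the function name cnode_load and the two file arguments are hypothetical. It assumes the shard IDs reported by Hubble are ESTORE shard IDs, that the first file holds saved "vtool list_shards ESTORE" output in the format shown above, and that the second file holds the "count shard_id=N" list.

```shell
# Hypothetical helper: cnode_load SHARDS_FILE COUNTS_FILE
# Sums the per-shard counts by the CNode that hosts each shard.
cnode_load() {
    awk '
        FNR == NR {
            # First file: "<cnode> <guid> <n> [id, id, ...]"
            open_pos  = index($0, "[")
            close_pos = index($0, "]")
            list = substr($0, open_pos + 1, close_pos - open_pos - 1)
            n = split(list, ids, /, */)
            for (i = 1; i <= n; i++) owner[ids[i]] = $1
            next
        }
        {
            # Second file: "<count> shard_id=<id>"
            split($2, kv, "=")
            host = (kv[2] in owner) ? owner[kv[2]] : "unknown"
            total[host] += $1
        }
        END { for (host in total) print total[host], host }
    ' "$1" "$2" | sort -nr
}

# Example use:
#   vtool list_shards ESTORE > estore_shards
#   grep stripe_done=1 mark_done | grep -o "shard_id=[0-9]* " \
#       | sort | uniq -c | sort -nr > shard_counts
#   cnode_load estore_shards shard_counts
```

Shards that do not appear in the shard list are grouped under "unknown", which makes it obvious when the saved list is incomplete or stale.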

Restarting the CNode (disable + enable) will force the shards to redistribute across the other CNodes, balancing the workload.

NOTE -- If we need to involve VFORCE, collect all of the following information to make the escalation productive:

clush -g cnodes '/vast/data/bashdocker.sh hubble -fuf mark_done -fnf raid_maint -c HIGH -s "60 minutes ago" | grep raid_maintenance_type=0' > mark_done &
grep stripe_done=1 mark_done | grep -o "shard_id=[0-9]* " | sort | uniq -c | sort -nr

vtool list_shards ESTORE
vtool list_shards SG