Resolved Issues in 5.0.0-SP24

Install & Upgrade

ORION-173226: Updated the logic behind EKM port validation to allow specifying port 443 when creating a cluster with external key management through Thales Group CipherTrust Data Security Platform.
ORION-163100: Enhanced pre-upgrade validations to prevent the upgrade from failing with the failed to ready node before adding to cluster error when CNodes have the same product serial.
ORION-166035: Enhanced the validation of DNode drives during cluster deployment to ensure that it reports all of the slots where a device is missing or has a different size.
ORION-151559: Enhanced upgrade procedures to prevent upgrade failures in situations when cluster nodes are running different VAST Cluster versions (which can be due to a previous failed upgrade, for example).
ORION-147505: Improved handling of downloaded release packages so that the user can retry installation without the need to repeat the package download process.
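
As an illustration of the drive validation described in ORION-166035, the following Python sketch collects every problematic slot instead of stopping at the first one. It is a minimal sketch under stated assumptions, not VAST's implementation: the function name find_bad_slots, the slot-to-size mapping, and the choice of the most common size as the expected size are all hypothetical.

```python
from collections import Counter

def find_bad_slots(slots):
    """Return {slot: reason} for every slot that is empty or has an odd size.

    `slots` maps a slot number to the drive size in bytes, or None when no
    device is present (hypothetical input format).
    """
    present = [size for size in slots.values() if size is not None]
    if not present:
        return {slot: "missing" for slot in sorted(slots)}
    # Assumption: the most common size is the expected one.
    expected = Counter(present).most_common(1)[0][0]
    problems = {}
    for slot in sorted(slots):
        size = slots[slot]
        if size is None:
            problems[slot] = "missing"
        elif size != expected:
            problems[slot] = f"size {size} != expected {expected}"
    return problems

report = find_bad_slots({1: 100, 2: 100, 3: None, 4: 50, 5: 100})
print(report)
```

The point of the sketch is that all mismatches are accumulated in one pass, so a single validation run reports the full list of slots that need attention.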

Cluster Expansion

ORION-168062: Updated cluster expansion routines to eliminate a failure when trying to add a CERES DBox with the Ceres many files target SCM layout, while the previous layout was Many files.
ORION-168057: Updated cluster expansion processing to eliminate a flow where previous failed attempts to add a DBox could cause multiple CNode container restarts due to UNKNOWN DNode position.
ORION-158006: Provided a field to specify the new IPMI gateway during the cluster expansion procedure so that VMS does not default to the old IPMI gateway, which could cause the DBox add task to fail.
ORION-115070: Resolved an issue that prevented completion of the cluster expansion procedure (initiated through VAST Web UI) after adding a new DBox to the cluster.

Networking

ORION-174952: Updated VLAN validations to eliminate an issue that could cause a VLAN NOT AVAILABLE IN IB MODE: ObjectModifyResultCode.VLAN_IN_IB_UNSUPPORTED error when trying to add a VLAN tag to a new or existing virtual IP pool on a cluster where all internal ports are InfiniBand.
ORION-134785: Updated switch polling routines to eliminate an issue where the cluster raised failed to run mlag_ports, list index out of range alerts while no MLAG ports were configured.
ORION-129171: Added a capability to allow using ports other than 636 when configuring LDAP on the VAST cluster.

Element Store

ORION-182747: Updated the LDAP caching mechanism to resolve an issue that could cause CNodes to restart with the Buffers pool is exhausted num_blocks=4000 current size=4000 error.
ORION-182716: Updated the logic of adaptive chunking to eliminate an issue that could cause an assertion failed: ((batch->get_post_merge_size()) <= (_data2serials.get_max_num_serials_in_batch_before_forced_split())) error followed by an ESTORE MIGRATE deny list alert.
ORION-171012: Enhanced defragmentation routines to help prevent flows that could cause the stripe is stuck alerts.
ORION-164599: Enhanced cluster behavior upon removal of a directory with a very large number of files to the trash folder so that such a removal, as well as any subsequent removals, can be reflected in quota information without delay. Prior to this change, the quota capacity reclamation tasks could get stuck and the freed capacity could be seen in capacity pages but not in quotas.
ORION-156183: Resolved an issue that caused multiple ESTORE TOKEN_SAMPLER deny list alerts with a CNode container restart due to the found a bitmap entry pointing to write buffer in an already migrated snap time error.
ORION-156170: Resolved an issue that could, in some cases, cause the target replication cluster not to reclaim space properly upon deletion to the trash folder.
ORION-135587: Eliminated an issue that could cause a TREE_UNLINKER deny list alert due to the assertion failed: (token->_version < (GENY_MAX_VERSION - Globals::geny_manual_recovery_version_buffer - 100)) error.

Quotas

ORION-146751: Added a list of blocked users and/or groups to the email notification sent to storage admins when a user quota is exceeded.

NFSv4.1

ORION-158487: Resolved an issue where a CNode container restarted due to an error during Nfs4::Nfs4Server::release_connection_state_resources_from_remote_silo(Nfs4::ConnectionState *) processing.
ORION-158230: Updated NFSv4.1 lock request processing to eliminate an issue that could cause a CNode container to restart due to the Buffers pool is exhausted error.

SMB

ORION-160315: Eliminated a gap in handling of trusted forest’s group SIDs during replication so that it does not cause a STATUS_INSUFF_SERVER_RESOURCES SMB error when using the Enable SMB native authentication option together with async replication.
ORION-157632: Resolved an issue that could cause an access denied error when trying to copy a newly created file or directory with a read-only attribute to a VAST SMB share.
ORION-146573: Introduced various updates to improve performance when querying for handles inside a directory.
ORION-146176: Added a detailed error message to indicate when an attempt to create a directory quota fails because the quota path already has three quotas set along it.
ORION-146159: Resolved an issue where upon deletion of a view that had SMB, NFSv3 and NFSv4.1 protocols enabled, the view could still be seen via SMB.

S3

ORION-164906: Fine-tuned S3 request processing so that having one very slow connection would not lead to occasional performance drops for other connections where VAST Cluster responds with TCP zero window size notifications.
ORION-159628: Optimized processing of CPU-intensive S3 requests to avoid scenarios where they can cause increased cluster latency.
ORION-156418: Implemented URL decoding of S3 tags passed as headers in PutObject requests.
ORION-150451: Added a capability to configure S3 replication timeouts, helping to fine-tune cluster behavior when interacting with third-party software.
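
To illustrate what URL decoding of tags involves (ORION-156418), here is a minimal Python sketch. It assumes the tags arrive in the standard S3 x-amz-tagging header, encoded as URL query parameters. The function name decode_tagging_header is hypothetical, and the sketch does not represent VAST's implementation.

```python
from urllib.parse import parse_qsl

def decode_tagging_header(header_value: str) -> dict[str, str]:
    """Decode an x-amz-tagging style header value into a tag dictionary."""
    # parse_qsl splits on '&' and '=', then percent-decodes keys and values
    # (it also treats '+' as a space, as in form encoding).
    return dict(parse_qsl(header_value, keep_blank_values=True))

tags = decode_tagging_header("project=alpha%2Fbeta&owner=jane%20doe")
print(tags)  # {'project': 'alpha/beta', 'owner': 'jane doe'}
```

Without the decoding step, the stored tag value would be the literal string alpha%2Fbeta rather than alpha/beta.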

VAST Database

ORION-162722: Added logic to properly update the number of table rows reported via VMS in case of a transaction rollback.

Replication

ORION-176181: Eliminated a flow that could cause a replication failure alert when VAST Cluster attempted to delete a snapshot but its clone was not found because it had already been deleted.
ORION-174152: Updated replication to avoid raising the Replication Stream replication missed its RPO target alarm for suspended replication streams.
ORION-168683: Made updates to eliminate a flow that could cause a false failed to set destination atime alert to be raised.
ORION-167662: Added a meaningful error message in case the replication is stopped due to a missing global snapshot clone.
ORION-166445: Introduced a number of enhancements to prevent a scenario where, based on existing protection policies, local snapshots were created but were not delivered to the remote site, with many missed RPO alerts reported at the remote site.
ORION-156400: Resolved an issue where two internal replication streams were stuck in an INTERNAL_ERROR state.

Authentication & Authorization

ORION-162007: Resolved an issue where upon attaching an identity policy to a domain user, VMS did not show the policy when querying the user by username although the policy was attached and worked as expected.
ORION-160016: Enhanced the mechanism of merging user group information obtained from multiple providers to ensure that no duplicate group entries are created for a user in the VAST internal database. The duplicate entries could lead to exceeding the user group limit (1024 groups per user), causing access denied errors when some of the user groups had to be dropped.
ORION-157986: Resolved an issue where an attempt to create an additional S3 key for an Active Directory user that had a historical SID would fail with a UserDBResultCode.UNEXPECTED_ERROR error.
ORION-156632: Updated the Global Catalog (GC) lookup logic to enable VAST Cluster to discover GC servers of the top-level domain if the cluster joined a child Active Directory domain and there were no Global Catalog (GC) servers in the current site.
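
The group merging described in ORION-160016 can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the function name merge_groups, the use of SID strings as group identifiers, and the sample values are hypothetical. Only the limit of 1024 groups per user comes from the note above.

```python
MAX_GROUPS_PER_USER = 1024  # limit quoted in the release note

def merge_groups(*provider_groups):
    """Merge group IDs from several providers, dropping duplicates.

    First-seen order is preserved so the result is deterministic.
    """
    merged = []
    seen = set()
    for groups in provider_groups:
        for gid in groups:
            if gid not in seen:
                seen.add(gid)
                merged.append(gid)
    return merged

# Hypothetical group lists returned by two providers for the same user.
ad_groups = ["S-1-5-21-1-513", "S-1-5-21-1-1104"]
ldap_groups = ["S-1-5-21-1-1104", "S-1-5-21-1-2001"]

merged = merge_groups(ad_groups, ldap_groups)
print(len(merged), len(merged) <= MAX_GROUPS_PER_USER)  # 3 True
```

Deduplicating before the limit is applied means that a group reported by two providers counts once, so fewer genuine groups risk being dropped.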

VMS

ORION-165412: Added a caching mechanism to avoid getting a remaining connection slots are reserved for non-replication superuser connections error when the cluster processes a very large number of metrics requests.
ORION-164320: The PEER_IP deny list alerts can now be seen by non-root users, such as the admin user. Prior to this change, these alerts were displayed for root users only.
ORION-163880: Resolved an issue that caused the VMS state changed to DEGRADED, reason: CLUSTERED_DB_IS_STOPPING alert when trying to write a file through NFS.
ORION-159071: Resolved an issue that caused raising a false mtu is not configured correctly. mtu is 1500 alarm on all CNodes although the MTU was set correctly.
ORION-154396: Improved the NVRAM polling mechanism to prevent it from creating extra events that may impact VMS worker performance.
ORION-147841: Changed the severity of the switch change state alarm from MAJOR to CRITICAL.

VAST Web UI

ORION-161081: Added the Activate and Deactivate options to the actions menu for a protected path (Replication -> Protected Paths -> right-click a path to open the actions menu).
ORION-160971: Updated the name of the field used to specify a new column name when renaming a database column in VAST database (DataBase -> VAST DB -> drill down to columns and choose to edit a column) to read Column name instead of Schema name.
ORION-160776: When deploying a Sanmina DBox with 30TB disks, GUI messages now include a proper unit of measure for the disk capacity.
ORION-159027: When displaying analytics for a view (e.g. having selected a view from the Select Object dropdown in the Analytics page), the Define Time field now shows only options that are applicable to this particular type of analytics.
ORION-157902: Updated the logic behind the Name column in the VAST Audit Log page (DataBase -> VAST Audit Log) to always display the log file name.
ORION-155650: Updated the logic behind the Add Protected Path dialog to make the Remote Tenant and Remote Path fields non-mandatory for S3 replication.
ORION-152204: Provided a more detailed error message in case an invalid value is entered in the Atime Frequency field in view policy advanced settings (Element Store -> View Policies -> choose to create or edit a view policy -> go to Advanced tab).
ORION-151892: Removed the Power cycle option from the list of actions available for a slot in the Slots page (Infrastructure -> Slots).
ORION-151270: Renamed the following fields in the VAST Easy Install screen to replace the term External with Northbound:
- Northbound ETH MTU
- Northbound IB MTU
- Northbound IB type
ORION-147147: Updated the filter for the Link State column on the NICs page (Infrastructure -> NICs) to enable filtering by any of the column’s valid values.

VAST CLI

ORION-179969: The --target-id parameter on the replicationstream create command is now optional.
ORION-165957: Updated the logic behind the viewpolicy show --audit command to make the command work as expected.
ORION-163858: Resolved an issue that caused the --supportbundle --present callhome command to fail with the KeyError: 'upload_kwargs' error.

VAST Prometheus Exporter

ORION-169545: Updated VAST Prometheus Exporter to include information about UIDs in user-related metrics (vast_user.*).

Platform & Control

ORION-183405, ORION-180739: Introduced updates to eliminate a flow where an NVRAM failure due to an XRQ NVMF backend ctrl timeout error could result in multiple node container restarts, causing temporary service disruption.
ORION-182079: Updated the logic of rewriting the data after encryption has been enabled on the cluster to prevent CNodes from restarting with the assertion failed: ((_keys[key_id].get_key_id()) != (NO_OP_KEY_ID)) (0 != 0) key_id=1 encryption_group_id=1 isn't set yet - cannot be used as the current key error in case the cluster had some SSDs in inactive state.
ORION-180833: Resolved an issue that could cause multiple CNode containers to restart with the assertion failed: (traversal_mega_shard_id.mega_shard_value() != P::INVALID_SHARD) error.
ORION-171091: Resolved an issue that could cause repeated allocated 90% of mooktze buffers! top consumer is TABULAR_TIMEOUT_TICKER alerts for a CNode.
ORION-160765: Eliminated a flow that could cause CNode containers to restart with the assertion failed: ((drive.get_size()) > (SSD_BASE_OFFSET)) error in case of a temporary SSD issue.
ORION-154985: Eliminated a flow that could cause false BMC firmware mismatch alarms after cluster expansion.
ORION-151577: Resolved an issue where multiple CNode containers restarted after deleting a protected path with shard in release for too long and timeout expired for life_type=0,life_gen=<...> (INGEST_READ) errors.
ORION-150465: Added more logic to manage timeouts when collecting IPMI sensor logs from Sanmina DNodes to prevent encountering missing data in the logs.
ORION-148943: Resolved an issue that could cause a one or more boot-devices are missing error to be reported for a CNode without failing the CNode, with the boot drive being successfully detected within a short time after the alert.
ORION-148195: Eliminated an NFS access flow where a race condition could occur, resulting in a CNode container restart with the assertion failed: (!t->vid_uid_link.valid) vid_uid_link should not be valid error.
ORION-142529: Resolved an issue where a CNode container restarted with the timeout expired for life_type=1,life_gen=56735750 (INGEST_WRITE) with 1 active jobs error.
ORION-137866: Improved handling of traces to prevent situations where some of the /vast directories on the cluster’s CNodes were used up to 90% of their capacity, with most of the stored data being old traces.

Call Home & Support

ORION-156360: Resolved an issue that prevented normal SMB log rotation, causing timeouts when attempting to send automatic bundles.