Resolved Issues in 5.4.0

Install & Upgrade

  • ORION-259291: Resolved an issue due to which in some cases, if the Inband option was selected in the Management network field of the VAST Cluster Install wizard, the CNodes could be configured as inband while the DNodes were configured as outband.

  • ORION-232777: Fine-tuned inter-node SSH connection timeout settings to prevent an OS upgrade from failing due to the ConnectionLost('Login timeout expired') error.

  • ORION-238278: Made updates to prevent a flow where cluster deployment could fail with the FW parameters have changed. please use ipmitool power cycle to cycle the server and rerun error when running the cluster networking configuration script (configure_network.py) on the cluster nodes.

  • ORION-229534: Introduced optimizations to shorten the time required to add a large number of nodes to the cluster.

  • ORION-225902: Resolved an issue where a DNode was deemed failed during an upgrade because of timeouts that occurred when running internal NVMe CLI commands to update the list of hosts.

Cluster Expansion

  • ORION-262506: Resolved an issue that could occur during DBox expansion and cause device activation to take much longer than expected, resulting in expansion failures.

  • ORION-253664: Resolved an issue where an attempt to add an MLK DBox hung in the RUNNING state due to an InterfaceError('connection already closed') error.

Networking

  • ORION-269072: Enhanced validations of ASN entries made when configuring a BGP connection (Network Access -> BGP Configurations -> Create BGP) to ensure that each value does not exceed 32 bits.

  • ORION-258779: Updated the CNode removal task to skip validation of the OpenSM service on clusters with InfiniBand internal networking if forced removal was requested, thus allowing forced removal of a failed CNode that has no internal network connectivity.

  • ORION-252485: Deprecated the Subnet bits and VIP Grace Period fields in BGP configuration settings (Network Access -> BGP Configurations -> choose to create or edit a BGP configuration).

  • ORION-241079: Made updates to prevent "port_rcv_switch_relay_errors" increased during the run alerts from appearing in the VMS log.

  • ORION-204316: Enhanced the cluster networking configuration script (configure_network.py) to ensure that it properly configures CNodes on HPE IceLake clusters when CNode Port Affinity is used.

  • ORION-185008: Resolved an issue where modifying a subnet in an existing virtual IP pool could result in the pool becoming inaccessible to clients.
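As an illustration of the ASN range check described under ORION-269072 above, the following is a minimal Python sketch of a 32-bit validation. It is not the product's implementation; the function name is_valid_asn and the choice to accept both strings and integers are assumptions made for this example. It checks only that the value fits in 32 bits and does not account for reserved ASN ranges.

```python
# Hypothetical helper, not part of VAST software: checks that an ASN
# entry is a non-negative integer that fits in 32 bits.
MAX_ASN = 2**32 - 1  # largest value representable in 32 bits


def is_valid_asn(value) -> bool:
    """Return True if value is a plain decimal integer in [0, 2**32 - 1]."""
    text = str(value).strip()
    # Accept only ASCII decimal digits; this rejects signs, decimals and empty input.
    if not (text.isascii() and text.isdigit()):
        return False
    return int(text) <= MAX_ASN
```

A caller would typically run such a check before submitting the BGP configuration, so that an out-of-range entry is rejected early rather than after the request is sent.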

Element Store

  • ORION-268120: Resolved an issue where an unexpected client disconnection during the TCP connection establishment phase could cause the CNode container to restart with the assertion failed: ((*__errno_location ()) == 11 || (*__errno_location ()) == 11) errno: Software caused connection abort error.

  • ORION-262912: Made updates to prevent drive latency spikes with subsequent denylisting when processing a specific type of workload.

  • ORION-259006: Resolved an issue that could cause CNode containers to restart with the Buffers pool is exhausted error when running replication over an encrypted connection.

  • ORION-256617, ORION-221522: Optimized processing of metadata in replication flows to resolve an issue that could cause a CNode container to restart with the timeout expired <...> (TRAVIS) or timeout expired <...> (INGEST_WRITE) error.

  • ORION-254716: Resolved an issue that could cause multiple CNode containers to restart with the Address not mapped to object error.

  • ORION-250931: Updated VAST Catalog-related optimizations to resolve an issue that could cause multiple CNode containers to restart due to timeout expired <...> (TRAVIS) and spinlock lock takes too long errors.

  • ORION-248586: Resolved an issue that could cause occasional failed to insert reference cache key to the references cache generation tree alerts on the cluster.

  • ORION-245764: Eliminated a flow that could cause an allocated 90% of mooktze buffers! top consumer is DIRSNAP_RESOLVED_SNAPSHOTS alert on the cluster.

  • ORION-245108: Improved handling of existing open handles for deleted files to prevent timeout expired <...> (INGEST_WRITE) errors when processing NFSv4 write workloads.

  • ORION-244972: Resolved an issue that could cause a Too many attributes changes protected by snapshots on handle. IO will fail. Need to delete snapshots to continue. alert followed by an ESTORE DELETE_SNAP denylist imposed due to the resolver->has_map_for_element(chandle) error.

  • ORION-244281: Resolved an issue that could cause multiple CNode containers to restart with the timeout expired <...> (TRAVIS) error.

  • ORION-241655: Resolved an issue where after enabling VAST Catalog on the cluster, the ESTORE BIG_CATALOG denylist was imposed following halted write alerts.

  • ORION-236508: Eliminated a flow that could cause an allocated 90% of mooktze buffers! top consumer is ESTORE_MIGRATOR_WRITE_HANDLES_TREE alert on the cluster.

  • ORION-221633: Eliminated a race condition that could cause a (num_entries_removed == 1) (2 == 1) didn't find name entry to remove when applying content defrag future error followed by the ESTORE CONTENT_DEFRAG denylist on the cluster.

  • ORION-214091: Resolved an issue that could cause the (!has_collision) Found two old extents corresponding to same future that overlap each other error followed by the ESTORE CONTENT_DEFRAG denylist on the cluster.

  • ORION-198889: Updated defragmentation routines to resolve an issue where multiple CNode containers restarted with the shard in release for too long error after adding a highly intensive workload.

Multi-tenancy

  • ORION-252148: Resolved an issue where an attempt to mount an NFSv4.1 view failed on one of the virtual IPs in a virtual IP pool while all other virtual IPs from the same pool worked as expected.

Quality of Service (QoS)

  • ORION-231253: Enhanced workload prioritizing to prevent scenarios where the actual performance could be 10% off the cluster-wide write bandwidth limit specified.

SMB

  • ORION-249181: Made updates to reduce the time needed to complete NTLM authentication when the Use SMB native authentication option is enabled for the cluster.

S3

  • ORION-240463: Optimized a flow that could occur when listing objects in a versioned bucket.

  • ORION-188749: Updated the logic behind the view policy option that restricts S3 read/write access based on client IP addresses (in VAST Web UI: Element Store -> View Policies -> choose to create or edit a view policy -> Host-Based Access -> S3 pane -> Read/Write) so that the option supports bucket-level operations, such as creating or deleting a bucket.

Data Protection

  • ORION-267226: Fine-tuned Global Namespace timeouts to resolve an issue where creation of a global snapshot clone for a path with a very large number of subdirectories took longer than expected.

  • ORION-238060: Eliminated a flow that could cause the cluster to reach the HALT_ALL_INCLUDING_UNLINKS metadata state when cloning a snapshot of a directory with a very large number of files.

Replication

  • ORION-267046: Resolved an issue where an attempt to delete a protected path with a replication stream in a CREATE_FAILED state resulted in a DELETE_FAILURE_OBJ_IS_BEING_USED error.

  • ORION-256402: Updated handling of sync points in some replication flows to resolve an issue that could cause replication streams to miss their RPOs due to a TOO_MANY_CLONES error.

  • ORION-233749: Made updates to ensure automatic deletion of access keys associated with a replication destination tenant which has been deleted.

VAST on Cloud

  • ORION-230479: Updated the logic behind the Periodic readiness check failed for OS upgrade alarm to avoid raising the alarm on VoC on GCP clusters.

Authentication & Authorization

  • ORION-262465: Resolved an issue that could cause a ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation: The AWS access key Id you provided does not exist in our records - due to mismatch error when trying to access a bucket using S3 keys of a local user that had the same name, UID, group membership and identity policy assignment as a user present in the Active Directory.

  • ORION-223833: Enhanced handling of the scenario where the cluster is unable to join an Active Directory domain so that the provider does not get enabled as a result of the discovery process, which could otherwise cause protocol traffic outage.

VMS

  • ORION-238083: Updated the logic behind the VMS option to power off an EBox to prevent situations where the power off task is reported as complete while the box is inactive but not powered off.

  • ORION-225432: Implemented a filter to avoid offering metrics that are not applicable for the platform when selecting a predefined analytic report in the Analytics -> Predefined Analytics page of VAST Web UI.

VAST Web UI

  • ORION-256862: Updated the logic behind the Password restore delay field in indestructibility settings (Settings -> Indestructibility -> Restore Password) to avoid raising a Please provide datetime in positive [number][unit (y/w/d/h/m/s)] format error on an attempt to modify the field value.

  • ORION-255304: Updated the logic behind the Type column in the Infrastructure -> Switches page to allow for filtering by any of the switch vendors displayed in the grid.

  • ORION-246972: Updated the logic behind the Add Volumes manually option in the Map Volumes dialog (Data Protection -> Snapshots -> right-click a snapshot and choose Map Snapshot Volumes) so that it starts the volume mapping process as expected.

  • ORION-207301: Updated the logic behind the fields that show the unit of measure used for the Minimum object size and Maximum object size filters set for an existing lifecycle rule (Element Store -> Lifecycle Rules -> choose to view or edit a rule) to resolve an issue that caused these fields to display KB while the original values supplied and used were in bytes.

  • ORION-203737: Updated the logic behind the filter in the Box column in the Infrastructure -> CNodes and DNodes pages so that the nodes can be filtered as appropriate.

  • ORION-203189: Updated the logic behind the External Netmask field in cluster networking settings (Settings -> Configure Network) to accept IPv6 netmasks.
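The error message quoted under ORION-256862 above describes the expected input as a positive [number][unit (y/w/d/h/m/s)] value. The sketch below shows one way a script could pre-check a value against that shape. It is an assumption-laden example, not the Web UI's own validation: the function name parse_delay is hypothetical, and the meaning of each unit letter is deliberately left uninterpreted because the source does not define it.

```python
import re

# Hypothetical pre-check, not VAST code: one positive integer followed by
# exactly one unit letter from the set named in the error message.
_DELAY_PATTERN = re.compile(r"([0-9]+)([ywdhms])")


def parse_delay(text: str):
    """Return (number, unit) if text matches [number][unit], else None."""
    match = _DELAY_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    number = int(match.group(1))
    if number <= 0:  # the message asks for a positive number
        return None
    return number, match.group(2)
```

For example, parse_delay("7d") yields a (number, unit) pair, while inputs such as "10" (no unit) or "0h" (not positive) are rejected.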

VAST CLI

  • ORION-246115: Made updates to ensure that VAST CLI keyword auto-completion works as expected for hyphenated keywords.

VAST REST API

  • ORION-178408: Updated the estimated_read_only_time field returned by the /protectedpaths/ and /protectedpaths/<path ID>/ endpoints of VAST REST API to return a float number instead of a string.
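Because estimated_read_only_time changes from a string to a float, client scripts that work against clusters on both older and newer versions may want to normalize the field. The following is a minimal sketch under that assumption; the helper name read_only_time and the sample record are hypothetical, and no other part of the response format is implied.

```python
# Hypothetical client-side helper, not part of the VAST REST API:
# normalizes estimated_read_only_time whether it arrives as a string
# (older behavior) or as a float (behavior after this fix).
def read_only_time(record: dict):
    """Return the field as a float, or None if it is absent or unparsable."""
    raw = record.get("estimated_read_only_time")
    if raw is None:
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None
```

With this helper, a record carrying "12.5" and a record carrying 12.5 both produce the same float value, so downstream comparisons do not depend on the cluster version.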

Platform & Control

  • ORION-272260: Adjusted the way VMS handles CPU ambient temperature thresholds to avoid raising false alarms on cluster nodes.

  • ORION-269167: Made updates to avoid an internal SPDK-related flow that could occur following an SSD failure and cause a CNode container restart on Ceres v2 clusters.

  • ORION-259669: Resolved an issue where following an upgrade, the number of available stripes started to decrease and a stripe is stuck alert was raised.

  • ORION-257729: Resolved an issue that could cause multiple CNode containers to restart due to the Retry time exceeded - waiting for a free fiber error on a cluster that had VAST Catalog enabled.

  • ORION-250069: Made updates to ensure that replacement of a very large number of SSDs at the same time does not result in CNode containers being restarted due to the after deactivation of drive=<...> dbox=<...> will be at risk to have insufficient drives for mioc state and (load_successful) this might mean that system_format failed earlier <...> or that loading mioc failed due to unavailable dboxes/drives errors.

  • ORION-244285: Made updates to avoid occasionally raising the request posted buffers in dev_id=0 for module E are 0 alert following a port shutdown.

  • ORION-242111: Provided an additional RDMA keepalive mechanism to prevent failures that could occur when recovering from a DBox HA event due to incorrect CNodes being shut down during the recovery.

  • ORION-240717: Resolved an issue where a Failed to decompress LZ4 block: incorrect block header error occurred while trying to restore Veeam backups stored on the VAST cluster.

  • ORION-236867: Provided graceful error handling to avoid a CNode container restart in case the gateway IP could not be found.

  • ORION-236567: Made updates to prevent a CNode container restart with the Invalid permissions for mapped object - (stack_size=8184 bytes_remaining=0 pct_remaining=0%) HINT: is your stack big enough? error when using S3 with TLS.

  • ORION-236292: Resolved an issue that could cause multiple CNode containers to restart with the timeout expired <...> (SARRAY_READ) error.

  • ORION-236751: Resolved an issue that could cause a CNode container to restart with the ((_memory_client_counter) > (0)) (0 > 0) error when processing NFSv4.1 workload.

  • ORION-236431: Improved formatting of error messages related to cluster networking to prevent periodic printing of the NVMe CLI ValueError: not enough values to unpack message in /var/log/messages.

  • ORION-232944: Updated the logic behind the Power off option in the CNode page of VAST Web UI (Infrastructure -> CNodes -> right-click a node) so that it no longer needs to be clicked twice to power off the node.

  • ORION-231272: Resolved an issue that could cause increased write latency for specific workloads on a Cisco EBox.

  • ORION-223130: Resolved an issue where a CNode container restarted with the timeout expired <...> (WRITE_BUFFER_READ) error.

  • ORION-197576: Resolved an issue that could cause false Session Audit Bad User PWD alerts in ipmitool sel elist logs.

  • ORION-186041: Resolved an issue that could cause a CNode container to restart with the Migrate resulted in two overlapping extents at the same snap + HD!? error.

Support & Call Home

  • ORION-238168: Updated the functionality that lets you delete support bundles so that you can delete a bundle that is still in the process of being created.