Best Practices Guide for VMware on VAST Data


Introduction

VAST clusters can scale to meet the needs of the largest and most demanding vSphere environments, and the smallest of VAST clusters can also be a very suitable storage solution for smaller environments. This guide provides guidance that applies to small deployments and scales up to the largest.

Before diving into the solution, it’s worthwhile to explore the problem we are trying to solve. A VAST cluster can scale capacity and performance to virtually any level. Clients access storage through front-end CNodes, each of which has Virtual IPs (VIPs) assigned to its interfaces for access. We could have an aggregate of many terabits of bandwidth available across hundreds of CNodes, but if every client only used the same single IP mapped to a single interface on a single CNode, the full performance potential of the VAST cluster wouldn’t be met.

The goal here is to maximize cluster performance in an easy-to-manage and scalable way. Whether we want to scale up ESXi hosts in a vSphere Cluster, add additional vSphere Clusters, add capacity (DNodes), or add more CNodes to the VAST cluster, the guidance below will set you on the right path for peak performance and simple scaling.

Problem Statement

As mentioned in the introduction, one problem we’re trying to avoid is oversaturating a limited resource (i.e., hot spotting). Hot spots can occur when a single host uses only one Virtual IP (VIP) to access multiple datastores. Multiple datastores can spread VM workload access across multiple interfaces, but if the end-to-end path is the same for all of them, that strategy is unlikely to work. When creating a datastore and adding it to an ESXi host, we typically connect using the fully qualified domain name (FQDN) of the VIP pool. With VAST DNS configured and a low TTL (default: 1 second), each time we mount a datastore we get a different IP. This is exactly what we want.

In a NAS scenario with hundreds or thousands of clients accessing an NFS export, clients typically resolve the FQDN and connect, disconnect later due to inactivity, and re-resolve when they need to reconnect. ESXi hosts do not behave this way. Once a host resolves an FQDN to an IP and mounts the datastore, that datastore remains mounted indefinitely, and the only time the name may be resolved again is on the next host reboot.

However, when we reboot an ESXi host, all datastores pointing at the same FQDN resolve at the exact same moment, and we end up using a single VIP to access all of them, as shown below:

The screenshot demonstrates how data flow is visualized within VAST, isolated to a single host and highlighting multiple datastores that all mount using the same VIP pool with the same FQDN. Only one VIP is used in this configuration; avoiding this behavior is what drives the recommendations in this guide.

Likewise, when we mount a datastore to all hosts at the same time, they all resolve to the same IP address, and the hosts collectively fight for the limited bandwidth a single interface provides. This can be mitigated by mounting the datastore on hosts one at a time, allowing the TTL to expire (default = 1 second) so a new IP is returned for each subsequent mount. Fortunately, this is only an issue when we first introduce a new datastore; after a patch cycle or any other event that reboots the hosts in a cluster, they will not all come back up at the same exact moment, so we get some variety.

Solution (via DNS resolution)

To avoid the issue of resolving all datastore mounts to the same FQDN as described above, we can vary the FQDN of the server specified on each datastore. With VAST DNS, you can add a prefix in front of an FQDN. But before we dive into that, it’s important for us to understand how VAST DNS works.

Once DNS has been configured on a VAST cluster, a domain name suffix is defined. On your corporate DNS server(s), you create a delegation zone for that suffix so that all resolution requests for hosts within that suffix are forwarded to the VAST DNS service, which returns a single IP from the VIP pool with a low TTL. Unlike round-robin DNS, which would return the entire list of IPs to the client, VAST controls the distribution of those IPs and returns only one per query. When round-robin DNS is attempted, clients tend to get the list of available IPs and then “randomly” pick one, which is too often the first one in the list. The result is that there is no real randomization: all clients connect to the same IP even though they were provided a list of alternatives.

As an example, let’s say we have a VIP pool named “16 VIPs” that contains 16 Virtual IPs. When resolving the FQDN for the VIP pool, which is 16vips.vastcluster.vastdata.lab, the client is returned one of those 16 IPs. If the client resolves the name again after the TTL has expired (more than 1 second has passed), a different IP is returned.

A little-known feature of VAST DNS is that it resolves any prefix attached to a known FQDN. If we resolve test.16vips.vastcluster.vastdata.lab, our client is successfully returned an IP. And if we resolve many different prefixes at the exact same time, the VAST cluster returns a different IP for each.
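
For example, from any machine that resolves against the corporate DNS (the dig utility and the prefix names here are just for illustration), a few back-to-back lookups with different prefixes will each return a single VIP:

dig +short ds1.16vips.vastcluster.vastdata.lab
dig +short ds2.16vips.vastcluster.vastdata.lab
dig +short ds3.16vips.vastcluster.vastdata.lab

Each query returns exactly one A record with the low (1-second) TTL, and VAST decides which VIP to hand out for each name.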

Leveraging this technique, we’ll give each datastore a unique FQDN. While you can use any string of valid characters in front of the VIP pool’s FQDN, to keep this simple to follow, we’ll connect with [datastorename].[VIP_pool].[domain] for each of our datastores. The result, as shown in the example below, is a great distribution of IPs, connected to multiple available CNodes in the VIP pool:

The VAST Data Flow view, isolated to a single host, shows multiple datastores that share a single VIP pool but each mount with a distinct prefix in front of the VIP pool’s FQDN, giving every datastore its own access point.

The screenshot above shows the VAST “Data Flow” feature found under Analytics in the VAST GUI (also referred to as VMS – VAST Management System). Many datastores have been connected, each with a unique FQDN created by varying the prefix on the VIP pool FQDN. After a host reboot, when all datastore mount requests resolve and connect simultaneously, each resolves to a different IP address, giving us a great spread across many interfaces on the VAST cluster.

When this is repeated at larger scale across many hosts, the full potential of the VAST cluster is realized, with each ESXi host resolving its own randomized set of IPs from a single VIP pool. As new ESXi hosts or vSphere clusters are added, they can mount the existing datastores with the same unique FQDNs and resolve their own distribution of VIPs.

When a VAST cluster is expanded, VIPs are automatically distributed to additional CNodes, and the hosts continue to connect to the IPs they have already resolved to. At some point, when the VIP pool contains too few IPs to spread across all interfaces of all the CNodes, more IPs should be added to the VIP pool, and no configuration changes or storage vMotions are required on the VMware side.

Configuration Steps

VAST Configuration

Configure VAST DNS

As mentioned above, leveraging VAST DNS is a critical component for getting a good distribution of randomized IPs. It allows us to place unique prefixes in front of the VIP pool FQDN to produce a unique FQDN per datastore. This matters because one of the constraints in VMware is that every logical datastore must be mounted with the same server name and path on every host connecting to it, so the variation has to come from each datastore’s server name rather than from per-host resolution.

Configuration details for VAST DNS can be found at VAST Admin Guide - DNS
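
As a rough sketch only (the nameserver label and address below are placeholders, not values from this paper), the delegation described earlier might look like this in a BIND-style zone file for vastdata.lab on the corporate DNS servers, assuming the delegated suffix is tmphx-10.vastdata.lab as in the examples that follow:

; delegate the VAST DNS suffix to the VAST DNS service
tmphx-10             IN NS   vastdns.tmphx-10.vastdata.lab.
vastdns.tmphx-10     IN A    192.0.2.10    ; placeholder address of the VAST DNS service VIP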

Here is an example of the configured VAST DNS service, found under Network Access in VMS:

The screenshot shows the configured VAST DNS service, including its FQDN (Fully Qualified Domain Name) and DNS Suffix fields; while the two values look similar, they are distinct settings. In this example the suffix is 'tmphx-10.vastdata.lab'.

Create a VIP pool

How many IP Addresses should be in your VIP pool? To start, we recommend the number of IPs in the VIP pool be 2x the total number of available CNode interfaces. This varies by model but CNodes typically have 2 interfaces each. For example:

  • If we have 8 CNodes within our cluster, with 2 interfaces each, that’s 16 interfaces, double it to get 32 VIPs in our VIP pool.

  • Provide a name for use as part of the FQDN.

  • The FQDN typically follows this format: vip_pool.vast_cluster.domain.com

  • The examples in this paper use a VIP pool containing 16 VIPs which has an FQDN of 16vips.tmphx-10.vastdata.lab.

The image depicts the configuration settings for updating a VIP (Virtual IP) pool, including fields such as VLAN and IP ranges. Keep the VIP pool IPs in the same subnet as the ESXi vmkernel ports so that traffic does not need to be routed. The DNS Service FQDN is also highlighted, as it forms the server names that ESXi hosts will use to mount datastores.

From your ESXi host, test pinging the VIP pool by FQDN. Wait a few seconds, and then try again and a different IP should be returned.

The ping test results show successful responses from 16vips.tmphx-10.vastdata.lab, with one attempt resolving to an address ending in .208 and a later attempt resolving to an address ending in .207, demonstrating that a different VIP is returned once the TTL expires.

Now take it a step further and ping test.FQDN  (ex. test.16vips.tmphx-10.vastdata.lab). It should resolve and you should get a response back from one of the VIPs in your VIP pool.

The command-line output shows ping results for the prefixed names "test" and "test2", each resolving to an address in the 172.29.2.x network, with successful responses, minimal latency, and no packet loss.

This will confirm basic connectivity as well as confirm that VAST DNS has been set up correctly.
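
As a quick recap, the two checks above look something like this from the ESXi shell (FQDNs taken from this paper’s examples; the addresses returned will vary):

ping -c 2 16vips.tmphx-10.vastdata.lab          (repeat after a few seconds; a different VIP should respond)
ping -c 2 test.16vips.tmphx-10.vastdata.lab     (any prefix should also resolve to one of the VIPs)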

Create a dedicated View Policy

A VAST View Policy defines how Views are accessed. A dedicated View Policy for VMware is recommended and should be applied to all views that will become datastores.  View Policies should be created with the following options:

  • Security Flavor: NFS

  • Group Membership Source: Client

  • Virtual IP Pools: Optional, but recommended to constrain to a VIP pool only accessible to vSphere hosts and backup access nodes/proxies

  • Host-Based Access:

      • No Squash and Read/Write for the IP addresses or ranges of vSphere hosts & backup nodes.

      • Remove the default wildcards (‘*’) on Root Squash and Read/Write to prevent unwanted access.

  • Path Length Limit: Native Protocol Limit

  • Allowed Characters: Native Protocol Limit

Here are some screenshots of a sample View Policy that is well-suited for VMware workloads:

The "General" tab in the Update Policy interface allows administrators to specify Virtual IP Pools (VIPs) for access control, with an optional recommendation to restrict access to one or more pools. If no VIP Pool(s) is specified, all pools can still have access as long as tenancy and other rules permit it.The image illustrates configuration settings within a VAST policy under the "Host-Based Access" tab, focusing on restricting NFS access types 'No Squash' and 'Read/Write' to specific IP ranges (e10.73.0.0/20 and 172.29*). This setup ensures that only authorized ESxi hosts and other systems directly accessing datastores can receive these privileges, with an additional recommendation to align vmkernel ports' IPs with the VAST VIP pool's subnet to minimize routing issues.The screenshot displays the "Update Policy" settings with options such as Path Length Limit and Allowed Characters, both set to Native Protocol Limit within an advanced configuration section. Additionally, it includes toggle switches for Use 32-bit File IDs and Accessible_snapshot Folder In Subdirectories.The screenshot displays the "Element Store" interface in VAST, with policies listed under the "View Policies" tab. The highlighted policy shows details such as name, flavor, and creation time for easy identification and management.

Create Views for Datastores

In VAST terminology, a ‘View’ is a construct to expose storage through one or more protocols or methods. In the context of storage for VMware, the protocol we’ll use for access is NFS[1]. This represents an NFS Export.

Views are created within the Element Store in VMS. When creating our views, apply the View Policy created in the previous step. Here are some guidelines specific to views that will be created for VMware Datastores:

Recommended path layout and naming:

/vmware/datastores/VAST_CLUSTER_NAME/datastore_x.

  • This allows us to place a protection policy at /vmware/datastores/VAST_CLUSTER for use with snapshots and replication.

  • This also allows us to accept incoming replication from other VAST clusters into /vmware/datastores/REMOTE_CLUSTER without violating replication rules (a replication target path may not land within a replication source path, as that would create a circular replication loop).

  • An additional benefit to placing datastores for a specific cluster within a common path is to simplify capacity reporting, making it easy to see the aggregate of all VM capacity consumption simply by looking at the capacity stats for the parent folder(s).

  • We suggest starting with 4-8 datastores. These will become members of the same SDRS Cluster.
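
Putting the pieces above together, the resulting layout for this paper’s example cluster (tmphx-10) would look roughly like this:

/vmware/datastores/
    tmphx-10/            <- protection policy (snapshots/replication) applied at this level
        ds1              <- one View (NFS export) per datastore
        ds2
        ...
        ds8
    REMOTE_CLUSTER/      <- optional landing path for replication from another VAST cluster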


[1] VAST Data Platform is currently only validated for NFSv3 with vSphere. Do not use NFSv4.1 unless otherwise directed by VAST Customer Success. VAST Data Platform is a fully supported VMware storage partner and can be found in the VMware Compatibility Guide.

Tip: Avoid using aliases on your views because they mask the actual path and may lead to confusion when managing the backend VAST cluster. When the paths in VMware (Device Backing) match the actual backend paths, there’s less ambiguity, and it reduces the chance of taking action on the wrong paths or views (for example, ones mistakenly believed to be unused).

The screenshot displays the Element Store interface within VAST, showcasing various datastores managed by VMware vSphere with their respective logical and physical capacities, including NFS protocols and actions available per datastore.

Figure 1 - Summary of the prescribed path format. Apply a filter to the Path column to quickly find all VMware Views.

Views are created under Element Store – Views. Only the required fields are necessary, along with “Create Directory” if the path doesn’t already exist. If you are using several subdirectories, as in this example, you do not need to create each subdirectory individually; they will be created automatically as needed.

  • Path: /vmware/datastores/VAST_CLUSTER/ds1

  • Protocols: NFS

  • Policy Name: vSphere (or whatever policy name we created in the previous steps)

  • Create Directory: Enabled

The image displays the "Add View" configuration screen where users can set up an NFS view, specifying details such as the protocol, path, policy name, and enabling directory creation. The highlighted sections include the tenant selection, protocol type (NFS), policy name ("vSphere"), and the toggle to create a new directory.

Quotas

Quotas are used to limit the total capacity exposed for each datastore. While quotas are typically optional, in the case of SDRS clusters, setting a hard limit is mandatory. Why? VMware has an internal limit: a datastore must report less than 2 PiB of capacity to be compatible with Storage DRS.

VMware workloads tend to have a high data reduction rate (DRR), which increases the logical capacity reported by a VAST cluster, making even our smallest clusters report a total logical capacity of 2 PiB or more. Without a quota (hard limit), views presented to VMware as datastores report their capacity as that of the entire cluster. Once data reduction rates increase and the logical capacity of a datastore reported to VMware reaches 2 PiB or higher, the datastore is flagged as ‘datastore is incompatible with SDRS’.

Failing to set a quota will eventually result in the datastores becoming “incompatible” with SDRS. Once a datastore is flagged as such, the error prevents the creation of new VMs or vDisks in datastores that belong to an SDRS cluster. It also prevents datastores not already in the SDRS cluster from being added to one. This can happen immediately, or down the road as the DRR increases and the logical capacity reported by VAST grows.

Good news! In the event this happens because a quota was not set, VMs already in the datastore are not impacted. To make the datastore compatible again, simply add a quota to each datastore; it should become compatible again right away. If not, use Refresh Capacity from within vCenter, and once the datastore is recognized as having less than 2 PiB of total capacity, it will become compatible.

Under Element Store – Quotas, create quotas for each datastore path that will be included in your SDRS cluster(s). Set a Hard Limit of 1.99 PiB (or 2047 TiB on clusters running 5.1 and newer). This can be configured smaller if desired, especially when you want to prevent VM sprawl from consuming too much of your VAST cluster’s overall capacity.

The image shows an interface for updating storage quotas, where users can set both soft and hard capacity limits on specific datastores. The highlighted capacity field indicates that a hard limit of 1.99 PiB has been configured.

UI from a v5.2 cluster, where a capacity unit drop-down appears and is set to 2047 TiB. TiB is shown in the dropdown when the VMS display setting “Show Base 10 for capacity” is Disabled (i.e., Base 2). vCenter will detect and report the total datastore capacity as 2 PB (technically 1.999 PiB) and the datastore will pass the SDRS compatibility validation. Again, a smaller quota is always acceptable as well.
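
A one-line check (on any Linux machine with bc) shows why 2047 TiB stays just under the limit:

echo "scale=3; 2047/1024" | bc        (2047 TiB ≈ 1.999 PiB, just under the 2 PiB SDRS threshold)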

The image displays the quota configuration interface, highlighting options to set both soft and hard capacity limits for the specified path "/vmware/datastores/tmpfpx203/datastore1", with granular unit selection (KB, MB, GB, TB, and so on) and alerts or notifications when a quota is exceeded.

Additionally, we have noted some strange reporting behavior in vCenter when datastores don’t have a quota applied. Used Capacity may report as Provisioned Capacity and skew capacity statistics. To avoid this and to see Used Capacity accurately, for any given datastore, add a hard limit in a quota.

Once you have configured a quota with a hard limit on each of your datastore paths, it should look something like this:

The screenshot displays the Element Store quotas section in VAST, showing datastores named ds1 to ds8 under the 'tmphx10' VMware path, each with a 1.99 PiB hard capacity limit and marked as 'OK'. The interface also provides access to views, lifecycle rules, policies, locks, tenants, and QoS policies.

VMware Configuration

If vSphere licensing permits, leverage Storage DRS clusters. This allows you to provision or clone VMs to a single logical “datastore” object (the SDRS cluster) without having to figure out which underlying datastore has sufficient capacity.

If your licensing doesn’t allow for SDRS clusters, all of these suggestions still apply to configuring multiple individual datastores,  but you’ll miss out on the ease of management associated with provisioning to a single destination.

Create Datastores

We recommend creating each datastore and then mounting to a single host first. Once successfully mounted, then “Mount to additional Hosts” for all hosts in the vSphere cluster(s) that will have access to the SDRS Cluster. The steps to add a datastore are as follows:

In vCenter, under Storage, choose “New Datastore…”:

The image shows the vSphere Client interface, specifically within the "TMPhx DataCenter" section, where the Storage menu is expanded to reveal options such as "New Datastore..." and other storage-related functions. This screenshot highlights where to initiate creating a new datastore in vSphere.

For Type, choose “NFS” and hit Next

The image shows the "New Datastore" configuration screen in VMware vSphere, where users can select different types of datastores such as VMFS (Virtual Machine File System) or an NFS datastore on an NFS share over the network. The selected option is highlighted with a red border, indicating it's ready to proceed to the next step.

Choose NFS 3, which is fully supported and has passed certification tests to be included in the vSphere HCL. Note: At the time of this writing, NFS 4.1 is NOT fully supported and is not on the vSphere HCL.  Hit Next to proceed.

The image shows the NFS version selection page when creating a new datastore in vSphere. Users can choose between NFS 3, which is compatible with older ESX/ESXi hosts, and NFS 4.1, which offers multipathing and Kerberos authentication support.

Here is the important part! When specifying the server, we’ll leverage the VAST DNS prefixes we discussed earlier to provide each datastore with its own unique FQDN, as seen here:

The image illustrates configuring the NFS datastore in vSphere, including specifying the folder path and the option to bind the datastore to vmknic ports for load balancing across multiple NICs. It also highlights prepending the datastore name as a prefix to the VIP pool’s FQDN, which is what spreads datastore traffic across the cluster’s interfaces.
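
For reference, the equivalent mount from the ESXi shell uses the same esxcli syntax shown later in this guide; the names and path below simply follow this paper’s examples and should be adjusted to your own:

esxcli storage nfs add -H ds1.16vips.tmphx-10.vastdata.lab -s /vmware/datastores/tmphx-10/ds1 -v tmphx10-ds1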

Choose a single host to mount to:

The image displays the "Hosts accessibility" page in a data storage configuration interface, where administrators can select hosts that require access to the datastore. The user is highlighting the need to choose one host at a time for the initial mount operation as indicated by a red arrow and accompanying text box.

After the successful creation of the datastore, use “Mount Datastore to Additional Hosts” to add the datastore to the remaining hosts in the cluster and within other vSphere Clusters that need access:

The screenshot shows the vSphere Client interface with options to manage virtual machines and datastores, highlighting the "Mount Datastore to Additional Hosts..." action in the ACTIONS menu for the selected datastore tmphx10-ds1.

Repeat this procedure until all datastores have been added.

The vSphere Client displays detailed information about datastores, including their status, type (NFS 3), capacity, and device paths, in the example 'tmphx10' environment. The unique prefix in each datastore's device FQDN distinguishes it from the others, even though all of them share a single VIP pool.

When complete, you should have a list of datastores that looks like this. Take note that the “Device” for each is a unique FQDN and a unique path, corresponding to each of the Views we created in an earlier step.

Note: A previous “VMware Best Practices Guide” published by VAST had a slightly different recommendation involving multiple VIP pools, each with a subset of CNodes. This new method, which leverages DNS prefixes and a single VIP pool, supersedes that recommendation. If your cluster was configured the old way, we recommend you create new datastores with this method, add them to your SDRS cluster, and then move the existing datastores out of the SDRS cluster. Then migrate all the VMs into the SDRS cluster using Storage vMotion, allowing vCenter to make first-placement recommendations.

Create SDRS Cluster

With our datastores created, we now want to create a Storage DRS Cluster to add them to. The steps are as follows:

Choose “New Datastore Cluster”:

The screenshot illustrates the vSphere Client interface, where users can manage datastores and datastore clusters within a datacenter environment. The current focus is on creating a new storage cluster by navigating through the "tmpPhx-vsphere.vastdata..." hierarchy to the 'Datastore Clusters' tab.

This setup allows administrators to organize their storage resources efficiently, leveraging options like adding hosts, creating folders or distributed switches, and deploying virtual infrastructure components seamlessly across the VMware platform.

Give your cluster a name; this usually includes the name of your VAST cluster. For Datastore type, choose NFS 3, turn on Storage DRS, and click Next:

The image shows the "Name and Location" step in creating a new Datastore Cluster, where users can select options such as datastore name, location, and datastore type from dropdown menus. The selected datastore type is highlighted as NFS 3, indicating it's chosen among other available types like VMFS or NFS 4.1.

When asked about automation, choose “No Automation (Manual Mode)” and click Next:

The screenshot depicts the "Storage DRS Automation" configuration page within VMware vSphere, where users can choose automation levels such as No Automation (Manual Mode) or Fully Automated to manage virtual machine storage migrations and resource optimization dynamically.

Leave runtime settings as the default:

The screenshot illustrates the Storage DRS Runtime Settings page, where users configure I/O latency thresholds and space utilization levels to optimize storage resource allocation and ensure efficient data migration within their VMware environment.

Select which clusters and hosts can access your SDRS cluster:

The image shows the "Select Clusters and Hosts" screen in a datacenter cluster creation process, where users can choose which clusters to include in their new datastore cluster setup. The interface highlights details such as available CPU, memory, storage capacity, and vSphere settings for potential selections.

Select the datastores that were just created in the previous step:

The image shows the "Select Datastores" step in the process of creating a new datastore cluster, where multiple NFS 3 datastores with varying capacities and host connection states are available for selection. The selected datastores should be reviewed before proceeding to finalize the configuration.

Review the summary and click ‘Finish’:

The image displays the final setup screen for creating a new Datastore Cluster, where users can review and confirm settings including Name and Location, Storage DRS Automation levels, and Storage DRS Runtime Settings before completing the creation process.

With automation disabled, there will still be “first placement” selection, based on available capacity in the underlying datastores. Simply deploy VMs into your SDRS cluster and let vCenter choose the datastores for them. From within Analytics – Capacity, you should see something similar to the screenshot below once a good number of VMs have been added:

The image displays a VAST interface showing the capacity overview of the '/vmware/datastores/tmphx10' datastore, with a total logical capacity index of 38.375 TiB and an emphasis on data reduction metrics and usage details.

And as for traffic flows, we should see something like this. Notice the nice distribution of load across CNodes:

The image depicts the VAST Data Flow dashboard, showing real-time data flow visualization with search and filter options for User, Host, VIP Pool, Node, and View.

Perspective from a single host:

The screenshot depicts the VAST Data Flow view from the perspective of a single host, showing nodes, connections, and IP addresses represented in various colors indicating different protocols and bandwidth usage.

And now you’ve got a scalable, highly performant VMware deployment on VAST!

Additional Considerations

Higher Performance: nconnect

If you are running vSphere 8.0 Update 1 or newer, nconnect is a newly supported option that allows more connections from a single host to a single datastore. vSphere 8.0 Update 1 and newer supports nconnect values up to 4, and this maximum can be increased to 8 by updating MaxConnectionsPerDatastore.

The default nconnect value remains 1 unless updated. At the default of 1 connection, the maximum throughput you can expect from one host to one datastore (regardless of whether you have a single vDisk or an aggregate of vDisks) is approximately 2 GB/s. The datastore can exceed 2 GB/s across an aggregate of hosts: each host can reach that maximum independently, so as long as overall bandwidth allows for it, the datastore will see more throughput with each additional host.

To increase the throughput for a single host, increase the connections for that datastore.

To see a list of datastores and their nconnect setting (Connections column), run the following command:

[root@ESXHOST:~] esxcli storage nfs list

Volume Name  Host                             Share       Vmknic  Accessible  Mounted  Connections  Read-Only   isPE  Hardware Acceleration
-----------  -------------------------------  ----------  ------  ----------  -------  -----------  ---------  -----  ---------------------
tmphx-10-4   vsphere-4.tmphx-10.vastdata.lab  /vsphere-4  None          true     true            1      false  false  Not Supported
tmphx-10-3   vsphere-3.tmphx-10.vastdata.lab  /vsphere-3  None          true     true            1      false  false  Not Supported
tmphx-10-2   vsphere-2.tmphx-10.vastdata.lab  /vsphere-2  None          true     true            1      false  false  Not Supported
repo         172.29.2.1                       /repo       None          true     true            1      false  false  Not Supported
tmphx-10-1   vsphere-1.tmphx-10.vastdata.lab  /vsphere-1  None          true     true            1      false  false  Not Supported

The default max for nconnect is 4 connections. This max can be increased to 8 connections with the following setting:

[root@tmphx-cb1-cn1:~]  esxcli system settings advanced list -o '/NFS/MaxConnectionsPerDatastore'
   Path: /NFS/MaxConnectionsPerDatastore
   Type: integer
   Int Value: 4
   Default Int Value: 4
   Min Value: 4
   Max Value: 8
   String Value:
   Default String Value:
   Valid Characters:
   Description: Maximum number of RPC connections allowed per NFS datastore
   Host Specific: false
   Impact: none
[root@tmphx-cb1-cn1:~]  esxcli system settings advanced set -o '/NFS/MaxConnectionsPerDatastore' -i 8

Connection settings are configured per datastore, host by host. Not every host needs to match, but we recommend keeping the connection configuration uniform across all hosts in the same cluster. One exception may be to use different nconnect settings for a host with a 25 Gb NIC than for one with a 100 Gb NIC.

Here are some example commands to be run on every host for each datastore on that host:

Set all datastores in the StorageDRS cluster to nconnect=4:

esxcli storage nfs param set -v sdrs_cluster_datastore-1 -c 4

esxcli storage nfs param set -v sdrs_cluster_datastore-2 -c 4

esxcli storage nfs param set -v sdrs_cluster_datastore-3 -c 4

esxcli storage nfs param set -v sdrs_cluster_datastore-4 -c 4

… etc

Repeat on EVERY host. 
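
A minimal sketch of how that repetition might be scripted from each host’s ESXi shell (the name pattern 'sdrs_cluster_datastore-' is an assumption matching the examples above; substitute your own datastore naming):

# set nconnect=4 on every NFS datastore whose name matches the SDRS cluster naming pattern
for ds in $(esxcli storage nfs list | awk 'NR>2 {print $1}' | grep '^sdrs_cluster_datastore-'); do
    esxcli storage nfs param set -v "$ds" -c 4
done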

esxcli storage nfs param set -v DATASTORE1 -c 2 (sets another standalone datastore to nconnect=2)
esxcli storage nfs param set -v DATASTORE2 -c 8 (sets another standalone datastore to nconnect=8)

Alternatively, nconnect can also be set at the time the datastore is mounted to a host:

esxcli storage nfs add -H vsphere-1.tmphx-10.vastdata.lab -s /vmware/datastores/nconnect8 -v tmphx10-nconnect8 -c 8

Connections can be increased non-disruptively, but lowering the connection count requires evacuating and unmounting the datastore. As such, it may make more sense to start by increasing connections to 2, then step up as needed.

How high should the connection count be set? This will depend on the max bandwidth of NICs in the ESX hosts, as well as how the VAST cluster networking has been configured and end-to-end networking bottlenecks (such as interswitch links and MLAGs). If “split-networking” was used on the CNodes, then interfaces max out at 50Gb/s. If dual NIC CNodes are used, the max is usually 100 Gb/s per interface.

Here's a general guide for common NIC speeds:

  • 10 Gbps: nconnect=2
    NOTE: The maximum bandwidth of a 10 Gbps link is approximately 1,200 MB/s, which is already lower than the ~2 GB/s (approx 17.5 Gbps) upper bound of a single connection. While throughput may not benefit from the additional connections, there may be some benefit in the number of IOPS.

  • 25 Gbps: nconnect=2

  • 40/50 Gbps: nconnect=4

  • 100 Gbps: nconnect=8

While these are guidelines intended to squeeze the most out of a single host:datastore mount, keep in mind that you could run the risk of starving IO for other VMs, whether on the same host, or other hosts connected to the same datastore (which resolved to the same underlying IP). The same risk applies anytime a bottleneck in the end-to-end network is hit, most commonly occurring within a single link within an aggregate of links (LACP), usually between switches.

A visual representation showing the positive impact the connection count adjustment can make:

The image depicts a performance analysis graph from VAST, showing cluster bandwidth metrics with an aggregation type set to 'Max'. It highlights write and read bandwidths over a 2-hour period, demonstrating network saturation at around 4500 MB/sec when using nconnect=4, indicating that a 50Gb link is being fully utilized under specific configuration settings.

VMKNIC Binding

Introduced in vSphere 8 Update 1 is a feature called ‘vmknic binding’. For more network bandwidth and an even better distribution of end-to-end network paths, consider using vmknic binding to assign datastores to various vmkernel ports in your ESXi hosts.

Why? Without vmknic binding, if your host has multiple vmkernel ports on the storage network, there are no load balancing or teaming policies that let you use the aggregate bandwidth of multiple NICs. The vmkernel port selected will be the one on the same subnet as the VIP pool addresses; if multiple vmkernel ports are on the same subnet, only one will be chosen. As for which physical NIC that vmkernel port will use, the teaming policies are all based on SOURCE characteristics (originating virtual port ID or source MAC).

Only ONE vmkernel will be selected, and all datastore traffic associated with it will traverse through one of the available underlying Active NICs.

Additional VAST Cluster Settings

There is a utility in VAST (vtool) that can be used to adjust custom tunables, referred to as vsettings. Setting vtool vsettings can improve the performance of single-stream NFS reads; however, which vsettings to apply is VAST version specific. The biggest impact of these settings is observed when reading the zeroes of a thin-provisioned vDisk in an NFS datastore (even though the zeroes were never actually logically written). Without these settings, performance will peak around 300-350 MB/s (assuming nconnect=1; expect higher with higher nconnect values), but with these settings applied, expect about 4x the performance. The easiest test is a Storage vMotion of a VM with a large-ish vDisk of 100+ GB from one VAST datastore to another; the vDisk can be blank and represent a size on disk of 0 MB.

Connect to any CNode via ssh and apply the following: (paste them in one at a time to ensure each comes back with “successful”)

For VAST clusters that run versions 4.7-sp10 and up to 5.0.[latest], use:

vtool vsettings set min_len_hybrid_task_socket_read=262144 
vtool vsettings set min_len_hybrid_task_socket_write=262144

For VAST versions 5.1 and higher, use:

vtool vsettings set min_len_hybrid_task_socket_read=262144
vtool vsettings set min_len_hybrid_task_socket_write=262144
vtool vsettings set SOCKET_SKIP_HYBRID_TASK_SEND=true
vtool vsettings set SOCKET_SKIP_HYBRID_TASK_RECV=true
vtool vsettings set SOCKET_ZERO_COPY_MIN_BUFSIZE_VALUE=10240000
vtool vsettings set SOCKET_ZERO_COPY_MIN_BUFSIZE_VALUE_NEXT_GEN=1024000

To confirm whether or not these settings have been applied, run vtool vsettings show and confirm all of them are listed in the output. After setting these, the impact should be immediate and no further action is required. These only need to be run once from a single CNode and are cluster-wide settings, automatically applied to all CNodes.
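
For example, a quick way to spot-check just the settings listed above from the same CNode shell:

vtool vsettings show | grep -iE 'min_len_hybrid|socket_skip_hybrid|socket_zero_copy'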