Complete these steps to enable an NVMe/TCP Linux client to access VAST block storage:
Install the NVMe CLI tool.
Create transport rules to enable the client to connect to the VAST Cluster block controller.
Verify that the host's maximum number of retries for NVMe allows for maintaining high availability.
Obtain the host NQN that will identify your host for the VAST cluster.
Connect to the NVMe subsystem on the VAST cluster using your host NQN.
This step requires that block volumes have been created on the VAST cluster and mapped to your host. It includes the following substeps:
Load kernel modules to enable NVMe over Fabrics.
Discover available VAST NVMe subsystems over TCP.
Connect to the VAST NVMe subsystem you need.
Verify the configuration by listing connected NVMe subsystems and block volumes.
If necessary, troubleshoot your configuration.
Installing the Client
To configure client block hosts for interacting with the cluster as a remote NVMe device, install the NVMe CLI tool on the host:
sudo yum install nvme-cli
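If your host runs Ubuntu or Debian (both are covered by later steps in this guide), the tool is typically packaged as nvme-cli there as well, assuming your distribution provides the package:
sudo apt install nvme-cli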
Creating Transport Rules
Create transport rules to ensure that the client automatically discovers and connects or reconnects to the cluster's subsystems and volumes after reboot or new volume mappings:
Create the following file:
sudo vi /lib/udev/rules.d/71-nvmf-vastdata.rules
Add this content to the file:
# Enable round-robin for Vast Data Block Controller
ACTION=="add|change", SUBSYSTEM=="nvme-subsystem", ATTR{subsystype}=="nvm", ATTR{model}=="VASTData", RUN+="/bin/sh -c 'echo round-robin > /sys/class/nvme-subsystem/%k/iopolicy'"
ACTION=="add|change", SUBSYSTEM=="nvme-subsystem", ATTR{subsystype}=="nvm", ATTR{model}=="VastData", RUN+="/bin/sh -c 'echo round-robin > /sys/class/nvme-subsystem/%k/iopolicy'"
Run:
sudo udevadm control --reload-rules
sudo udevadm trigger
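Once a subsystem is connected (later in this procedure), you can confirm that the rule took effect by reading the I/O policy back from sysfs. The subsystem name nvme-subsys0 below is illustrative and may differ on your host:
cat /sys/class/nvme-subsystem/nvme-subsys0/iopolicy
The expected output is round-robin.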
Verifying the Maximum Number of Retries for NVMe
To maintain high availability, the maximum number of retries configured on the host for NVMe commands must be set to a non-zero value. On most systems, the default value is 5.
To check the maximum number of retries for NVMe, run one of the following:
cat /sys/module/nvme_core/parameters/max_retries
grep . /sys/module/nvme_core/parameters/*
To persistently set a maximum number of retries for NVMe to 5:
For hosts where /sys/module/nvme_core already exists:
Create or edit this file:
sudo nano /etc/modprobe.d/nvme_core.conf
Add this line to the file:
options nvme_core max_retries=5
Run the appropriate command for your distribution to have the new setting applied on boot:
On Ubuntu or Debian:
sudo update-initramfs -u
On RHEL, CentOS or Fedora:
sudo dracut -f
Reboot the host:
sudo reboot
Verify that the maximum number of retries for NVMe is now set to 5:
cat /sys/module/nvme_core/parameters/max_retries
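If you prefer not to wait for a reboot, on kernels where this module parameter is writable you can also apply the value at runtime; the modprobe configuration above still controls the value used at the next boot:
echo 5 | sudo tee /sys/module/nvme_core/parameters/max_retries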
For hosts where /sys/module/nvme_core does not exist:
Edit GRUB:
sudo nano /etc/default/grub
Add the nvme_core.max_retries=5 string to the options in GRUB_CMDLINE_LINUX_DEFAULT, for example:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.max_retries=5"
Apply the updates:
On Ubuntu or Debian:
sudo update-grub
On RHEL, CentOS or Fedora:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Reboot the host:
sudo reboot
Verify that the maximum number of retries for NVMe is now set to 5:
cat /sys/module/nvme_core/parameters/max_retries
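You can also confirm that the GRUB change reached the kernel by inspecting the boot command line and checking that nvme_core.max_retries=5 appears in it:
cat /proc/cmdline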
Obtaining the Host NQN
The host NQN is generated automatically when you install nvme-cli. The host NQN must be specified in host properties maintained on the VAST cluster to allow for mapping VAST block volumes to the host.
To obtain your host NQN:
cat /etc/nvme/hostnqn
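If the file does not exist on your host, nvme-cli can generate a host NQN for you. The following is a sketch that assumes you want to store the generated value in the standard location:
sudo nvme gen-hostnqn | sudo tee /etc/nvme/hostnqn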
Connecting to Mapped Volumes
Note
This step requires that block volumes have been created on the VAST cluster and mapped to your host.
After volumes have been mapped to the host through VMS, you can do the following to connect to the mapped volumes:
Load the necessary kernel modules to enable NVMe over Fabrics (NVMe-oF).
To load the modules once:
sudo modprobe nvme
sudo modprobe nvme-fabrics
To have the modules load automatically on reboot:
Create the file /etc/modules-load.d/nvme.conf and list the modules to be loaded in it:
nvme
nvme-fabrics
Run the appropriate command for your distribution so that the new settings apply on boot:
On Ubuntu or Debian:
sudo update-initramfs -u
On RHEL, CentOS or Fedora:
sudo dracut -f
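To confirm that the modules are loaded, list them:
lsmod | grep nvme
Both nvme and nvme_fabrics should appear in the output.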
Discover available VAST NVMe subsystems over TCP:
sudo nvme discover -t tcp -a VIRTUAL_IP -s 8009
For VIRTUAL_IP, provide a virtual IP from a virtual IP pool with the Protocol role that is accessible to the relevant block-enabled view. The view policy might restrict or dedicate virtual IP pools.
Note
For information about creating subsystems on the cluster, see Provisioning Block Storage with VMS and Creating a Block Storage Subsystem (View).
Add the discovery parameters you used to /etc/nvme/discovery.conf so that the configuration persists across reboots:
echo "-t tcp -a VIRTUAL_IP -s 8009" | sudo tee -a /etc/nvme/discovery.conf
Replace VIRTUAL_IP with the actual virtual IP.
Connect to a VAST NVMe subsystem:
Obtain the subsystem NQN from VMS:
In the VAST Web UI, open the Views tab in the Element Store page.
Find the subsystem view.
Right-click the view and select View to see its configuration.
The NQN is displayed in the Subsystem NQN field.
Click the copy button to copy the NQN to your clipboard.
Establish connection to the subsystem:
sudo nvme connect -t tcp -n SUBSYSTEM_NQN -a VIRTUAL_IP
in which:
SUBSYSTEM_NQN is the subsystem NQN obtained in the previous step.
VIRTUAL_IP is a virtual IP configured on the cluster and accessible to the subsystem view.
Run the connect-all command to connect all paths:
sudo nvme connect-all -t tcp -a VIRTUAL_IP -s 8009
If volumes are added to or removed from the subsystem, you can run connect-all again to update the volume mapping:
sudo nvme connect-all
Run the following command to ensure that your NVMe connection persists across reboots:
sudo systemctl enable nvmf-autoconnect.service
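To verify that the service is enabled, you can run:
sudo systemctl is-enabled nvmf-autoconnect.service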
Listing Subsystems, Paths and Available Volumes
To display a list of connected NVMe subsystems and paths, use the nvme list-subsys command. For example:
sudo nvme list-subsys
nvme-subsys0 - NQN=nqn.2024-08.com.vastdata:ef992044-0c8e-557a-a629-4d3c9abd9f9d:default:subsystem-3
               hostnqn=nqn.2014-08.org.nvmexpress:uuid:27590942-6282-d720-eae9-fdd2d81355d4
               iopolicy=round-robin
\
 +- nvme9 tcp traddr=172.27.133.9,trsvcid=4420 live
 +- nvme8 tcp traddr=172.27.133.8,trsvcid=4420 live
 +- nvme7 tcp traddr=172.27.133.7,trsvcid=4420 live
 +- nvme6 tcp traddr=172.27.133.6,trsvcid=4420 live
 +- nvme5 tcp traddr=172.27.133.5,trsvcid=4420 live
 +- nvme4 tcp traddr=172.27.133.4,trsvcid=4420 live
 +- nvme3 tcp traddr=172.27.133.3,trsvcid=4420 live
 +- nvme2 tcp traddr=172.27.133.2,trsvcid=4420 live
 +- nvme16 tcp traddr=172.27.133.16,trsvcid=4420 live
 +- nvme15 tcp traddr=172.27.133.15,trsvcid=4420 live
 +- nvme14 tcp traddr=172.27.133.14,trsvcid=4420 live
 +- nvme13 tcp traddr=172.27.133.13,trsvcid=4420 live
 +- nvme12 tcp traddr=172.27.133.12,trsvcid=4420 live
 +- nvme11 tcp traddr=172.27.133.11,trsvcid=4420 live
 +- nvme10 tcp traddr=172.27.133.10,trsvcid=4420 live
 +- nvme0 tcp traddr=172.27.133.1,trsvcid=4420 live
To display a list of connected NVMe volumes, use the nvme list command:
sudo nvme list
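The mapped volumes should also appear as ordinary block devices (for example /dev/nvme0n1) in standard tooling, so a quick cross-check is:
lsblk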
Disconnecting Existing Connections
The following commands disconnect the cluster's subsystems from the host:
To disconnect all connected subsystems:
sudo nvme disconnect-all
To disconnect a specific subsystem:
sudo nvme disconnect -n <NQN>
Troubleshooting
Issue: NVMe Subsystem Not Found
Cause: Incorrect IP address or network issue.
Solution: Verify that the virtual IP is correct and that the host has network connectivity to the VAST cluster.
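As a basic connectivity check, you can test reachability of the virtual IP and of the discovery port from the host. This sketch assumes ICMP is permitted on your network and that a netcat utility (nc) is installed:
ping -c 3 VIRTUAL_IP
nc -zv VIRTUAL_IP 8009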
Issue: NVMe Device Not Appearing
Cause: NVMe connection not established or missing kernel modules.
Solution: Ensure kernel modules are loaded using:
sudo modprobe nvme
sudo modprobe nvme-fabrics
Logs and Diagnostics
Use dmesg to check kernel logs for errors related to NVMe:
dmesg | grep nvme
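On systemd-based hosts, the same kernel messages are also available through the journal, which can be easier to search over time:
journalctl -k | grep -i nvme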
Issue: No Mapped Volumes
Symptoms:
sudo nvme discover output:
Failed to write to /dev/nvme-fabrics: Connection refused
Failed to add controller, error connection refused
dmesg error:
nvme nvme0: failed to connect socket: -111
Root cause: There are no mapped volumes.
Fix: Validate the volume mapping to this client's NQN.
Issues Connecting to the Target
| Symptom | Root Cause |
|---|---|
| Failed to add controller, error cannot assign requested address | The source IP in the command line is incorrect. |
| Failed to write to /dev/nvme-fabrics: Input/output error; could not add new controller: failed to write to nvme-fabrics device | The IP used for connection is incorrect or does not belong to the relevant tenant. |
| failed to get discovery log: Success | The IP used for connection is incorrect or does not belong to the relevant tenant. |