Overview
This document describes a host-side, supported approach to ensure NVMe over TCP (NVMe/TCP) devices are reliably discovered and usable after reboot. It includes copy/paste snippets, recommended systemd ordering, and an explanation of key flags.
Scope
Linux hosts that connect to remote NVMe namespaces using NVMe-oF over TCP (nvme connect).
Goal: After reboot, the host consistently loads NVMe modules, connects to subsystems, and mounts filesystems.
Storage-side support required?
No. Everything in this document is host-side (kernel modules, initramfs, systemd ordering, and mounts). Storage is only expected to present NVMe subsystems as usual.
Boot chain
To guarantee NVMe/TCP is ready after boot, implement two layers for module loading plus a deterministic connect sequence:
Initramfs includes NVMe modules (the earliest practical point).
systemd-modules-load loads them at boot (backup layer).
A systemd connect service runs after the network is online.
Filesystems are mounted via fstab using safe options.
Correct order
Modules loaded
Network online
NVMe Connect Service
mounts (fstab automount or normal mount)
A. Identify required kernel modules
For NVMe/TCP, you typically need these modules:
nvme
nvme-core (sometimes appears as nvme_core in tooling output)
nvme-tcp
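Note that the kernel treats hyphens and underscores in module names as interchangeable, which is why the same module shows up as nvme-tcp in documentation but nvme_tcp in lsmod output. A quick sketch of normalizing a name before matching against lsmod:

```shell
# Module names may appear as nvme-tcp (modprobe/docs) or nvme_tcp (lsmod);
# the kernel treats '-' and '_' as equivalent. Normalize before matching:
want="nvme-tcp"
normalized=$(printf '%s' "$want" | tr '-' '_')
echo "$normalized"   # prints: nvme_tcp
```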
Check what is currently loaded
lsmod | grep '^nvme'
Confirm the module exists on the system
modinfo nvme-tcp
If modinfo nvme-tcp returns information, the module is available.
B. Load NVMe modules early using initramfs (best practice)
Initramfs is loaded before most of the OS and services. Including NVMe modules here is the most reliable way to ensure the kernel can bring up NVMe/TCP early.
Rocky / RHEL / CentOS (dracut)
Create a dracut config snippet:
File: /etc/dracut.conf.d/nvme-tcp.conf
sudo mkdir -p /etc/dracut.conf.d
sudo tee /etc/dracut.conf.d/nvme-tcp.conf >/dev/null <<'EOF'
add_drivers+=" nvme nvme-core nvme-tcp "
EOF
Rebuild initramfs:
sudo dracut -f
Verify modules are in initramfs:
lsinitrd /boot/initramfs-$(uname -r).img | grep -i nvme
Ubuntu / Debian (initramfs-tools)
Add modules to initramfs list:
File: /etc/initramfs-tools/modules
sudo tee -a /etc/initramfs-tools/modules >/dev/null <<'EOF'
nvme
nvme-core
nvme-tcp
EOF
Update initramfs:
sudo update-initramfs -u
Verify contents:
lsinitramfs /boot/initrd.img-$(uname -r) | grep -i nvme | head
C. Load NVMe modules at boot using modules-load.d (backup layer)
This is a safety net in case initramfs was not rebuilt correctly, or a kernel change occurs.
File: /etc/modules-load.d/nvme.conf
sudo tee /etc/modules-load.d/nvme.conf >/dev/null <<'EOF'
nvme
nvme-core
nvme-tcp
EOF
Apply without reboot:
sudo systemctl restart systemd-modules-load.service
sudo systemctl status systemd-modules-load.service --no-pager
lsmod | grep '^nvme'
D. Create a reliable NVMe/TCP connect service (systemd)
This ensures your NVMe connections come up automatically after the network is ready and after modules are available.
Create a connect script
File: /usr/local/sbin/vast-nvme-tcp-connect.sh
sudo tee /usr/local/sbin/vast-nvme-tcp-connect.sh >/dev/null <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# Connect to one or more subsystems.
# Replace with your values:
#   TRADDR:  target IP/DNS
#   TRSVCID: target port (usually 4420)
#   NQN:     subsystem NQN
CONNECTS=(
  # "traddr trsvcid nqn"
  "10.0.0.10 4420 nqn.2014-08.org.nvmexpress:uuid:aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
  # "10.0.0.11 4420 nqn.2014-08.org.nvmexpress:uuid:ffffffff-1111-2222-3333-444444444444"
)

for c in "${CONNECTS[@]}"; do
  read -r TRADDR TRSVCID NQN <<< "${c}"
  echo "Connecting: traddr=${TRADDR} trsvcid=${TRSVCID} nqn=${NQN}"
  # Idempotent connect: nvme connect will usually return non-zero if the
  # subsystem is already connected, so check existing connections first.
  if nvme list-subsys 2>/dev/null | grep -q "${NQN}"; then
    echo "Already connected to ${NQN}, skipping"
    continue
  fi
  nvme connect -t tcp -a "${TRADDR}" -s "${TRSVCID}" -n "${NQN}"
done

# Optional: wait briefly for /dev/nvme* nodes to appear
udevadm settle
EOF
sudo chmod +x /usr/local/sbin/vast-nvme-tcp-connect.sh
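The idempotency check used in the script can be illustrated against canned output; the line below is a made-up stand-in for what nvme list-subsys prints, not real tool output:

```shell
# The NQN matches the placeholder used in the connect script above.
NQN="nqn.2014-08.org.nvmexpress:uuid:aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
# Canned stand-in for `nvme list-subsys` output:
sample_output="nvme-subsys0 - NQN=${NQN}"
# Same grep-based check as the script: skip the connect if already present.
if printf '%s\n' "$sample_output" | grep -q "$NQN"; then
  echo "Already connected to ${NQN}, skipping"
fi
```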
Create the systemd unit
File: /etc/systemd/system/vast-nvme-tcp.service
[Unit]
Description=NVMe/TCP connect (persist across reboot)
Wants=network-online.target
After=network-online.target systemd-modules-load.service
[Service]
Type=oneshot
RemainAfterExit=yes
# Guard rails: ensure modules exist even if initramfs/modules-load missed
ExecStartPre=/usr/sbin/modprobe nvme
ExecStartPre=/usr/sbin/modprobe nvme-tcp
ExecStart=/usr/local/sbin/vast-nvme-tcp-connect.sh
# Give enough time for network + connect
TimeoutStartSec=300
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable --now vast-nvme-tcp.service
sudo systemctl status vast-nvme-tcp.service --no-pager
Why these unit settings matter
Wants=network-online.target and After=network-online.target
Ensures the service runs only after the system considers networking “online”. This is important because NVMe/TCP requires an active network path to the target.
After=systemd-modules-load.service
Adds ordering so module loading happens before the connect attempt (backup layer).
ExecStartPre=modprobe ...
Hard guarantee that the kernel modules are present immediately before attempting a connect, even if earlier steps were skipped.
Type=oneshot + RemainAfterExit=yes
The connect is a one-time action at boot, but systemd treats the service as “active” once done, which is often convenient for dependency chains.
TimeoutStartSec=300
Prevents premature failure during slower boot/network conditions.
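If you later need to tune the service (for example, a longer timeout on hosts with slow link bring-up), a drop-in override keeps the change separate from the unit file. The path and value below are illustrative:

```ini
# /etc/systemd/system/vast-nvme-tcp.service.d/override.conf
# Example drop-in (value is illustrative). After editing, run:
#   sudo systemctl daemon-reload && sudo systemctl restart vast-nvme-tcp.service
[Service]
TimeoutStartSec=600
```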
E. Mounting: recommended fstab options and flag explanations
Once the NVMe namespace exists (example device: /dev/nvme0n1p1), you can mount it reliably using fstab. The key is to avoid boot hangs when networking or NVMe targets are temporarily unavailable.
Example fstab entry (using filesystem UUID)
Get the UUID first:
sudo blkid /dev/nvme0n1p1
Example /etc/fstab entry:
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /mnt/data xfs _netdev,nofail,x-systemd.automount,x-systemd.device-timeout=30 0 2
If using ext4, replace xfs with ext4.
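The UUID can be extracted from the blkid output and the fstab line composed in one step. The blkid line below is a fabricated placeholder; substitute the real output from your host:

```shell
# Fabricated blkid output for illustration; substitute the real line from
# `sudo blkid /dev/nvme0n1p1` on your host.
blkid_line='/dev/nvme0n1p1: UUID="9f3c2a10-1234-5678-9abc-def012345678" TYPE="xfs"'
# Extract the filesystem UUID and compose the fstab entry.
uuid=$(printf '%s\n' "$blkid_line" | sed -n 's/.*UUID="\([^"]*\)".*/\1/p')
printf 'UUID=%s /mnt/data xfs _netdev,nofail,x-systemd.automount,x-systemd.device-timeout=30 0 2\n' "$uuid"
```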
What each flag does
_netdev
Tells the boot process this filesystem depends on networking. This helps systemd and mount logic avoid attempting the mount before the network stack is ready.
Host-side only. No storage-side support required.
nofail
If the mount fails at boot (for example, target not reachable yet), the system continues booting instead of dropping into emergency mode. This is critical for resilience.
x-systemd.automount
Creates an automount unit so the filesystem mounts on first access instead of during early boot. This prevents boot delays and reduces sensitivity to timing issues.
x-systemd.device-timeout=30
Limits how long systemd waits for the block device to appear before considering the mount attempt failed. This avoids long boot stalls.
Optional flag (use only if you specifically want auto-unmount behavior):
x-systemd.idle-timeout=300
If set alongside x-systemd.automount, systemd may unmount the filesystem after 300 seconds of inactivity.
If you want the filesystem to remain mounted once accessed, do not use this.
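For reference, x-systemd.automount causes systemd-fstab-generator to derive units roughly like the following from the fstab entry above. This is an illustrative sketch of generated units, not files you create yourself:

```ini
# Sketch of units systemd-fstab-generator derives (illustrative only).

# mnt-data.automount -- watches /mnt/data and mounts on first access
[Automount]
Where=/mnt/data

# mnt-data.mount -- the actual mount, pulled in by the automount unit
[Mount]
What=/dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Where=/mnt/data
Type=xfs
Options=_netdev,nofail,x-systemd.device-timeout=30
```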
F. Verification steps (copy/paste)
After reboot, verify modules are loaded
lsmod | grep '^nvme'
Verify module load timing and boot messages
sudo journalctl -b | grep -E 'systemd-modules-load|nvme' | head -n 200
Verify the connect service ordering and critical chain
systemd-analyze critical-chain vast-nvme-tcp.service
Verify the service logs
sudo journalctl -u vast-nvme-tcp.service -b --no-pager
Verify NVMe connections and namespaces
nvme list
nvme list-subsys
Validate mounts
If using x-systemd.automount, trigger mount by accessing the path:
ls -la /mnt/data
mount | grep '/mnt/data'
Common pitfalls and how this approach avoids them
Mount attempted before the network is ready
Avoided by network-online.target ordering, _netdev, and (optionally) x-systemd.automount.
Module not loaded at connect time
Covered by initramfs, modules-load.d, and the ExecStartPre modprobe calls.
Boot delays or hangs
Avoided via nofail, x-systemd.automount, and x-systemd.device-timeout.
G. Persisting a udev rule
A udev rule does not establish NVMe/TCP connections or ensure reconnection on boot.
It only applies its configuration after a device/subsystem appears. Persistent connectivity must be handled separately (e.g., via the systemd boot-time connect service and fstab mount dependencies above).
Confirm the NVMe subsystem exists in sysfs
ls -l /sys/class/nvme-subsystem/
Confirm iopolicy exists for that subsystem
sudo cat /sys/class/nvme-subsystem/nvme-subsys0/iopolicy
Expected: a value such as round-robin, numa, or none.
Show which controller(s) belong to the subsystem
ls -l /sys/class/nvme-subsystem/nvme-subsys0/
You should see symlinks like nvme0 under that directory.
Apply your udev rule (exact snippet)
Create the persistent rule
sudo tee /etc/udev/rules.d/71-nvmf-iopolicy.rules >/dev/null <<'EOF'
# Persistently set the NVMe subsystem I/O policy to round-robin when a subsystem is added or changed
ACTION=="add|change", SUBSYSTEM=="nvme-subsystem", ATTR{iopolicy}="round-robin"
EOF
Reload rules:
sudo udevadm control --reload-rules
Trigger the rule for nvme-subsystem devices
sudo udevadm trigger --subsystem-match=nvme-subsystem
Verify it changed
sudo cat /sys/class/nvme-subsystem/nvme-subsys0/iopolicy
To check that the udev device path is correct and that its attributes are visible, walk the attribute chain:
udevadm info --attribute-walk --path=/sys/class/nvme-subsystem/nvme-subsys0