VAST NFS Client Install / Upgrade Using KMM on OpenShift


Prerequisites (All Scenarios)

Before installing or upgrading the VAST NFS KMM module, ensure the cluster is prepared.

  1. Install Required Operators: Ensure both Kernel Module Management (KMM) Operator and Node Feature Discovery (NFD) Operator are installed from the OpenShift OperatorHub.
    OpenShift KMM Operator docs
    OpenShift NFD operator docs

  2. Prepare Build Environment: Clone the openshift-vastnfs-kmm-operator repository onto a machine with build tools (git and make). The machine can be of any architecture (e.g., macOS, Linux).

git clone https://github.com/vast-data/openshift-vastnfs-kmm-operator

  3. Container Registry: Make sure the cluster's internal image registry is enabled, working, and attached to persistent storage.
    You can also point to an external container registry service instead. The default configured in the Makefile is the OpenShift internal image registry, image-registry.openshift-image-registry.svc:5000 (see Makefile lines 11-13). If you change it, make sure you understand the implications.

It is not a problem if the image-registry PVC is already provisioned from the VAST cluster; this is addressed later in the procedure, where the engineer can decide how to handle it.
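As a quick pre-flight sanity check (a sketch, not part of the repo's Makefile; the function name is illustrative), you can confirm the internal registry is managed and backed by a PVC before building. The resource names used here are the standard OpenShift image-registry operator objects:

```shell
# check_registry: sketch of a pre-flight check for the internal image registry.
# Adapt if you point the Makefile at an external registry instead.
check_registry() {
  state=$(oc get configs.imageregistry.operator.openshift.io/cluster \
            -o jsonpath='{.spec.managementState}')
  if [ "$state" = "Managed" ]; then
    echo "internal registry enabled"
  else
    echo "internal registry state: $state (expected Managed)" >&2
    return 1
  fi
  # The registry should also be attached to persistent storage:
  oc get pvc -n openshift-image-registry
}
```

Run `check_registry` with your cluster KUBECONFIG exported; the PVC listed should be in Bound state.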

Option 1: Greenfield Installation

Use this procedure for a fresh deployment of the VAST NFS driver on a new cluster.

1. Deploy the Module

Export your desired version of the VAST NFS client, or run the installation as-is to use the default.
Example of a version-specific build:

export VASTNFS_VERSION=4.0.35

Make sure your cluster KUBECONFIG is configured.

git clone https://github.com/vast-data/openshift-vastnfs-kmm-operator.git 
cd openshift-vastnfs-kmm-operator

export KUBECONFIG=/path/to/cluster.config
make install
make verify

Example of expected output (when the cluster already has NFS PVCs):

[STEP] Verifying VAST NFS is Active on Nodes
[INFO] Checking node: node1
[WARNING] VAST NFS NOT ACTIVE - Using default kernel NFS
[INFO] Checking node: node2
[WARNING] VAST NFS NOT ACTIVE - Using default kernel NFS
[INFO] Checking node: node3
[WARNING] VAST NFS NOT ACTIVE - Using default kernel NFS

On a cluster with existing NFS mounts or PVCs, the warnings above are expected; the next section explains how to handle them.

2. Handle Blocked Nodes

If the KMM worker fails to load the module on specific nodes (because the default kernel NFS modules are in use or have dependent modules loaded), you must clear the kernel module dependencies manually.

Make sure you drain the node before running the commands below.

NOTE: Make sure you understand your application deployment pattern and the implications of draining a node.
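To identify which nodes are blocked, one option (a sketch; the function name is illustrative and the namespace may differ on your cluster, openshift-kmm being the KMM operator default) is to list KMM pods that are not Running:

```shell
# list_blocked_workers: print KMM pods that are not in Running state.
# STATUS is the third column of `oc get pods -o wide --no-headers` output.
list_blocked_workers() {
  ns="${1:-openshift-kmm}"
  oc get pods -n "$ns" -o wide --no-headers | awk '$3 != "Running" {print}'
}
```

The NODE column of the printed pods tells you which nodes need the manual cleanup below.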

oc adm drain <node-name> --force --ignore-daemonsets --delete-emptydir-data
oc debug node/<node-name>

Run the following commands:

chroot /host
rmmod sunrpc

If it fails with "Module sunrpc is in use by: ...", remove sunrpc together with its dependents in a single command, and repeat until all of the modules are gone:

rmmod sunrpc
rmmod: ERROR: Module sunrpc is in use by: nfsv4 auth_rpcgss lockd nfsv3 rpcsec_gss_krb5 nfs_acl nfs
rmmod sunrpc nfsv4 auth_rpcgss lockd nfsv3 rpcsec_gss_krb5 nfs_acl nfs

Repeat until you see:

rmmod: ERROR: Module sunrpc is not currently loaded

Uncordon the node:

oc adm uncordon <node-name>

Repeat the commands on all remaining nodes in the cluster.
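The per-node cycle above can be sketched as a reusable function (illustrative only; the function name is hypothetical, and you should review the drain implications for each node before running it unattended):

```shell
# clear_nfs_modules_on: drain a node, remove the stock kernel NFS modules,
# then uncordon. The rmmod list mirrors the dependency chain shown above;
# repeating it is harmless since already-removed modules only produce
# errors, which are discarded.
clear_nfs_modules_on() {
  node="$1"
  oc adm drain "$node" --force --ignore-daemonsets --delete-emptydir-data
  oc debug node/"$node" -- chroot /host /bin/sh -c \
    'rmmod sunrpc nfsv4 auth_rpcgss lockd nfsv3 rpcsec_gss_krb5 nfs_acl nfs 2>/dev/null
     rmmod sunrpc 2>/dev/null
     true'
  oc adm uncordon "$node"
}
```

For example: for n in $(oc get nodes -o name | cut -d/ -f2); do clear_nfs_modules_on "$n"; done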

3. Verify Module Installation

Run the verification script to ensure all nodes report the correct version.

make verify

Expected sample output:

[STEP] Verifying VAST NFS is Active on Nodes
[INFO] Checking node: node1
[SUCCESS] VAST NFS ACTIVE - version: 4.0.35
[INFO] Checking node: node2
[SUCCESS] VAST NFS ACTIVE - version: 4.0.35
[INFO] Checking node: node3
[SUCCESS] VAST NFS ACTIVE - version: 4.0.35
[INFO] Summary: 3/3 nodes have VAST NFS active
[SUCCESS] VAST NFS is active on all nodes

4. Configure VAST CSI Mount Options

Update the existing NFS CSI configuration to utilize the advanced VAST mount options.

oc edit VastStorage <your-storageclass-name> -n vast-csi

Add the following under your mount options configuration, for example:

mountOptions:
  - nconnect=8
  - noatime
  - nodiratime
  - rsize=1048576
  - wsize=1048576
  - tcp
  - vers=3
  - remoteports=dns

Save the configuration and wait 1 minute for the StorageClass changes to propagate.
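To confirm the options propagated, one option (a sketch; the function name is illustrative, and this assumes the CSI operator reflects the CR change into the StorageClass object) is to read the mountOptions field back:

```shell
# show_mount_options: print the mountOptions recorded on a StorageClass.
show_mount_options() {
  oc get sc "$1" -o jsonpath='{.mountOptions}'
  echo
}
```

For example, show_mount_options vastdata-filesystem (an example StorageClass name) should list nconnect=8, remoteports=dns, and the rest of the options you configured.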

For the VAST procedure covering NFS client multipathing, performance, and mount options, see the Docs.

For more advanced VAST NFS Client features and options, see the Docs.

A VAST cluster deployed with no DNS configuration (or with no DNS configured for the VIP pools) can use the remoteports mount option with a VIP pool IP address range instead, for example:
remoteports=10.141.200.151-10.141.200.154

4.1 Migrate existing PVs to an advanced NFS client

Please note that this step might affect your application's availability (depending on the application design), so plan accordingly!

Updating the StorageClass or VastStorage CR only affects newly created PVCs. Existing Persistent Volumes (PVs) retain the mount options they were created with. Moving a pod or scaling to zero does not refresh these options.

To apply the new VAST mount options to an existing PV, you must patch the object directly:

oc patch pv <existing-pv-name> -p '{"spec":{"mountOptions":["nconnect=8","noatime","nodiratime","rsize=1048576","wsize=1048576","tcp","vers=3","remoteports=dns"]}}'

Once patched, initiate or schedule a rolling restart of the application pods attached to this PV so the kubelet can execute the mount with the new parameters.
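If many PVs need the change, the patch can be scripted. This is a sketch (the function and variable names are illustrative): verify the selected PV list before patching, and remember that patching the PV object by itself does not remount anything.

```shell
# Same option set as shown above.
VAST_OPTS='["nconnect=8","noatime","nodiratime","rsize=1048576","wsize=1048576","tcp","vers=3","remoteports=dns"]'

# patch_pvs_of_class: patch every PV whose storageClassName matches $1.
patch_pvs_of_class() {
  sc="$1"
  oc get pv --no-headers \
     -o custom-columns='NAME:.metadata.name,SC:.spec.storageClassName' |
    awk -v sc="$sc" '$2 == sc {print $1}' |
    while read -r pv; do
      oc patch pv "$pv" -p "{\"spec\":{\"mountOptions\":$VAST_OPTS}}"
    done
}
```

After running it, roll-restart the workloads attached to those PVs as described above.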

4.2 Optional: Verify mount options are applied

To confirm that pods are actually using VAST advanced mount options (and not just configured to use them), inspect the effective mount options either on the node or from within the pod.

oc -n vast-csi exec vast-upgrade-test-0 -- mount | grep nfs | grep -q nconnect && echo 'Hooray!!!' || echo 'VAST NFS Client mount options are missing'

Option 2: Upgrading an Existing Cluster (Zero Downtime)

This procedure demonstrates a rolling upgrade of the kernel module without disrupting active stateful workloads.

1. Optional (For Verification): Example Workload

To validate the zero-downtime upgrade, deploy a 3-replica StatefulSet utilizing podAntiAffinity to spread across nodes.

Example_workload.yaml

cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vast-upgrade-test
  namespace: vast-csi
spec:
  serviceName: "vast-test"
  replicas: 3
  selector:
    matchLabels:
      app: vast-writer
  template:
    metadata:
      labels:
        app: vast-writer
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - vast-writer
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: writer
        image: busybox
        command:
        - "/bin/sh"
        - "-c"
        - "while true; do touch /mnt/vast/\$(hostname)-\$(date +%Y-%m-%d-%H-%M-%S); sleep 5; done"
        volumeMounts:
        - name: data
          mountPath: /mnt/vast
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: vastdata-filesystem
      resources:
        requests:
          storage: 1Gi
EOF

2. Initiate the Upgrade

Capture your current cluster VAST NFS Client status and version.

Run the following commands from the root of the openshift-vastnfs-kmm-operator repository.

cd openshift-vastnfs-kmm-operator
make verify

Export the new target version and apply the update, for example:

export VASTNFS_VERSION=4.0.36
make install

3. The Rolling Node Upgrade Process

Perform these steps one node at a time to maintain application availability.

Step A: Drain the Node

oc adm drain <node-name> --ignore-daemonsets --delete-emptydir-data --force

Step B: Force Module Cleanup

Clear the old NFS state and force the kernel to drop the legacy modules.

oc debug node/<node name> -- chroot /host /bin/sh -c "
    systemctl stop rpcbind.socket rpcbind.service gssproxy.service;
    umount -a -t nfs,nfs4 -f -l;
    sleep 2;
    rmmod sunrpc lockd nfsv3 nfs_acl nfs 2>/dev/null;
    rmmod sunrpc lockd nfsv3 nfs_acl nfs 2>/dev/null;
    rmmod sunrpc lockd nfsv3 nfs_acl nfs 2>/dev/null;
    echo 'Node cleaned'"

(Note: If the KMM worker pod on this node is stuck in CrashLoopBackOff, delete it now so it restarts and applies the new module instantly.)

Step C: Verify Node Upgrade

Uncordon the node so the KMM operator can schedule a pod on it that performs the automated kernel module replacement.

oc adm uncordon <node-name>
sleep 180 && make verify

Ensure the specific node you are working on now reports the new version (e.g., 4.0.36).
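Rather than a fixed sleep, the wait can be made adaptive. This sketch (the function name is illustrative) assumes make verify keeps the output format shown earlier: a "Checking node:" line followed by the version line.

```shell
# wait_for_node_version: poll `make verify` until the given node reports the
# target version, or give up after 30 attempts (roughly 5 minutes).
wait_for_node_version() {
  node="$1"; version="$2"; i=0
  until make verify 2>/dev/null | grep -A1 "Checking node: $node" | grep -q "version: $version"; do
    i=$((i + 1))
    [ "$i" -ge 30 ] && { echo "timed out waiting for $node" >&2; return 1; }
    sleep 10
  done
  echo "$node reports $version"
}
```

For example: wait_for_node_version node1 4.0.36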

Step D: Restore RPC Services

You must start the RPC services before allowing pods back on the node to prevent lock/statd errors.

oc debug node/<node-name> -- chroot /host /bin/sh -c "systemctl start rpcbind rpc-statd gssproxy"

If you used the example StatefulSet, its pods will now be able to schedule on this node and resume using the upgraded VAST NFS module.

Otherwise, make sure your existing workloads can be scheduled on the newly updated node(s) without issues.

Repeat Steps A-D for all remaining nodes.
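Steps A-D can be strung together per node as a sketch (the function name is illustrative; confirm each phase manually on your first node before looping over the rest):

```shell
# upgrade_node: drain, clean up the old modules, uncordon, wait for KMM to
# load the new module, then restore the RPC services (Steps A-D above).
upgrade_node() {
  node="$1"
  oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data --force
  oc debug node/"$node" -- chroot /host /bin/sh -c '
      systemctl stop rpcbind.socket rpcbind.service gssproxy.service
      umount -a -t nfs,nfs4 -f -l
      sleep 2
      rmmod sunrpc lockd nfsv3 nfs_acl nfs 2>/dev/null
      rmmod sunrpc lockd nfsv3 nfs_acl nfs 2>/dev/null
      echo "node cleaned"'
  oc adm uncordon "$node"
  sleep 180 && make verify   # confirm this node now reports the target version
  oc debug node/"$node" -- chroot /host /bin/sh -c \
      'systemctl start rpcbind rpc-statd gssproxy'
}
```

For example: upgrade_node node1, then check the verify output before moving to node2.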