Exporting VAST Block CSI Driver Metrics to Prometheus

VAST Block CSI Driver can be configured to expose CSI node and controller metrics in Prometheus format. The node metrics include total counts and average durations for CSI RPCs and NVMe connections. The controller metrics include total counts and average durations for CSI RPCs.

Enabling Export of CSI Metrics

By default, the driver does not expose any metrics.

To enable export of metrics:

  1. Add the following to the driver's Helm chart configuration file:

    node:
      metrics:
        enabled: true
        port: 9092
    controller:
      metrics:
        enabled: true
        port: 9093
  2. Install or upgrade the driver's Helm chart.

Exposed CSI Metrics Endpoints and Ports

When metrics export is enabled:

  • A headless service is created that serves metrics requests at two endpoints:

    • GET /metrics returns the metrics in Prometheus format (counters, histograms, and gauges).

    • GET /health serves health checks.

  • The node DaemonSet pods expose node metrics on port 9092 by default.

  • The controller Deployment/StatefulSet pods expose controller metrics on port 9093 by default.

NOTE: You can override the default ports by setting a different value for port under node.metrics or controller.metrics in the driver's Helm chart configuration file.

Exported CSI Metrics

NOTE: For a complete reference on CSI metrics, see https://github.com/vast-data/vast-csi/blob/v2.6/docs/METRICS_REFERENCE.md.

CSI Node Metrics

  • Mounts/umounts

    csi_node_mount_operations_total

    Total number of mounts (of a PVC to a pod)

    csi_node_mount_duration_seconds

    Duration of mounts (in seconds)

    csi_node_umount_operations_total

    Total number of umounts

    csi_node_umount_duration_seconds

    Duration of umounts (in seconds)

  • NVMe connects per cluster

    csi_node_nvme_connect_operations_total

    Total number of active NVMe connections

    csi_node_nvme_connect_duration_seconds

    Duration of NVMe connections (in seconds)

CSI Controller Metrics

csi_plugin_operations_total

Total number of all CSI gRPC method calls (CreateVolume, DeleteVolume, ControllerPublishVolume, and so on)

csi_plugin_operations_seconds

Average duration of a CSI gRPC method call
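The average durations above can be derived from a /metrics scrape. The following shell sketch assumes that the duration metrics (for example, csi_node_mount_duration_seconds or csi_plugin_operations_seconds) are exported as Prometheus histograms, so each one has companion _sum and _count series, and that sample lines carry no timestamps. The function name csi_avg_duration is hypothetical. It divides the summed _sum values by the summed _count values across all label sets, which yields the mean since the driver pod started.

```shell
# csi_avg_duration METRIC
# Reads Prometheus text from stdin and prints the mean duration (seconds) of
# METRIC as sum(METRIC_sum) / sum(METRIC_count) over all label sets.
# Assumption: METRIC is a histogram with _sum and _count series.
# Exits 1 when there are no observations yet.
csi_avg_duration() {
  awk -v name="$1" '
    /^#/ || NF < 2 { next }                   # skip comments and blank lines
    {
      metric = $1
      sub(/[{].*/, "", metric)                # keep only the metric name
      if (metric == (name "_sum"))   total_sum   += $NF
      if (metric == (name "_count")) total_count += $NF
    }
    END {
      if (total_count == 0) exit 1
      printf "%.6f\n", total_sum / total_count
    }
  '
}

# Example, with the controller pod port-forward on 9093 running:
#   curl -s http://localhost:9093/metrics | csi_avg_duration csi_plugin_operations_seconds
```

To average over a recent window instead of the pod's lifetime, the usual Prometheus approach is a PromQL ratio such as rate(csi_plugin_operations_seconds_sum[5m]) / rate(csi_plugin_operations_seconds_count[5m]), again assuming the histogram layout.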

Metrics Details

In addition to the measured value per metric type, a metric may include labels that provide additional information about the measured operation. For example, the following metric:

csi_node_mount_operations_total{operation_type="block-mount",status="success",node_name="worker-node-1",pvc_namespace="prod"} 15

indicates that the value 15 counts mounts that:

  • were made through the block access protocol (operation_type="block-mount"),

  • completed successfully (status="success"),

  • occurred on the worker node named worker-node-1 (node_name="worker-node-1"),

  • were for PVCs in the prod namespace (pvc_namespace="prod").
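Labels make it possible to narrow a metric down to the operations you care about. As a minimal shell sketch (the function name csi_metric_sum is hypothetical, and it assumes sample lines have no trailing timestamp, so the value is the last field), the following sums every series of a metric from a /metrics scrape, optionally keeping only lines that contain a given label pair such as status="success":

```shell
# csi_metric_sum NAME [LABEL_MATCH]
# Reads Prometheus text from stdin and prints the sum of all samples of NAME.
# If LABEL_MATCH is given (for example 'status="success"'), only sample lines
# containing that exact text are included. Exits 1 if nothing matched.
csi_metric_sum() {
  awk -v name="$1" -v want="${2:-}" '
    /^#/ || NF < 2 { next }            # skip HELP/TYPE comments and blank lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # keep only the metric name
      if (metric != name) next
      if (want != "" && index($0, want) == 0) next
      total += $NF                     # the sample value is the last field
      found = 1
    }
    END {
      if (!found) exit 1
      print total + 0
    }
  '
}
```

For example, curl -s http://localhost:9092/metrics | csi_metric_sum csi_node_mount_operations_total 'status="success"' prints the number of successful mounts recorded by that node pod. The label match is a plain substring test, so pass the complete name="value" pair to avoid accidental partial matches.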

Accessing Exported CSI Metrics

NOTE: For more detailed guidance, see https://github.com/vast-data/vast-csi/blob/v2.6/docs/METRICS_GUIDE.md.

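Run the following commands to verify that the metrics endpoints work as expected. The commands assume that the driver runs in the vast-csi namespace and uses the default ports; adjust them if your installation differs.

  • For node metrics:

    # List the CSI node pods
    kubectl get pods -n vast-csi -l app.kubernetes.io/component=csi-node
    # Forward local port 9092 to one node pod; this keeps running, so run the curl commands in a second terminal
    kubectl port-forward -n vast-csi pod/<CSI node pod name> 9092:9092
    curl -s http://localhost:9092/metrics
    curl -s http://localhost:9092/health

  • For controller metrics:

    # List the CSI controller pods
    kubectl get pods -n vast-csi -l app.kubernetes.io/component=csi-controller
    # Forward local port 9093 to the controller pod; run the curl commands in a second terminal
    kubectl port-forward -n vast-csi pod/<CSI controller pod name> 9093:9093
    curl -s http://localhost:9093/metrics
    curl -s http://localhost:9093/health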
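Once a port-forward is running, these checks can be wrapped in two small shell helpers. This is a sketch with hypothetical function names: csi_health relies on curl -f, which makes curl exit with an error when the server answers with an HTTP status of 400 or above, and csi_metric_names lists each distinct csi_ metric name once, which is a quick way to confirm that the driver exports the metrics described in this article. Histogram metrics show up as their _bucket, _sum, and _count series.

```shell
# csi_health PORT
# Succeeds only if GET /health on localhost:PORT answers with a non-error
# HTTP status (curl -f fails on 4xx/5xx responses).
csi_health() {
  curl -fsS "http://localhost:$1/health" > /dev/null
}

# csi_metric_names
# Reads Prometheus text from stdin and prints each distinct metric name that
# starts with csi_, in order of first appearance.
csi_metric_names() {
  awk '
    /^#/ || NF < 2 { next }              # skip HELP/TYPE comments and blank lines
    {
      metric = $1
      sub(/[{].*/, "", metric)           # drop the label set
      if (metric ~ /^csi_/ && !(metric in seen)) {
        seen[metric] = 1
        print metric
      }
    }
  '
}

# Example, with the node pod port-forward on 9092 running:
#   csi_health 9092 && curl -s http://localhost:9092/metrics | csi_metric_names
```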

Sample Metrics Values for Common Scenarios

The following examples show typical metric values in common scenarios.

NOTE: For more examples, see https://github.com/vast-data/vast-csi/blob/v2.6/docs/METRICS_EXAMPLES.md.

  • After an NVMe connection to the VAST Cluster is established:

    • csi_node_nvme_connect_operations_total is set to 1.0.

    • csi_node_nvme_connect_duration_seconds shows the connect duration.

  • After mounting a block device:

    • csi_node_nvme_connect_operations_total is set to 1.0.

    • csi_node_mount_duration_seconds with operation_type="block_mount" shows the mount duration.
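A convenient way to reproduce these scenarios is to save a scrape of /metrics before and after the operation and compare the two. The shell sketch below (the function name csi_counter_delta is hypothetical, and it assumes sample lines carry no timestamps) prints how much a metric, summed over all of its label sets, changed between two saved scrapes.

```shell
# csi_counter_delta NAME BEFORE_FILE AFTER_FILE
# Prints how much metric NAME (summed over all its label sets) changed
# between two saved scrapes of /metrics. A metric absent from a file counts as 0.
csi_counter_delta() {
  awk -v name="$1" '
    /^#/ || NF < 2 { next }            # skip comments and blank lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # keep only the metric name
      if (metric != name) next
      if (FILENAME == ARGV[1]) before += $NF
      else after += $NF
    }
    END { print after - before }
  ' "$2" "$3"
}

# Hypothetical workflow with the node pod port-forward on 9092 running:
#   curl -s http://localhost:9092/metrics > before.txt
#   (connect to the cluster or mount a volume)
#   curl -s http://localhost:9092/metrics > after.txt
#   csi_counter_delta csi_node_nvme_connect_operations_total before.txt after.txt
```

A value that restarts from zero between the two scrapes (for example, after the node pod restarts) produces a negative result.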