VAST Block CSI Driver can be configured to expose CSI node and controller metrics in Prometheus format. The node metrics include total counts and average durations for CSI RPCs and NVMe connections. The controller metrics include total counts and average durations for CSI RPCs.
Enabling Export of CSI Metrics
By default, the driver does not expose any metrics.
To enable export of metrics:
Add the following to the driver's Helm chart configuration file:
node:
  metrics:
    enabled: true
    port: 9092
controller:
  metrics:
    enabled: true
    port: 9093
Exposed CSI Metrics Endpoints and Ports
When metrics export is enabled:
A headless service is created that serves metrics requests at two endpoints:
GET /metrics for getting the metrics in Prometheus format (counters, histograms, gauges),
GET /health for health checks.
The node's DaemonSet pods expose the node metrics port 9092.
The controller's Deployment/StatefulSet pods expose the controller metrics port 9093.
NOTE: You can override default ports by specifying a different value in the port entry under node or controller metrics in the driver's Helm chart configuration file.
Exported CSI Metrics
NOTE: For a complete reference on CSI metrics, see https://github.com/vast-data/vast-csi/blob/v2.6/docs/METRICS_REFERENCE.md.
CSI Node Metrics
Mounts/umounts
| csi_node_mount_operations_total | Total number of mounts (of a PVC to a pod) |
| csi_node_mount_duration_seconds | Duration of mounts (in seconds) |
| csi_node_umount_operations_total | Total number of umounts |
| csi_node_umount_duration_seconds | Duration of umounts (in seconds) |
NVMe connects per cluster
| csi_node_nvme_connect_operations_total | Total number of active NVMe connections |
| csi_node_nvme_connect_duration_seconds | Duration of NVMe connections (in seconds) |
CSI Controller Metrics
| Total number of all CSI gRPC method calls (CreateVolume, DeleteVolume, ControllerPublishVolume, and so on) |
| Average duration of a CSI gRPC method call |
Metrics Details
In addition to the measured value per metric type, a metric may include labels that provide additional information about the measured operation. For example, the following metric:
csi_node_mount_operations_total{operation_type="block-mount",status="success",node_name="worker-node-1",pvc_namespace="prod"} 15
specifies that the measured value was taken for the mounts that:
were made through the block access protocol,
completed successfully,
occurred on a worker node named worker-node-1,
targeted the prod namespace.
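The label structure described above can be read programmatically. The following is a minimal Python sketch, not part of the driver, that splits one sample line in Prometheus text format into its metric name, labels, and value. The function name parse_sample is hypothetical, and the parser assumes a simple line with no escaped quotes in label values and no trailing timestamp.

```python
import re

# Matches one label pair such as status="success".
# Assumption: label values contain no escaped quotes.
_LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one Prometheus text-format sample into (name, labels, value).

    Hypothetical helper for illustration; assumes no trailing timestamp.
    """
    # The measured value is the last space-separated token on the line.
    series, _, raw_value = line.strip().rpartition(" ")
    # Everything before the first "{" is the metric name.
    name, _, label_part = series.partition("{")
    labels = dict(_LABEL_RE.findall(label_part))
    return name, labels, float(raw_value)


sample = (
    'csi_node_mount_operations_total{operation_type="block-mount",'
    'status="success",node_name="worker-node-1",pvc_namespace="prod"} 15'
)
name, labels, value = parse_sample(sample)
print(name, labels["status"], labels["node_name"], value)
```

For production use, a maintained Prometheus client library is likely a better choice than a hand-written parser.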
Accessing Exported CSI Metrics
NOTE: For more detailed guidance, see https://github.com/vast-data/vast-csi/blob/v2.6/docs/METRICS_GUIDE.md.
Run the following commands to verify that the metrics endpoints work as expected:
For node metrics:
kubectl get pods -n vast-csi -l app.kubernetes.io/component=csi-node
kubectl port-forward -n vast-csi pod/<CSI node pod name> 9092:9092
curl -s http://localhost:9092/metrics
curl -s http://localhost:9092/health
For controller metrics:
kubectl get pods -n vast-csi -l app.kubernetes.io/component=csi-controller
kubectl port-forward -n vast-csi pod/<CSI controller pod name> 9093:9093
curl -s http://localhost:9093/metrics
curl -s http://localhost:9093/health
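The same check can be scripted. The sketch below is an illustration only: the helper names fetch_metrics and csi_samples are hypothetical, and it assumes that a port-forward like the ones above is already running and that the CSI metric names start with csi_, as the node metrics listed in this section do.

```python
from urllib.request import urlopen


def fetch_metrics(port, host="localhost", timeout=5):
    """Return the body of GET /metrics from a port-forwarded pod as text.

    Hypothetical helper; requires an active kubectl port-forward.
    """
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def csi_samples(metrics_text, prefix="csi_"):
    """Keep only sample lines whose metric name starts with the prefix.

    Comment lines (# HELP, # TYPE) start with "#" and are dropped as well.
    """
    return [
        line for line in metrics_text.splitlines()
        if line.startswith(prefix)
    ]
```

With the node port-forward active, calling csi_samples(fetch_metrics(9092)) would return the CSI sample lines; use port 9093 for the controller.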
Sample Metrics Values for Common Scenarios
The following illustrates typical metrics values in common scenarios.
NOTE: For more examples, see https://github.com/vast-data/vast-csi/blob/v2.6/docs/METRICS_EXAMPLES.md.
Upon NVMe connect to the VAST Cluster:
csi_node_nvme_connect_operations_total is set to 1.0.
csi_node_nvme_connect_duration_seconds shows the connect duration.
After mounting a block device:
csi_node_nvme_connect_operations_total is set to 1.0.
csi_node_mount_duration_seconds with operation_type="block_mount" shows the mount duration.
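An average duration can be derived from a duration metric if it is exported as a Prometheus histogram. That is an assumption here: by Prometheus convention a histogram exposes series with the _sum and _count suffixes, but the exact series emitted by the driver should be checked against the metrics reference. The sketch below uses a hypothetical helper, average_duration, and made-up sample values.

```python
def average_duration(metrics_text, metric="csi_node_mount_duration_seconds"):
    """Return sum/count over all label sets of a histogram, or None.

    Assumption: the metric is a histogram exposing <metric>_sum and
    <metric>_count series, per the usual Prometheus convention.
    """
    total_sum = 0.0
    total_count = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, raw_value = line.rpartition(" ")
        name = series.partition("{")[0]
        if name == metric + "_sum":
            total_sum += float(raw_value)
        elif name == metric + "_count":
            total_count += float(raw_value)
    return total_sum / total_count if total_count else None


# Illustrative values only, not taken from a real cluster.
sample_text = """\
# TYPE csi_node_mount_duration_seconds histogram
csi_node_mount_duration_seconds_sum{operation_type="block_mount"} 3.0
csi_node_mount_duration_seconds_count{operation_type="block_mount"} 2
"""
print(average_duration(sample_text))
```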