The VAST Data platform offers a comprehensive set of performance and telemetry metrics that provide deep visibility into system behavior, workload performance, and Quality of Service (QoS) enforcement. These metrics are essential for monitoring infrastructure health, troubleshooting performance anomalies, validating SLAs, and enabling observability in multi-tenant environments. For example, they help detect bandwidth bottlenecks, track latency spikes, and identify “noisy neighbor” workloads. VAST metrics also support usage-based analytics and capacity forecasting, which are critical for optimizing resource allocation.
Cloud Service Providers (CSPs) can use VAST metrics as the foundation for delivering transparent, metered services to tenants. CSPs can expose selected metrics, such as per-tenant IOPS, bandwidth, or latency, via customer-facing dashboards or API integrations, which supports tenant-level performance reporting and SLA validation while maintaining strict isolation and control. Metrics are collected across all system layers (CNodes, DNodes, switches) and are available in real time or over historical ranges via:
Prometheus Exporter: Exposes metrics in Prometheus/OpenMetrics format
REST API: Full access to raw and derived metrics

API to VMS diagram
Note: Metrics are stored in the VAST Management System (VMS) database and aggregated at 5-minute intervals (or at 1-minute granularity in VAST 5.3+), supporting both internal monitoring and external service reporting.
Metrics Visualization
VAST provides two main tools for visualizing system metrics: the Web UI Dashboard and Grafana Dashboards. These tools enable administrators and cloud providers to monitor performance, detect anomalies, and manage tenant-level observability.
Web UI Dashboard
The VAST Web UI provides a real-time dashboard that displays key cluster metrics, including capacity usage, IOPS, bandwidth, and top-consuming users and views. It provides a high-level overview and enables dynamic sorting to quickly identify performance hotspots or imbalances.
Tenant Managers also have access to this dashboard, but visibility is limited to their own data. It shows per-tenant capacity, IOPS, bandwidth, and usage trends, supporting self-service monitoring in multi-tenant environments.

VMS Dashboard
Grafana Dashboards
VAST provides a comprehensive suite of pre-built Grafana dashboards designed for deep observability and performance analysis. Key highlights include:
Version Compatibility: Works with VAST versions 5.1-sp40 and later using the built-in Prometheus exporter.
Easy Import: Dashboards are provided as .json files that can be directly imported into your Grafana instance.
Organized Views: Dashboards are organized by tenant, view, and node for targeted troubleshooting.
Use Cases: Ideal for real-time monitoring, historical analysis, QoS enforcement validation, and capacity planning.
These dashboards are production-ready and recommended as-is or as a reference for building custom visualizations. They help ensure consistent metric usage across VAST versions, reduce the chance of misinterpreting metric semantics, and simplify integration with external systems.
To use them, import the .json file, configure your Prometheus data source, and start visualizing metrics. Customized dashboards tailored to specific CSP use cases are also available upon request.
For more details, visit the VAST Grafana Dashboards repository.

VAST Grafana Dashboard
Recommended Expressions on VAST
Purpose | PromQL Expression |
|---|---|
Read IOPS | rate(vast_view_metrics_ViewMetrics_read_iops_count[5m]) |
Read Bandwidth | rate(vast_view_metrics_ViewMetrics_read_bw_sum[5m]) |
Read Latency | rate(vast_view_metrics_ViewMetrics_read_latency_sum[5m]) / rate(vast_view_metrics_ViewMetrics_read_latency_count[5m]) |
Write IOPS | rate(vast_view_metrics_ViewMetrics_write_iops_count[5m]) |
Write Bandwidth | rate(vast_view_metrics_ViewMetrics_write_bw_sum[5m]) |
Write Latency | rate(vast_view_metrics_ViewMetrics_write_latency_sum[5m]) / rate(vast_view_metrics_ViewMetrics_write_latency_count[5m]) |
QoS Throttling | rate(vast_view_metrics_ViewMetrics_qos_wait_for_budget_time_sum[5m]) / rate(vast_view_metrics_ViewMetrics_qos_wait_for_budget_time_count[5m]) |
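The latency and QoS expressions above divide the rate of a cumulative `_sum` series by the rate of its `_count` series. The sketch below illustrates that arithmetic on two hypothetical scrapes taken five minutes apart. The `Sample` class, the function names, and the sample values are assumptions made for illustration, and Prometheus itself also extrapolates to the window edges and handles counter resets, so treat this as an approximation of `rate()` rather than a reimplementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One scrape of a cumulative (sum, count) pair, e.g. read_latency_sum/_count."""
    timestamp: float  # seconds
    total: float      # cumulative *_sum value, in the unit the cluster reports
    count: float      # cumulative *_count value


def mean_over_window(first: Sample, last: Sample) -> Optional[float]:
    """Approximate rate(sum[w]) / rate(count[w]) from the window's edge samples.

    The elapsed time cancels out of the ratio, so the result is simply
    delta(sum) / delta(count). Returns None when no events were recorded.
    """
    d_count = last.count - first.count
    if d_count <= 0:
        return None  # idle window or counter reset: no defined mean
    return (last.total - first.total) / d_count


def per_second_rate(first: Sample, last: Sample) -> float:
    """Approximate rate(count[w]): events per second across the window."""
    return (last.count - first.count) / (last.timestamp - first.timestamp)


# Hypothetical samples taken 300 s (5m) apart.
start = Sample(timestamp=0.0, total=120.0, count=40_000)
end = Sample(timestamp=300.0, total=180.0, count=70_000)

print(mean_over_window(start, end))  # 60 / 30000 = 0.002
print(per_second_rate(start, end))   # 30000 / 300 = 100.0
```

Because the ratio is computed from counter deltas, it stays correct for any window length, which is why the rate-based expressions are the recommended form.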
Derived Metrics (from Version 5.3 and higher)
If PromQL is too complex or unsupported, VAST offers derived metrics. These metrics are based on periodic averages and are less accurate over longer time windows due to the averaging characteristics:
Purpose | Metric Name |
|---|---|
Read IOPS | vast_view_metrics_ViewMetrics_read_iops_time_avg |
Read Bandwidth | vast_view_metrics_ViewMetrics_read_bw_time_avg |
Read Latency | vast_view_metrics_ViewMetrics_read_latency_avg |
Write IOPS | vast_view_metrics_ViewMetrics_write_iops_time_avg |
Write Bandwidth | vast_view_metrics_ViewMetrics_write_bw_time_avg |
Write Latency | vast_view_metrics_ViewMetrics_write_latency_avg |
QoS Throttling | vast_view_metrics_ViewMetrics_qos_wait_for_budget_time_avg |
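One way that averaging pre-averaged values can lose accuracy is sketched below. When per-window averages are averaged again over a longer range, every window gets equal weight regardless of how many operations it contained, whereas a sum/count calculation weights each operation equally. The window data is hypothetical and the function names are illustrative only.

```python
def mean_of_window_averages(windows):
    """Unweighted mean of per-window average latencies (averaging *_avg values)."""
    return sum(avg for avg, _ in windows) / len(windows)


def op_weighted_mean(windows):
    """Mean latency weighted by operations per window (what sum/count yields)."""
    total_ops = sum(ops for _, ops in windows)
    return sum(avg * ops for avg, ops in windows) / total_ops


# Hypothetical (average latency, operation count) per window:
# one quiet window with slow I/O, one busy window with fast I/O.
windows = [(10.0, 100), (1.0, 9_900)]

print(mean_of_window_averages(windows))  # 5.5
print(op_weighted_mean(windows))         # 1.09
```

In this made-up case the quiet window dominates the unweighted result, so the derived metrics are best suited to short ranges or to workloads with a fairly steady operation rate.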
Command line:
vastpy-cli --json get monitors/ad_hoc_query object_type=view time_frame=5m object_ids=3 prop_list=ViewMetrics,read_bw__time_avg prop_list=ViewMetrics,read_iops__time_avg prop_list=ViewMetrics,read_latency__avg
Output format:
"prop_list": [
  "timestamp",
  "object_id",
  "ViewMetrics,read_bw__time_avg",
  "ViewMetrics,read_iops__time_avg",
  "ViewMetrics,read_latency__avg"
],
QoS Metrics Overview
Metrics / Concept | Description |
|---|---|
vast_view_metrics_ViewMetrics_qos_wait_for_budget_time_avg | Windowed mean time requests in this view spent waiting on QoS budget during the scrape window. Indicates presence/degree of QoS gating. Mostly >0 since it measures the time a code section takes, which is part of IO processing. |
vast_view_metrics_ViewMetrics_qos_wait_for_budget_time_sum | Cumulative seconds of QoS wait accrued by the view (monotonic; use |
vast_view_metrics_ViewMetrics_qos_wait_for_budget_time_count | Cumulative count of affected events included in the |
vast_view_metrics_ViewMetrics_read_bw_avg vast_view_metrics_ViewMetrics_write_bw_avg | Windowed average delivered bandwidth for the view (bytes/s). It is useful to see if throughput is at or near the configured QoS cap. * |
vast_view_metrics_ViewMetrics_read_iops_time_avg vast_view_metrics_ViewMetrics_write_iops_time_avg | Windowed average IOPS for the view (ops/s). Helps separate small-IO vs. streaming patterns. * |
vast_user_read_bw vast_user_write_bw | Per-user windowed average bandwidth (bytes/s). Complements view-level utilization. |
vast_user_read_iops vast_user_write_iops | Per-user windowed average IOPS (ops/s). |
Notes:
Window length: For “*_avg” metrics, the averaging window equals your Prometheus scrape_interval (e.g., 15s), unless configured differently (prometheus.yml).
Scopes & Endpoints:
/api/prometheusmetrics/views → per-view (QoS, performance, etc.)
/api/prometheusmetrics/users → per-user (bandwidth, IOPS, etc.)
HELP/TYPE lines: Each series carries # HELP <metric> <description> and # TYPE <metric> <type>. Treat these as the authoritative contract for your cluster/build.
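Since the HELP/TYPE lines are the contract for a given build, it can be useful to extract them from an endpoint's response before relying on a metric name. The following is a minimal sketch of such a parser for the Prometheus text format. The sample payload is illustrative: its HELP texts mirror the view capacity examples in this document, while the `gauge` type, the `view` label, and the values are assumptions.

```python
def parse_metric_metadata(exposition_text):
    """Collect # HELP and # TYPE lines from Prometheus text-format output.

    Returns {metric_name: {"help": str, "type": str}}; a key is present
    only when the corresponding line was seen.
    """
    metadata = {}
    for line in exposition_text.splitlines():
        parts = line.strip().split(None, 3)  # "#", keyword, metric, rest
        if len(parts) < 3 or parts[0] != "#" or parts[1] not in ("HELP", "TYPE"):
            continue  # sample lines and other comments are skipped
        keyword, metric = parts[1], parts[2]
        rest = parts[3] if len(parts) > 3 else ""
        metadata.setdefault(metric, {})[keyword.lower()] = rest
    return metadata


# Illustrative payload (TYPE values, labels, and sample values are assumptions).
sample = """\
# HELP vast_view_logical_capacity View Logical Capacity
# TYPE vast_view_logical_capacity gauge
vast_view_logical_capacity{view="example"} 1024
# HELP vast_view_physical_capacity View Physical Capacity
# TYPE vast_view_physical_capacity gauge
vast_view_physical_capacity{view="example"} 512
"""

meta = parse_metric_metadata(sample)
print(meta["vast_view_logical_capacity"])
# {'help': 'View Logical Capacity', 'type': 'gauge'}
```

Running this against the real output of each endpoint gives a per-build inventory of metric names and types that dashboards and alerts can be checked against.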
Capacity Usage Monitoring
For billing and accurate tenant metering, VAST recommends using quota-based capacity tracking. Quotas provide the most accurate accounting model because usage is tracked directly at the quota path level and can be aggregated per tenant.
Quota-based accounting exposes both:
Logical capacity (used_capacity)
Effective capacity (used_effective_capacity)
Quota-Based Capacity Tracking
Capacity usage can be retrieved using the /quotas REST API endpoint:
curl -sku admin:****** \
  "https://vast-file-server-vms/api/latest/quotas/?tenant_name=tenant-a" | \
  jq '.[] | {
    name,
    tenant_name,
    path,
    used_capacity,
    used_effective_capacity,
    used_capacity_tb,
    used_effective_capacity_tb
  }'
Example response:
{
  "name": "datasets-quota",
  "tenant_name": "tenant-a",
  "path": "/",
  "used_capacity": 1511299156932695,
  "used_effective_capacity": 1511299156932695,
  "used_capacity_tb": 1374.519,
  "used_effective_capacity_tb": 1374.519
}
Note: Multiple quota paths can be aggregated to calculate total tenant capacity usage.
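The aggregation mentioned in the note can be sketched as follows. The quota records are hypothetical and use only field names that appear in the example response; fetching them from the API is left out. The sketch assumes the quota paths do not overlap, since nested paths would otherwise be counted twice. The conversion constant is also an assumption: the example response is consistent with the `*_tb` fields being bytes divided by 2^40.

```python
from collections import defaultdict

# Assumption: *_tb fields are binary terabytes (bytes / 2**40); the example
# response above is consistent with this.
TIB = 2**40


def aggregate_by_tenant(quotas):
    """Sum used_capacity and used_effective_capacity (bytes) per tenant_name."""
    totals = defaultdict(lambda: {"used_capacity": 0, "used_effective_capacity": 0})
    for quota in quotas:
        tenant = totals[quota["tenant_name"]]
        tenant["used_capacity"] += quota["used_capacity"]
        tenant["used_effective_capacity"] += quota["used_effective_capacity"]
    return dict(totals)


def bytes_to_tib(num_bytes):
    """Convert bytes to binary terabytes, rounded to three decimals."""
    return round(num_bytes / TIB, 3)


# Hypothetical quota records with non-overlapping paths.
quotas = [
    {"name": "datasets-quota", "tenant_name": "tenant-a", "path": "/datasets",
     "used_capacity": 3 * TIB, "used_effective_capacity": 2 * TIB},
    {"name": "scratch-quota", "tenant_name": "tenant-a", "path": "/scratch",
     "used_capacity": 1 * TIB, "used_effective_capacity": 1 * TIB},
    {"name": "home-quota", "tenant_name": "tenant-b", "path": "/home",
     "used_capacity": 5 * TIB, "used_effective_capacity": 4 * TIB},
]

totals = aggregate_by_tenant(quotas)
print(bytes_to_tib(totals["tenant-a"]["used_capacity"]))            # 4.0
print(bytes_to_tib(totals["tenant-a"]["used_effective_capacity"]))  # 3.0

# Cross-check against the example response: used_capacity -> used_capacity_tb.
print(bytes_to_tib(1511299156932695))  # 1374.519
```

Keeping the totals in bytes and converting only for display avoids accumulating rounding error when many quota paths are summed.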
Tenant-Level Capacity Monitoring (Prometheus)
The /prometheusmetrics/tenants endpoint exposes tenant-level logical capacity metrics that CSPs can use for tenant monitoring, dashboards, and billing workflows.
Example query:
curl -sku admin:******** \
  "https://vast-file-server-vms/api/latest/prometheusmetrics/tenants"
Example metrics:
vast_tenant_metrics_TenantMetrics_logical_capacity_avg
vast_tenant_metrics_TenantMetrics_logical_capacity_sum
vast_tenant_metrics_TenantMetrics_logical_capacity_count
Note: Tenant capacity limits must be enabled for tenant-level capacity metrics to be exported through the /prometheusmetrics/tenants endpoint.
Quota, User, and Group Capacity Monitoring (Prometheus)
The /prometheusmetrics/quotas endpoint exposes quota, per-user (UID), and per-group (GID) capacity metrics.
Example query:
curl -sku admin:******* \
  "https://vast-file-server-vms/api/latest/prometheusmetrics/quotas"
Example metrics:
vast_quota_used_capacity Quota Used Capacity
vast_user_quota_used_capacity User Quota Used Capacity
vast_user_quota_percent_capacity User Quota Capacity Percent Used
vast_group_quota_used_capacity Group Quota Used Capacity
vast_group_quota_percent_capacity Group Quota Capacity Percent Used
Note:
Quotas must be enabled for quota metrics to be exported, even if no hard or soft quota limits are configured.
User/group quota tracking can be enabled without enforcing actual capacity limits.
Alternative Capacity Monitoring Approaches
The following methods can also be used for tenant capacity monitoring, but they are less accurate than quota-based accounting and should be used primarily when quota tracking is unavailable.
Capacity via View API
Returns logical capacity per View:
curl -sku admin:****** \
  "https://vast-file-server-vms/api/latest/views/?tenant_name=acme" | \
  jq '.[] | {name, path, logical_capacity}'
Note: If the tenant has multiple Views or buckets, the values must be aggregated externally for tenant-level accounting.
The /prometheusmetrics/views endpoint also exposes View-level capacity metrics.
Example query:
curl -sku admin:****** \
  "https://vast-file-server-vms/api/latest/prometheusmetrics/views"
Example metrics:
# HELP vast_view_logical_capacity View Logical Capacity
# HELP vast_view_physical_capacity View Physical Capacity
Capacity Estimation API
Estimates capacity usage for a specific filesystem path:
curl -sku admin:****** \
  "https://vast-file-server-vms/api/latest/capacity/capacity_estimation?tenant_name=acme&path=/"
Note: capacity_estimation is path-based and requires an explicit filesystem path. It cannot estimate usage for an entire tenant without path aggregation.
Grafana Dashboard Reference
Use or customize VAST’s official Grafana dashboards to visualize UID usage:
Repository: vast-data/vast-grafana-dashboards
Recommended Dashboard:
Top Actors – Users
Client-Side Observability (NFS only)
VAST's vNFS Collector is an open-source tool that provides deep visibility into NFS workloads by capturing detailed I/O metrics for every NFS mount. It tracks per-operation counters for all key NFSv3 and NFSv4 commands, including READ, WRITE, LOOKUP, and DELETE, along with contextual metadata such as mount points, process names, user IDs, and environment variables like SLURM JOB ID. This rich dataset enables accurate workload profiling and performance tuning.
The collector supports flexible data forwarding, with local JSON logging and seamless integration into Prometheus (for Grafana dashboards), Kafka (for event-driven pipelines), and the VAST DataBase (for historical analytics via Trino, Spark, and Grafana).
VAST CSI Driver Prometheus Metrics
In Kubernetes environments, the VAST CSI Driver also supports exporting CSI node and controller metrics in Prometheus format, enabling observability for storage provisioning, mount operations, CSI RPC performance, and NFS transport health. These metrics can be integrated with Prometheus and Grafana to support operational monitoring and troubleshooting of containerized workloads running on VAST. For more details, see the CSI metrics guide: Exporting VAST CSI Driver Metrics to Prometheus
For more information, visit: