Exporting Metrics to Prometheus

Overview

Prometheus is an open-source systems monitoring and alerting toolkit. It provides a data model for describing and recording metrics over time, and a web application for displaying those metrics. Prometheus can be configured to fetch metrics from a third-party system through a software entity called a Prometheus exporter. An exporter collects metrics from the third-party system, converts them to Prometheus metrics, and exposes them at an HTTP resource path.

VAST Cluster provides a Prometheus exporter resource in the VMS REST API. The VAST Prometheus exporter fetches predefined metrics from the VMS database and returns them as a plain-text response in key-value format. You can configure the Prometheus server to scrape the metrics from the exporter at a chosen interval and display the VMS metrics in Prometheus.

Example Prometheus Graph of VAST Cluster Metric

How to Configure Prometheus to Collect VMS Metrics

For information about how to configure the Prometheus server to collect metrics from the VMS Prometheus exporter, see the Prometheus configuration documentation.

The following are guidelines for providing some of the key parameters in the scrape_configs section of the Prometheus server configuration file:

  • metrics_path. This is the HTTP resource path from which to fetch metrics from VMS. Set it to one of the following:

    • /api/prometheusmetrics/alarms

      Exports all active VAST Cluster alarms.

    • /api/prometheusmetrics/users

      Exports user bandwidth, IOPS and metadata IOPS metrics on read and/or write operations.

    • /api/prometheusmetrics/views

      Exports per-view performance metrics, including bandwidth, IOPS, metadata IOPS, latency and QoS, as well as each view's logical and physical capacity.

    • /api/prometheusmetrics/quotas

      Exports information about the quotas configured on the cluster, such as the quota limits and the number of users who have exceeded a quota or have been blocked because a quota was exceeded.

    • /api/prometheusmetrics/devices

      Provides information about the SSD or NVRAM physical state, such as presence of media errors or current temperature, and overall operational status (active or failed).

    • /api/prometheusmetrics/defrag

      Exports metrics related to defragmentation.

    • /api/prometheusmetrics/switches

      Exports network monitoring metrics that are collected from the cluster's switches.

    • /api/prometheusmetrics/user_connections

      Exports the 100 users with the highest number of active S3 connections. Only users with an attached QoS policy whose limit is greater than 0 are included.

    • /api/prometheusmetrics/

      Exports the cluster and CNode metrics that are not exported by the endpoints listed above, for example, performance metrics per storage protocol and detailed information about the state of the hardware.

    • /api/prometheusmetrics/all

      Exports every VAST Cluster metric, including all metrics exported by the endpoints listed above. Because of the large amount of data exported, this endpoint is not recommended for collecting metrics from very large clusters.

    Notice

    In VAST Cluster versions earlier than 4.6.0-SP11, only one exporter endpoint is supported: /api/prometheusmetrics/, which exports all metrics.

  • In the static_configs section, replace <EXPORTER_HOST> in the targets entry of the snippet below with the cluster's VMS virtual IP. This is the IP address that you use to browse to the VAST Web UI. Keep the port set to 443, as shown in the snippet.

  • To authenticate to the VMS REST API using basic authentication, provide a VMS manager user name and password in the basic_auth section.

    Note

    A VMS manager user with the minimum read-only role has sufficient permissions to call the exporter endpoints. The read-only role is a built-in default role that you can assign to a manager user. For information about creating and modifying managers and roles, see Authorizing VMS Access and Permissions.

    Note

    When you view a saved configuration on the Prometheus server, the password is hidden and displayed as <secret>. For example:

    ...
    basic_auth:
      username: prometheus
      password: <secret>
    ...

    Note

    The VMS REST API supports both basic authentication and authentication over HTTPS with JSON Web Tokens (JWTs).

    For information about generating and using JWTs, see Authenticating to the VMS REST API in the VMS REST API documentation, which is available at https://<VMS_VIP>/docs/index.html from within your VMS management network (where <VMS_VIP> is your VMS virtual IP address for accessing the VAST Web UI).

  • In the tls_config section, either verify the VMS SSL certificate or skip client-side validation of it, as needed for the HTTPS connection to VMS to succeed. The right choice depends on your VMS TLS configuration, for example, whether a CA-signed certificate is installed in VMS (see Installing an SSL Certificate). For the available tls_config options, see the TLS client configuration in the Prometheus configuration documentation. The snippet below skips client-side validation of the certificate.

  • A scrape timeout of 30 seconds should allow enough time for a response on larger systems or under heavy load. Prometheus requires the scrape timeout to be no longer than the scrape interval, so we recommend a scrape interval of one minute.
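The sample configuration that follows stores the password in plain text and skips certificate validation. If you want to avoid both, the fragment below is a minimal sketch of the same job that reads the password from a file with the standard Prometheus basic_auth password_file option and verifies the VMS certificate against a CA bundle with the tls_config ca_file option. The file paths and the <VMS_HOST_NAME> placeholder are hypothetical. Verification only succeeds if the certificate installed in VMS is signed by a CA in that bundle and its name matches the target address (or the optional server_name).

```yaml
# Sketch: the "vast" job with the password read from a file and
# certificate validation enabled. All file paths are hypothetical examples.
scrape_configs:
  - job_name: 'vast'
    scheme: https
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: '/api/prometheusmetrics/users'
    static_configs:
      - targets: ['<EXPORTER_HOST>:443']
    basic_auth:
      username: '<USER_NAME>'
      # File that contains only the VMS manager password (hypothetical path).
      password_file: /etc/prometheus/secrets/vms_password
    tls_config:
      # CA certificate(s) used to verify the VMS certificate (hypothetical path).
      ca_file: /etc/prometheus/certs/vms_ca.pem
      # Optional: uncomment if the certificate's host name differs from
      # <EXPORTER_HOST>. <VMS_HOST_NAME> is a hypothetical placeholder.
      # server_name: '<VMS_HOST_NAME>'
```

Make sure the password file is readable by the user that runs the Prometheus server.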

The following is a snippet of a sample Prometheus server configuration file:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration for the VAST Prometheus exporter.
# This job scrapes a single exporter endpoint (user metrics).
scrape_configs:
  - job_name: 'vast'
    scheme: https
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: '/api/prometheusmetrics/users'
    static_configs:
      - targets: ['<EXPORTER_HOST>:443']
    basic_auth:
      username: '<USER_NAME>'
      password: '<PASSWORD>'
    tls_config:
      insecure_skip_verify: true
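Because /api/prometheusmetrics/all is not recommended on very large clusters, one option is to define a separate scrape job for each endpoint you need. The fragment below is a sketch of that approach, assuming VAST Cluster 4.6.0-SP11 or later (earlier versions support only /api/prometheusmetrics/). The job names and the choice of endpoints are hypothetical examples; all other settings are copied from the sample above. Prometheus adds a job label to every series it scrapes, so metrics collected from different endpoints stay distinguishable.

```yaml
# Sketch: one scrape job per VMS exporter endpoint instead of
# /api/prometheusmetrics/all. Job names are hypothetical.
scrape_configs:
  - job_name: 'vast-alarms'
    scheme: https
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: '/api/prometheusmetrics/alarms'
    static_configs:
      - targets: ['<EXPORTER_HOST>:443']
    basic_auth:
      username: '<USER_NAME>'
      password: '<PASSWORD>'
    tls_config:
      insecure_skip_verify: true

  - job_name: 'vast-views'
    scheme: https
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: '/api/prometheusmetrics/views'
    static_configs:
      - targets: ['<EXPORTER_HOST>:443']
    basic_auth:
      username: '<USER_NAME>'
      password: '<PASSWORD>'
    tls_config:
      insecure_skip_verify: true

  # Cluster and CNode metrics not covered by the dedicated endpoints.
  - job_name: 'vast-cluster'
    scheme: https
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: '/api/prometheusmetrics/'
    static_configs:
      - targets: ['<EXPORTER_HOST>:443']
    basic_auth:
      username: '<USER_NAME>'
      password: '<PASSWORD>'
    tls_config:
      insecure_skip_verify: true
```

Add or remove jobs to match the metrics you need. Each job sends its own request to VMS every scrape interval, so fewer jobs means fewer requests.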

Pre-Built Grafana Dashboards

Pre-built Grafana dashboards are available for import into your Grafana instance. These dashboards provide statistics and visualisations based on the scraped metrics, as follows:

  • Main dashboard. Cluster health and statistics.

  • Space capacity. Space and quota statistics.

  • CNodes. Performance and hardware statistics per CNode.

  • DNodes. Performance and hardware statistics per DNode, SCM and SSD.

  • Protocols metadata statistics. NFSv3, NFSv4 and S3 metadata latency statistics.

  • Views. Top views, and per-view performance statistics.

  • Users. Top users, and per-user performance statistics.

  • Vips and vippools. Per VIP and VIP Pool statistics.

  • Alarms. Active alarms per component.

To install and configure the Grafana dashboards, download the attached .json files and import them into your Grafana instance.
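As an alternative to importing the files through the Grafana UI, Grafana's file-based dashboard provisioning can load them from a directory. The provider file below is a minimal sketch; it assumes the downloaded .json files were copied to /var/lib/grafana/dashboards/vast and that the provider file is saved in Grafana's provisioning/dashboards directory. The provider name, folder name, and paths are hypothetical. If the dashboard files contain an __inputs section with data source placeholders, provisioning may not resolve them, and importing through the UI (which prompts you to select the Prometheus data source) is the simpler route.

```yaml
# Grafana dashboard provider, for example saved as
# /etc/grafana/provisioning/dashboards/vast.yaml (hypothetical file name).
apiVersion: 1

providers:
  - name: 'vast-dashboards'   # hypothetical provider name
    orgId: 1
    folder: 'VAST'            # Grafana folder that will hold the dashboards
    type: file
    disableDeletion: false
    allowUiUpdates: true
    updateIntervalSeconds: 60 # how often Grafana rescans the directory
    options:
      # Directory containing the downloaded .json dashboard files (hypothetical path).
      path: /var/lib/grafana/dashboards/vast
```

Grafana reads provisioning files at startup, so restart Grafana after adding the provider file.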