Client Metrics

The VAST Cluster can collect metric data for NFS operations from Linux client machines and present it in a series of analytics graphs. A VAST program deployed on the client collects the information periodically and sends it to a VAST Client Metrics database created for this purpose on the cluster. You view the metrics as a series of analytics graphs in the Client Metrics page of the VAST Web UI.

Configuring the Cluster for Collection of Client Metric Information

Configuring collection of client metric information on the cluster involves these steps:

  • Enabling client metrics collection on the cluster.

  • Creating and configuring a user, who will be associated with the identity policy, in order to access the metrics database.

  • Creating an identity policy that grants access to the client metrics database on the cluster.

Enabling and Configuring Client Metrics collection on the VAST Cluster
  1. In the VAST Web UI, navigate to the Analytics page, and select Client Metrics.

  2. If Client Metrics have not yet been enabled on the cluster, this message is shown: No client metrics available. Click here to enable and define your client metrics data.

    Click on the link.

    Toggle the Enable switch to enable collection of client metrics.

    Alternatively, if metrics collection has already been enabled, click the Settings icon.

    This opens the Client metrics data dialog.

  3. In the General section, set the following fields:

    • Bucket name: The name of a new bucket that is created to hold the metric database. The bucket and database are created when metrics are enabled.

    • Bucket owner: Select, from the list of users, the user that will own the bucket created for the metric database. Creating this user is described later in this topic.

    • Retention period: Select the retention period for metric data that is stored in the bucket. Data older than this period is deleted from the bucket. The retention period is specified in minutes, hours, or days.

    • Maximum size: The maximum size of the database, in MB or GB. If the limit is reached, older data is deleted from the bucket.

  4. Click Save. A bucket is created, and in it a dedicated VAST database (named default-vast-client-metrics-bucket), which appears in the VAST Database page of the VAST Web UI.

Creating a Cluster user for the VAST Client VNFS-Collector to access the VAST Database

Complete these steps to create a new user for the tenant. The user will own the VAST Client Metric database for the tenant. You can also use an existing user for the tenant, in which case skip these steps and continue with the procedures that follow, where you create the identity policy and configure the user.

  1. Navigate to the Users tab, and click Create User.

  2. Enter a name and UID for the user, and select a Local Provider for the user.

  3. Select the tenant from the list.

  4. Click Create. This creates the user that will own the Client Metrics database for the selected tenant, and the bucket that contains it.

Creating an Identity Policy that grants full access to the Client Metrics Database
  1. In the VAST Web UI, navigate to the Users Management page, and select the Identity Policies tab.

  2. Click Create Policy. This policy will allow access to the Client Metrics Database on the Cluster.

  3. Enter a name and select the tenant for the policy. The tenant is the tenant for which metrics are being collected.

  4. In the Custom or pre-defined rules section, select Action List, and for Effect, select Allow.

  5. In the list of properties, select Select all actions, and move them to the Selected Properties pane.

  6. Click Add to JSON.

  7. Click Create to create the identity policy.

    This is an example of an identity policy for Client Metrics:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "s3:TabularInsertRows",
            "s3:TabularListColumns",
            "s3:TabularListSchemas",
            "s3:TabularListTables"
          ],
          "Effect": "Allow",
          "Resource": "arn:aws:s3:::<bucket>/*"
        }
      ]
    }

    where <bucket> is the bucket created when metrics were enabled on the cluster.
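If you create policies for several buckets, the example above can be generated rather than typed by hand. The following is a minimal sketch, not a VAST tool: the function name build_client_metrics_policy and the bucket name used in the usage line are hypothetical, and the action list is simply copied from the example policy above.

```python
import json

# Actions copied from the example identity policy above.
CLIENT_METRICS_ACTIONS = [
    "s3:TabularInsertRows",
    "s3:TabularListColumns",
    "s3:TabularListSchemas",
    "s3:TabularListTables",
]


def build_client_metrics_policy(bucket):
    """Return the example policy as a dict, with <bucket> filled in.

    Hypothetical helper for illustration only.
    """
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a non-empty name without '/'")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": list(CLIENT_METRICS_ACTIONS),
                "Effect": "Allow",
                "Resource": "arn:aws:s3:::{}/*".format(bucket),
            }
        ],
    }


def policy_as_json(bucket):
    """Serialize the policy so it can be pasted into the policy JSON editor."""
    return json.dumps(build_client_metrics_policy(bucket), indent=2)
```

For example, policy_as_json("my-metrics-bucket") returns the JSON text with the Resource set to arn:aws:s3:::my-metrics-bucket/*.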

Configuring a User to Access the VAST Client Metric Database and Generate Access & Secret Keys

These steps associate the identity policy for accessing the database with the user, and add access and secret keys.

  1. In the Users tab, right-click on the user created in the previous steps.

  2. Select the tenant.

  3. In the Identity Policies section, select the policy created above.

  4. In the S3 Access Keys section, click on Add New Key and then Generate New Key. This generates a new set of access and secret keys for the user.

  5. Copy the access and secret keys that are generated. They are used to configure the client agent (vnfs-collector).

  6. Click Update.

The VAST Client VNFS-Collector

Follow these instructions to download the VAST NFS Collector and install it on the client machine.

Edit the configuration file and add these items to the vdb section:

vdb:
  db_endpoint: <vip> or <vip-fqdn>
  db_access_key: <access-key>
  db_secret_key: <secret-key>  
  db_bucket: <bucket>

where <vip> is one of the virtual IPs of the cluster (or <vip-fqdn>, its fully qualified domain name), <access-key> and <secret-key> are the keys generated above (in Configuring a User to Access the VAST Client Metric Database and Generate Access & Secret Keys), and <bucket> is the bucket name provided above, when Client Metrics were enabled.
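When the same settings are rolled out to many clients, the vdb section can be rendered from values instead of edited by hand. This is a minimal sketch under stated assumptions: render_vdb_section is a hypothetical helper, only the four keys shown above are emitted, and the values are written as double-quoted strings (valid YAML) so that special characters in keys do not break the file. Check the collector's own documentation for the full configuration format.

```python
import json


def render_vdb_section(endpoint, access_key, secret_key, bucket):
    """Render the vdb section shown above as YAML text.

    Hypothetical helper: refuses empty values and unfilled
    placeholders such as "<vip>".
    """
    fields = [
        ("db_endpoint", endpoint),
        ("db_access_key", access_key),
        ("db_secret_key", secret_key),
        ("db_bucket", bucket),
    ]
    lines = ["vdb:"]
    for name, value in fields:
        if not value or value.startswith("<"):
            raise ValueError("missing value for {}".format(name))
        # A JSON string literal is also a valid YAML double-quoted scalar.
        lines.append("  {}: {}".format(name, json.dumps(value)))
    return "\n".join(lines) + "\n"
```

The returned text can be merged into the collector configuration file. Treat the secret key as a credential and avoid printing or logging the rendered section.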

The collector runs as an eBPF daemon that collects a set of data from the client. It runs on the client in privileged mode with elevated permissions, and can run as a systemd service, a Docker container, a Kubernetes DaemonSet, or a standalone daemon.

The collector collects data for workloads accessing an NFS filesystem on the client, and tags it with custom identifiers that are derived from process environment variables. A common use case is a job scheduler (SLURM, for example) that injects job identification information into the jobs' running processes (e.g. SLURM_JOB_ID).
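To illustrate the idea of deriving identifiers from process environment variables, the sketch below parses an environment block in the NUL-separated format that Linux exposes in /proc/<pid>/environ and picks out the tracked variables. It only illustrates the concept: the function names are hypothetical, and this is not how the eBPF-based collector is implemented.

```python
from typing import Dict, Iterable


def parse_environ(blob: bytes) -> Dict[str, str]:
    """Parse a NUL-separated NAME=value block (the /proc/<pid>/environ format)."""
    env = {}
    for entry in blob.split(b"\0"):
        if b"=" not in entry:
            continue  # skip empty or malformed entries
        name, _, value = entry.partition(b"=")
        env[name.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
    return env


def extract_tags(env: Dict[str, str], tracked: Iterable[str]) -> Dict[str, str]:
    """Keep only the tracked variables that the process actually has set."""
    return {name: env[name] for name in tracked if name in env}


# Example with an in-memory block, as a scheduler such as Slurm might produce:
sample = b"HOME=/home/alice\0SLURM_JOB_ID=4242\0"
tags = extract_tags(parse_environ(sample), ["SLURM_JOB_ID", "SLURM_JOB_NAME"])
# tags == {"SLURM_JOB_ID": "4242"}
```

Variables that are not set for a process, such as SLURM_JOB_NAME in this example, are simply omitted from the tags.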

Viewing Client Metrics as Charts in VAST Web UI

  1. In the VAST Web UI, navigate to the Analytics page, and select Client Metrics.

  2. Select Analytics.

  3. Select the tenant. Metric data is collected for each tenant separately, and stored in a separate table in the database.

  4. In the Defined Time field, select the time period for which metrics are shown.

  5. In the filter bar at the top of the screen, select the ID type. This is the ID variable or label used to filter the metrics data for specific workloads of interest running on the client. The list includes a set of predefined ID types, such as HOSTNAME and UID. Select the ID type of interest to identify workloads on your client.

  6. Next, select values for the ID type selected in the previous step. For example, if HOSTNAME is selected as the ID type, the list shows all the host values. Select the values for the workloads on the client for which to display metrics. You can select more than one value. The metrics charts show details for all the selected values.

  7. The metrics are shown as charts, in two sections.

    The upper section (Metrics per JOB_ID Type) shows histograms of metrics over time for the selected workloads (identified by ID type, for example, JOB ID). The histograms show the aggregate data for all the selected workloads.

    For example, if you select JOB ID as the ID type, and then select several workloads (by their JOB ID), the Bandwidth graph shows a histogram of the bandwidth for the selected workloads, over time, for the time period selected in Step 4.

    The upper section includes these metrics:

    • Bandwidth: Shows a histogram of aggregate bandwidth for the selected workloads, over the selected time period. Each column shows the aggregate bandwidth for a specific time.

    • IOPS: Shows a histogram of aggregate I/O operations for the selected workloads, over the selected time period. Each column shows the aggregate I/O operations for a specific time.

    • Metadata operations: Shows a histogram of the aggregate number of specific operations for each selected workload, over the selected time period. Each column shows the aggregate number of operations for a specific time. Operations include read, write, create, and delete.

    • Summary BW by Mount: Shows a histogram of the aggregated data sent to mounts for each selected workload, over the selected time period. Each column shows the aggregate bandwidth to all mounts for a specific value.

    The lower section (Metrics per Mount Reports) shows histograms of metrics, over time, of aggregated data to all the mounts on the client that are used by the selected values.

    • Bandwidth: Shows a histogram of aggregate bandwidth involving the selected workloads, to all mounts, over the selected time period. Each column shows the aggregate bandwidth for a specific time.

    • IOPS: Shows a histogram of aggregate I/O operations involving the selected workloads, to all mounts, over the selected time period. Each column shows the aggregate I/O operations for a specific time.

    • Metadata operations: Shows a histogram of the aggregate number of specific operations involving the selected workloads, to all mounts, over the selected time period. Each column shows the aggregate number of operations for a specific time.

    • Summary BW by value: Shows a histogram of the aggregated data sent by selected values, over the selected time period. Each column shows the aggregate bandwidth to all mounts for a specific value.

  8. See additional details for these charts using these actions:

    • Hover over a column in a chart: A tooltip shows details for the column.

    • Click on a value (for example, a specific job ID) in a column: A panel on the right shows details for the selected point.

    • Click on an entry in the legend to the right of the histogram: The detail for the histogram is filtered to show only data for the selected entry.

    • Click on a value (or group of values) in the filter bar at the top of the screen: The detail in the histogram is filtered to show only data for the selected entries.
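The charts above show one column per time interval, aggregated over the selected values. The sketch below shows that kind of aggregation for bandwidth. It is only an illustration, not the Web UI's implementation: the record layout (timestamp, ID value, bytes transferred), the function name aggregate_bandwidth, and the 60-second interval are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record: (timestamp in seconds, ID value, bytes transferred)
Record = Tuple[float, str, int]


def aggregate_bandwidth(
    records: Iterable[Record],
    selected: Set[str],
    bucket_seconds: int = 60,
) -> Dict[int, float]:
    """Return bytes per second for each time bucket, summed over the selected IDs."""
    if bucket_seconds <= 0:
        raise ValueError("bucket_seconds must be positive")
    totals = defaultdict(int)
    for timestamp, id_value, nbytes in records:
        if id_value not in selected:
            continue  # mimic filtering by the values chosen in the filter bar
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        totals[bucket_start] += nbytes
    # One entry per column: bucket start time -> average bytes per second
    return {start: total / bucket_seconds for start, total in sorted(totals.items())}
```

Each key in the result corresponds to one column in a histogram, and records for values that are not selected do not contribute to any column.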

Viewing Client Metric Data as a Table in VAST Web UI

  1. In the VAST Web UI, navigate to the Analytics page, and select Client Metrics.

  2. Select Table. A table of individual metrics measurements is shown, containing up to 1000 entries.

Setting up Custom Metric Identifiers

You can add custom identifiers to the list of IDs for which metric data is collected on the client. This is useful if your client jobs are identified by custom IDs. Custom IDs are read by the VNFS Collector on the client.

Adding Custom Identifiers
  1. Navigate to the Analytics page, and select Client Metrics.

  2. Click the Settings icon.

  3. In the User defined columns section, enter the name of the ID in the Column name field, and select the type (the default is string). The name should be prefixed with 'ENV_'.

  4. Click Add Column. The ID is added to the list and appears as a column in the Client Metrics database.

  5. Repeat for additional columns.

  6. Click Save.

Configuring the VNFS Collector for Custom Identifiers

Add the following line to the configuration file for the vnfs-collector:

envs_from_vdb_schema: true

This setting causes the vnfs-collector to track environment variables based on the columns in the Client Metrics database that have the prefix 'ENV_', which are set using the procedure above. For example, add a column named ENV_SLURM_JOB_ID to the database for the vnfs-collector to track the environment variable SLURM_JOB_ID (which Slurm uses to identify jobs).
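The naming rule can be summarized as: strip the 'ENV_' prefix from a column name to get the environment variable that is tracked. The following minimal sketch expresses that mapping. The function name tracked_env_vars is hypothetical, and the sketch assumes that columns without the prefix (such as HOSTNAME or UID) are not treated as environment variables.

```python
from typing import Iterable, List

ENV_PREFIX = "ENV_"


def tracked_env_vars(column_names: Iterable[str]) -> List[str]:
    """Map 'ENV_'-prefixed column names to the environment variables they track."""
    return [
        name[len(ENV_PREFIX):]
        for name in column_names
        # Ignore other columns and a bare "ENV_" with no variable name.
        if name.startswith(ENV_PREFIX) and len(name) > len(ENV_PREFIX)
    ]


columns = ["HOSTNAME", "UID", "ENV_SLURM_JOB_ID"]
# tracked_env_vars(columns) == ["SLURM_JOB_ID"]
```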