Overview
VAST DataEngine enablement requires prerequisite configurations that are external to the VAST cluster as well as prerequisite configurations on the VAST cluster itself.
Additionally, some configurations on the VAST cluster are required for using DataEngine but are not prerequisites for enablement.
High Level Steps
To enable VAST DataEngine on a VAST Cluster tenant, follow these steps in order:
Establish Prerequisite External Services. These are services external to the VAST Cluster, such as container registries and Kubernetes clusters, that DataEngine connects to.
Configure Prerequisites on the VAST Cluster. These are configurations that must be established on the VAST Cluster tenant before you enable DataEngine.
Enable DataEngine. In this step, you are prompted to connect the prerequisite external services to the tenant.
Establish Prerequisite External Services
VAST DataEngine connects with the following services that are external to the VAST Cluster. They must be configured before you start:
One or more container registries. Used to store images of the functions deployed by DataEngine. The container registry must be accessible over HTTPS and have a valid CA certificate.
One or more Kubernetes clusters. Used to execute DataEngine pipelines and functions.
Optionally, a third-party event broker. DataEngine requires an event broker, which can be either a third-party event broker or the VAST Event Broker. If you choose to deploy DataEngine with the VAST Event Broker, you do not need to configure it before the enablement procedure.
Preparing Container Registries
A container registry is a service for storing and distributing container images. You can use a registry service that supports authentication with a user name and password, such as a Docker Hub-hosted registry service, or a registry that supports a stable authentication secret, such as an AWS ECR-hosted registry. You will need to connect the tenant's DataEngine to at least one container registry, where you will store the images of the functions you want to deploy.
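Before connecting a registry, you may want to confirm that its endpoint is reachable over HTTPS with a trusted certificate. A minimal sketch, assuming a placeholder host registry.example.com (an HTTP 401 response is fine here; it means the TLS handshake succeeded and only authentication is required):

```shell
# Probe the registry's v2 API endpoint over HTTPS.
# "registry.example.com" is a placeholder for your registry host.
code=$(curl -s -o /dev/null -w '%{http_code}' https://registry.example.com/v2/ || true)
echo "HTTP status: $code"
```

A status of 200 or 401 indicates the registry answers over HTTPS; 000 indicates the host could not be reached or the certificate was not trusted.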
Preparing Kubernetes Clusters
Install the VAST Zarf DataEngine package on each Kubernetes cluster in your environment that you intend to use to deploy VAST DataEngine functions and pipelines. This package installs the following services on the Kubernetes cluster:
VAST Operator. VAST's Kubernetes operator, which enables deployment of VAST pipelines and functions on the Kubernetes cluster.
VAST Telemetry Collector. Provides observability logs and traces over pipelines and functions executed on the Kubernetes cluster by DataEngine.
Knative components used by DataEngine.
For environments where Zarf is not already in use for managing Kubernetes deployments, you will also install the Zarf agent and the VAST Zarf mutator from the package.
To install the VAST Zarf DataEngine package on each Kubernetes cluster, do one of the following:
Installing the VAST Zarf DataEngine package (Zarf Not Already in Use)
Use this procedure if Zarf is not already actively being used to manage your Kubernetes cluster deployment.
Note
To check if Zarf is actively being used to manage your Kubernetes cluster deployment:
$ kubectl get pods -n zarf
NAME                                   READY   STATUS    RESTARTS   AGE
agent-hook-6c97484f98-5q9jj            1/1     Running   0          10m
agent-hook-6c97484f98-tgcjf            1/1     Running   0          10m
zarf-docker-registry-fd8859657-4rlm9   1/1     Running   0          11m
If Zarf is in fact already deployed, switch to Installing the VAST Zarf DataEngine package (Zarf Already in Use).
Ensure you have a valid Kubeconfig file for connecting to the Kubernetes cluster.
Obtain the VAST Zarf DataEngine package, vast_dataengine_release_<release number>_<pipeline id>.tar.gz.
Extract the Zarf agent init package from the VAST Zarf DataEngine package (for example, zarf-init-amd64-v0.60.0.tar.zst) and run zarf init with the following options:
zarf init --architecture amd64 zarf-init-amd64-v0.60.0.tar.zst --set REGISTRY_HPA_AUTO_SIZE=true --confirm --log-level debug
Note
The above call assumes that a default storage class exists. Otherwise, add the --storage-class option to the call. For example, --storage-class=local-path.
Extract the VAST Operator file (for example, zarf-package-dataengine-amd64-1.0.0.tar.zst) and run zarf package deploy with the following options:
zarf package deploy --architecture amd64 zarf-package-dataengine*.tar.zst --confirm --log-level debug
Add the following labels to the Kubernetes namespaces:
kubectl label namespace knative-serving zarf.dev/vast=mutate
kubectl label namespace knative-eventing zarf.dev/vast=mutate
kubectl label namespace vast-dataengine zarf.dev/vast=mutate
Extract the VAST Zarf mutator enhancement from the VAST Zarf DataEngine package (for example, vast-zarf-mutator-amd64-v0.60.0.tar.zst) and run zarf package deploy with the following options:
zarf package deploy --architecture amd64 vast-zarf-mutator-amd64*.tar.zst --confirm
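After the deployments complete, a quick way to confirm that the components came up is to list the pods in the namespaces labeled above. A hedged sketch; it requires kubectl configured for the target cluster:

```shell
# Optional post-install sanity check: list pods in the DataEngine-related
# namespaces. Falls back to a message if kubectl is not available locally.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -n vast-dataengine
  kubectl get pods -n knative-serving
  kubectl get pods -n knative-eventing
else
  echo "kubectl not found; run this check from a host with cluster access"
fi
```

All pods in these namespaces should reach the Running status before you proceed to enablement.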
Installing the VAST Zarf DataEngine package (Zarf Already in Use)
Note
This procedure can also be used to update the VAST Zarf DataEngine package.
Use this procedure if Zarf is already in use for managing your Kubernetes cluster deployment.
Ensure you have a valid Kubeconfig file for connecting to the Kubernetes cluster.
Obtain the VAST Zarf DataEngine package, vast_dataengine_release_<release number>_<pipeline id>.tar.gz.
Extract the VAST Operator file (for example, zarf-package-dataengine-amd64-1.0.0.tar.zst) and run zarf package deploy with the following options:
zarf package deploy --architecture amd64 zarf-package-dataengine-amd64*.tar.zst --confirm --log-level debug
Configure Prerequisites on the VAST Cluster
This procedure uses the VAST Web UI to create all the prerequisite configurations necessary on a given VAST Cluster tenant for enabling DataEngine.
If you want to enable DataEngine on a new tenant rather than the default tenant or another existing tenant, create a new tenant on the VAST cluster.
If you want to use a third-party event broker as the default event broker for DataEngine, configure the third-party event broker.
If you want to use the VAST Event Broker as the default event broker for DataEngine, provision a bucket owner user.
Create a New Tenant (Optional)
If you want to enable DataEngine on a new tenant rather than the default tenant or another existing tenant, create a new tenant on the VAST cluster.
To create a new tenant with minimal settings:
From the left navigation menu, select Element Store and then Tenants.
Click Create Tenant to open the Create New Tenant dialog.
In the General tab, complete the fields:
Tenant name
Enter a name for the tenant.
Domain
The domain name for the tenant.
The domain name is used to build the cluster's VMS login URL for the tenant, as shown in the preview below:
https://<VMS IP>/#/login/<domain name>
If a domain name is not specified, the tenant name is used:
https://<VMS IP>/#/login/<tenant name>
Note
The domain name is case-insensitive.
Click Create. The tenant is created and appears in the list of tenants in the Tenants page.
Configure a Third Party Event Broker
Note
If you are going to use VAST Event Broker as the default event broker for DataEngine, you can skip this step.
Go to Settings -> Notifications -> Notification Kafka Brokers.
Click Add a New Kafka Broker and complete these fields:
Name
Enter a name for the event broker configuration.
Tenant
Select one tenant for which the event broker will be available, or leave the default of All tenants so that the broker is available for all tenants.
Host
Enter the bootstrap URL of the event broker server. You can specify an IP or FQDN.
If the Kafka cluster runs multiple event brokers, click +Add to add more hosts. You can add up to five hosts.
Port
Enter a port to communicate with the event broker server.
Tip
Ensure that the hosts are accessible from the VAST cluster's management interface at the specified ports.
Click Add Kafka Broker.
The newly created event broker configuration is added to the list of event brokers.
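Before adding the broker configuration, you can sanity-check that each bootstrap host is reachable on its port from a machine on the management network. A hedged sketch with placeholder host and port values:

```shell
# Test a TCP connection to a Kafka bootstrap host.
# HOST and PORT are placeholders for your broker's address.
HOST=kafka-broker.example.com
PORT=9092
if timeout 5 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
  echo "$HOST:$PORT reachable"
else
  echo "$HOST:$PORT unreachable"
fi
```

Repeat the check for every host you add to the configuration.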
Provision a Bucket Owner User
Note
If you are going to configure an event broker external to the cluster as the default event broker for DataEngine, you can skip this step.
If you are going to use a VAST Event Broker (a view with the Kafka protocol enabled) as the default event broker that DataEngine will use to stream events, you need to provision at least one user with S3 access keys and permission to create buckets on the tenant. This is because when you enable DataEngine, you will create the VAST Event Broker and will need to reference a user as the bucket owner for the view.
You will also need to reference a user of this type as a bucket owner when you provision S3 bucket views from which triggers can consume events, although this is not an essential step for enabling DataEngine.
To allow multiple users to access DataEngine and to create triggers using the same VAST Event Broker, also make sure this user belongs to a group that you will use for all non-VMS-manager users who should have access to VAST DataEngine on the tenant once it is enabled. This is because you will need to give those users bucket listing permission for the VAST Event Broker view (this is done in the next step).
Make sure the provider you want to use is attached to the tenant. The provider can be a VAST provider, LDAP, or Active Directory. The provider should be configured on the cluster. For general information about managing providers, see Providers.
Note
If you use a VAST provider, enable the provider for cluster admin users and tenant admin users to manage and view, as needed. The default VAST provider is initially enabled for management by cluster admin users only.
To attach a provider to a tenant:
In the Tenants tab of the Element Store page, right-click the tenant and select Edit.
Under Providers and User Access, select the provider that you want to connect to the tenant.
Click Update.
Create a user on the provider. Grant the user bucket create and bucket delete permissions.
Tip
It is usually best practice for the user to belong to a group designated for granting DataEngine access. You will be able to grant the user's group bucket listing permission for a VAST Event Broker, and you will be able to grant the same group access to DataEngine.
Tip
To create a user as part of a group on a VAST provider:
From the left navigation menu, select User Management and then, under Users and Groups, Local Groups.
On the Groups page that opens, click + Create Group.
In the Add Group dialog, complete the fields:
Field
Description
Name (required)
Enter a name for the group.
GID (required)
Enter a POSIX group ID (GID) for the group.
Local Provider (required)
Select the VAST provider with which the group will be associated.
Click Create. The group is created.
Switch to the Local Users tab.
Click Create User and complete the following fields:
Field
Description
Name (required)
The user name.
UID
The user's POSIX UID.
Local Provider (required)
Select the VAST provider with which the user will be associated.
Leading group
The name of the user's leading group.
This is the group assigned by default as the owning group of any files created by the user.
Select the group you created in the previous step from the dropdown.
Groups
Names of other groups that the user belongs to beside the leading group. Also known as auxiliary groups.
Select groups from the dropdown. If a group has not been added to the VAST provider, add the group first.
Select tenant to see user details
Select the tenant from the list. Tenants associated with the selected VAST provider (if any) are shown, as well as the default tenant.
Allow Create Bucket
Enable this setting. (Alternatively, you can grant this permission through an identity policy attached to either the group or the user.)
Allow Delete Bucket
Enable this setting. (Alternatively, you can grant this permission through an identity policy attached to either the group or the user.)
Do one of the following, as appropriate, to grant the user an S3 access key pair:
Edit the user you just created (if it is a local user) and choose to generate an access key pair.
Query the user (this is applicable to a user on any provider) and generate an access key pair for the user:
In the Query User or Group tab of the User Management page, query the user on the relevant provider.
In the query result pane on the right, under Access Keys, click Add New Key.
Enable DataEngine
This procedure enables DataEngine on a tenant. The procedure includes selecting or creating an event broker and event broker topics, connecting the tenant to a container registry, and connecting the tenant to a Kubernetes cluster.
Enabling Data Engine from the VAST Web UI
From the left navigation menu, select Element Store and then Tenants.
Right-click the tenant and select Enable DataEngine.
The Enable DataEngine dialog opens.
In the Assign Kafka step screen, from the Default Broker dropdown, do one of the following:
To use a third party event broker, select the event broker from the dropdown.
To use a VAST Event Broker, select Add new broker.
Note
If the VAST Event Broker is already created, you can select it from the dropdown.
The Add View dialog opens. Some of the fields are pre-filled. Continue with the following steps to configure a VAST Event Broker:
Complete the following fields:
Path
Specify the path in the tenant's Element Store where the Event Broker should reside. The path must begin with a slash.
Create directory
Enable this.
S3 bucket name
Enter a name for the S3 bucket in which the VAST Event Broker will reside.
Bucket Owner
Provide the user you created in Provision a Bucket Owner User. Click the field, start typing the user name, and then select the name from the list.
Authentication Methods
Set these as you require. For information about these settings, see Configuring User Authentication for the VAST Event Broker View.
Create or select a suitable view policy. The view policy needs to have S3 Native security flavor enabled. The policy should also grant S3 bucket listing permission to the user group that you are using to provision application users, although you could add the bucket listing permission at a later stage. Bucket listing permission will enable all group members to create triggers using the event broker you are creating.
To use an existing policy, select the policy from the Policy name dropdown.
To create a new view policy:
In the Policy name dropdown, select Add new Policy.
In the Add Policy dialog, fill the following fields:
Tenant
Select the relevant tenant.
Name
Enter a name for the policy.
Security Flavor
Select S3 Native.
Group membership source
Select any of the options. This is a required field for view policies, although all options are valid for this use case.
In the S3 section, in the Bucket listing permission (groups) field, enter the name of the user group that you are going to use to provision users' access to DataEngine.
Note
This step is not strictly mandatory for enabling DataEngine. It is typically needed, assuming you are using a VAST Event Broker and granting multiple users access to DataEngine. Giving all users bucket listing permission in this policy will enable the users to create triggers using a VAST Event Broker created with a group member as the bucket owner.
Click Create.
The policy is created and is selected for you in the Policy name field of the Add View dialog.
Create or select a virtual IP pool for the event broker. The virtual IP pool needs to include at least as many virtual IP addresses as there are CNodes on the cluster. It also needs to be dedicated exclusively to the VAST Event Broker.
To use an existing virtual IP pool, select the virtual IP pool from the VIP Pool dropdown.
To create a new virtual IP pool:
In the Kafka section, from the VIP Pool dropdown, select Add new VIP pool.
In the Add Virtual IP Pool dialog, fill the following fields:
VAST Web UI Field Name
Requirement
Tenant
Specify the tenant on which you are going to enable DataEngine.
Name
Give this a suitable name for identifying the virtual IP pool. This virtual IP pool must be used exclusively for the VAST Event Broker.
Role
Set this to Protocols.
Subnet CIDR IPv4/Subnet CIDR IPv6
Set this as needed for the IP address range that you supply.
IP Range List
Supply a range of IP addresses. Include at least as many IPs as there are CNodes on the cluster. This pool of IP addresses will be dedicated exclusively to the VAST Event Broker for the tenant.
Click Create.
The virtual IP pool is created and is selected for you in the VIP Pool field of the Add View dialog.
Click Create.
From the Default Topic dropdown, select Add new topic to create the default topic on the event broker. Trigger events will be streamed to this topic by default, and it will be available immediately for hosting triggers when DataEngine is enabled. In most cases, there is no need to create additional topics for hosting triggers. Assuming the default topic will be used for all or most triggers, it is advisable to set wide partitioning, probably in the region of 50-100 partitions.
In the Create Topic dialog, complete the following required fields:
Topic Name
Enter a name for the topic.
Number of partitions
Enter the number of partitions for the topic.
A topic can have up to 1000 partitions. The number of partitions in a topic cannot be changed after the topic has been created.
The default topic will be available immediately for hosting triggers when DataEngine is enabled. Unless your use case specifically requires multiple topics, the default topic may be used for hosting all triggers. Therefore, it is usually advisable to set wide partitioning, probably in the region of 50-100 partitions.
Retention period
Specify the amount of time to keep an event record in the topic. When the retention period for a record expires, the record is deleted from the topic. The default retention period is seven days. The minimum allowed retention period is 6 hours.
Compaction
If this flag is set, VAST Cluster keeps only the latest record version for each key in the partition log. Any previous versions of the record are deleted.
Note
Compaction is performed asynchronously in the background, so at any given time duplicate keys may exist.
Select this checkbox to enable compaction in the topic. With compaction, the topic preserves only the latest version of each message key in each partition.
Timestamp source
Select the source to use for the event time stamp:
Time event is created (producer). Sets the event timestamp based on the time when the event was encountered at the event producer.
Time event is logged (server). Sets the event timestamp based on the time when the event record was added to the log at the event broker.
Event time evaluation period
The maximum acceptable time between the event time and server time.
Infinity. For Time event is created (producer) timestamps, this option specifies that any difference between the producer time and the server time is acceptable, i.e., the message timestamp can be earlier or later than the broker timestamp by any amount of time.
Define Manually. For Time event is created (producer) timestamps, specifies the acceptable difference between the producer time and the server time:
Not before: Determines how much earlier the message timestamp can be than the broker timestamp. If this value is exceeded, the message is rejected.
Not after: Determines how much later the message timestamp can be than the broker timestamp. If this value is exceeded, the message is rejected.
Click Create.
In the Default Deadletter Topic dropdown, select Add new topic to create the deadletter topic, which is the topic to which events are routed in case of function failures, routing failures and dispatching failures. The deadletter topic will not be available to configure as the topic for any trigger event.
Click Add Certificate to upload a TLS certificate.
Paste a CA certificate into the box provided and click Upload Certificate.
Click Next.
In the Link Kubernetes Cluster step, complete these fields:
Name (required)
Enter a name for the Kubernetes cluster to which you want DataEngine to connect.
API Server URL (required)
Enter the Kubernetes API server endpoint URL.
For example: https://kube-server-57:6443
Description
Enter a description of the Kubernetes cluster. This is optional.
In the Certificates area, use the fields provided to upload the CA certificate, client certificate, and private key used for authenticating over mTLS to the Kubernetes cluster. For each certificate, click +Add new, paste the certificate content, base64 encoded, and then click Save Client certificate/Save CA certificate/Save Private Key.
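The certificate fields expect base64-encoded content on a single line. A minimal sketch for encoding PEM files, with ca.crt, client.crt, and client.key as placeholder file names (the -w0 flag, which disables line wrapping, is GNU coreutils; flags may differ on other systems):

```shell
# Encode each PEM file as a single-line base64 string for pasting into the UI.
# ca.crt, client.crt, and client.key are placeholder file names.
for f in ca.crt client.crt client.key; do
  if [ -f "$f" ]; then
    base64 -w0 "$f" > "$f.b64"
    echo "wrote $f.b64"
  fi
done
```

Paste the contents of each .b64 file into the corresponding certificate field.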
In the Tags area, enter any metadata tag that you want to tag the Kubernetes cluster resource with, and select Add Tag to add the tag. Repeat for additional tags. This is optional.
Click Next.
In the Select Namespaces step, the namespaces on the Kubernetes cluster are listed on the left under All possible Properties. Use the selection boxes and arrow buttons to choose which of the namespaces to allow the tenant's DataEngine to access and list them on the right under Selected Properties. Pipelines will be deployed in the selected namespace(s).
Click Next.
In the Link Container Registry step, complete the fields to connect the container registry:
Primary kubernetes cluster
Select the primary Kubernetes cluster from the dropdown.
Additional kubernetes clusters
Select any additional Kubernetes clusters that are configured on the tenant if you want them to be able to deploy any function that has its image stored on the container registry.
Name
Enter a name for the container registry.
Base URL
Enter the base URL of the container registry.
Description
Enter a description for the container registry. (Optional.)
Authentication Method
Select the authentication method required by the container registry for the DataEngine to authenticate to the container registry:
User credentials. Choose this option if a user name and password is required. In the Username and Password fields, enter valid user credentials for authenticating to the container registry.
Kubernetes secret. Choose this option if a stable authentication secret is required. In the Kubernetes Secret Name field, enter the name of an existing Kubernetes Secret that contains the credentials for authenticating to the container registry.
None. Choose this option if no authentication is required to connect to the container registry.
Tags
Optionally, you can use this section to tag the container registry with any meaningful text strings for ease of organization and identification.
To add a tag, enter a string in the Tag field and then click Add Tag.
The tag is added to the Tags list.
If you want to remove a tag that you added, click the tag's removal icon.
Click Finish.