Overview
You can run Spark and Trino applications on CNodes in your VAST Cluster. These applications run on designated CNodes, whose compute resources are shared between VAST storage and the applications.
The applications use VAST connectors to access a VAST Database.
Pre-requisites
The following is required to create managed applications on CNodes that access VAST Databases:
A VMS user with permissions to create and access S3 databases on the VAST Cluster. The user should have an identity policy that includes Allow permissions for S3 tabular databases. For example, this policy statement allows all S3 actions on all S3 resources:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "mySid",
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}
The user should also have an Access key and Secret key. These keys are used to configure the managed database applications.
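The example policy in this section can also be generated and checked programmatically before it is attached to the user. The following is a minimal Python sketch, not part of the VAST tooling: the function names build_s3_allow_policy and is_allow_all_s3 are hypothetical, and the sketch only builds and inspects the JSON document. It does not create the policy on the cluster.

```python
import json


def build_s3_allow_policy(sid="mySid", action="s3:*", resource="*"):
    """Return a policy document with the same shape as the example policy.

    The defaults reproduce the example: allow all S3 actions on all resources.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Action": action,
                "Resource": resource,
            }
        ],
    }


def is_allow_all_s3(policy):
    """Return True if any statement allows every S3 action on every resource."""
    for statement in policy.get("Statement", []):
        if (
            statement.get("Effect") == "Allow"
            and statement.get("Action") == "s3:*"
            and statement.get("Resource") == "*"
        ):
            return True
    return False


# Serialize the policy so it can be pasted into the identity policy editor.
policy = build_s3_allow_policy()
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

A narrower policy can be produced by passing a more specific action or resource. Which actions the managed applications actually need is not stated here, so the allow-all example remains the documented baseline.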
Viewing Running Applications
The Applications tab in the Data Engine page of the VAST Web UI shows the applications that are running on application-designated CNodes.
Creating Applications on CNodes
You can create and run applications on CNodes. Each application requires one or more CNodes on which to run, which are allocated when the application is created. You can create more than one application if there are sufficient CNodes.
In the VAST Web UI, navigate to the Data Engine page and select the Applications tab.
Click Create Application.
In the General section, enter details for the application.
App Name
Enter a name for the application as it will appear in the list.
Application
Select the application to run from the list: either Trino or Spark.
Image Tag
Select the application image from the list:
For Spark:
Spark 3.4. An unmodified Spark engine, with the VAST connector.
Spark 3.5.1. A version of Spark that includes the bundled Spark extensions, and also includes the option to run Spark Thrift and Connect as part of the Spark cluster.
For Trino:
Trino 475
In the Resource Selection section, select the CNodes on which the application will run from the All Possible CNodes box (on the right), and move them to the Selected CNodes box (on the left). The selected CNodes are the application CNodes. At least two CNodes are required; if Spark 3.5.1 is selected, at least three CNodes are required.
Note
Selecting all CNodes for applications is not recommended, as this could leave insufficient resources for other activities on the cluster.
In the Resources Limitation section, optionally set limits on the CNode resources the application can use. In the Use up to field, set the maximum percentage of CNode resources the app can use (between 20% and 60%, in increments of 5%). The default limit is sufficient for most cases.
In the Network section, enter details for the Spark Master and Worker nodes or the Trino Coordinator and Worker nodes, depending on which application was selected. Each node requires a Virtual IP (VIP) address. Select these from the Virtual IP pools allocated for the cluster (in the Virtual IP tab of the Network Access page).
Complete network details for the application nodes:
For the Master or Coordinator node, enter a Virtual IP address.
Virtual IP
The Virtual IP address of the Spark Master or Trino Coordinator node. It should not be allocated for any other use in the cluster.
This address is also used by the History, Connect, and Thrift servers for Spark 3.5.1 with Thrift Connect.
Note
When the application is created, a VIP Pool is created for it automatically.
For the Worker nodes (for both Spark and Trino), enter a list of Virtual IP addresses.
Virtual IP
A list or range of Virtual IP addresses for the Worker nodes. There should be a route between these Virtual IPs and the Master/Coordinator.
Optionally, set advanced network details:
Netmask
The subnet mask of the Virtual IP assigned to the application nodes.
Gateway IP
The IP address of the gateway of the Virtual IP assigned to the application nodes.
VLAN
If you want to tag the Virtual IP pool with a specific VLAN on the data network, enter the VLAN number (0-4096). See Tagging Virtual IP Pools with VLANs. This VLAN is intended for external client apps to connect to the Master/Coordinator nodes.
In the Configuration & Security section, add configuration files for the application, according to the application selected.
For Spark (only if Spark 3.5.1 was selected above as the Image Tag):
Configuration File
Description
spark-defaults.conf
The main Spark configuration file. It enables:
Spark Thrift and Connect servers to operate with the VAST Database
TLS
It also contains configuration details to access VAST Databases, including an Access and Secret key (described in the Pre-requisites section).
core-site.xml
Configures LDAP and LDAPS providers
hive-site.xml
Configuration for Hive, including LDAP and LDAPS
hdfs-site.xml
Provides default behaviors for an HDFS client.
For Trino:
configuration.yaml
The main Trino configuration file.
You can also download these files, as well as template files with examples of how to enable and configure these features.
To add a certificate for the application, upload a certificate and key in the Certificate and keys section.
Note
You can use the same certificate for both applications.
Click Create. The application is created on the selected CNodes. The application images are loaded on the selected CNodes, and then started. This can take some time.
Monitor progress on the Activities page of the VAST Web UI: events appear indicating that the application creation has completed (event name: create_managed_application). When this process is complete, the application appears in the list of applications in the Applications page.
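The constraints described in the steps above (minimum CNode counts, the resource limit range, and the Virtual IP requirements) can be checked before filling in the form. The following is a minimal Python sketch under stated assumptions: all function names, the CNode names, and the IP addresses are hypothetical, the resource limit is assumed to be a whole-number percentage, and the Virtual IPs are assumed to be IPv4. It mirrors the documented rules only and does not call any VAST API. The validation performed by the Web UI itself may differ.

```python
import ipaddress

# Minimum CNode counts taken from the Resource Selection step:
# at least two CNodes, or at least three for Spark 3.5.1.
DEFAULT_MIN_CNODES = 2
MIN_CNODES_BY_IMAGE = {"Spark 3.5.1": 3}


def has_enough_cnodes(image_tag, selected_cnodes):
    """Check the selected CNode count against the documented minimum."""
    required = MIN_CNODES_BY_IMAGE.get(image_tag, DEFAULT_MIN_CNODES)
    return len(selected_cnodes) >= required


def is_valid_resource_limit(percent):
    """Check the 'Use up to' value: 20% to 60%, in increments of 5%."""
    return 20 <= percent <= 60 and percent % 5 == 0


def expand_vip_range(first, last):
    """Expand an inclusive IPv4 range into a list of address strings."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    if end < start:
        raise ValueError("range end is lower than range start")
    return [str(ipaddress.IPv4Address(n)) for n in range(int(start), int(end) + 1)]


def vips_do_not_collide(master_vip, worker_vips):
    """Check that worker VIPs are unique and do not reuse the Master/Coordinator VIP."""
    return master_vip not in worker_vips and len(set(worker_vips)) == len(worker_vips)


# Hypothetical values, for illustration only.
cnodes = ["cnode-1", "cnode-2", "cnode-3"]
workers = expand_vip_range("10.0.0.11", "10.0.0.13")
print(has_enough_cnodes("Spark 3.5.1", cnodes))
print(is_valid_resource_limit(40))
print(vips_do_not_collide("10.0.0.10", workers))
```

The sketch does not verify that a route exists between the Worker Virtual IPs and the Master/Coordinator, or that the addresses belong to a Virtual IP pool on the cluster. Those checks depend on the actual network configuration.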
Initializing and Starting Managed Applications on the CNode
After the application is created and deployed on the CNode, it must be initialized and started.
On the Data Engine page, select the Applications tab.
Select the application in the list. The status of the application is INIT after it is created.
Right-click on the application, and click Start. The status of the application changes to RUN as it starts to run. To monitor the status of the application, right-click on the application and click View CNode State.
If an error occurs when starting the application (the status is not RUN), right-click on the application and click Retry. This will attempt to start the application again.
Stopping Applications
In the VAST Web UI, navigate to the Data Engine page and select the Applications tab. Right-click on the application and click Stop.
Restarting Applications
In the VAST Web UI, navigate to the Data Engine page and select the Applications tab. Right-click on the application and click Restart.
Updating Application Configurations
You can change the configuration of a running managed application. You do this by modifying the configuration files, uploading them again, and then restarting the application.
Right-click on the application, and click Edit Configuration & Security. The configuration and certificate files for the application are shown, if they were added when the application was created.
Click the download icon next to a file to download it. Make changes to the downloaded file, as necessary.
Click the delete icon to remove from the application the current configuration files that were downloaded. Click Add to upload the modified files.
Right-click on the application, and click Restart. The application is stopped and restarted, using the modified configuration files.
High-Availability Operability Issues
At least two CNodes must be selected to host applications, to allow for continued operation in the event one CNode fails.
If the CNode running the application fails, the application will be started on another CNode designated for the application.