Trino provides a very powerful and flexible way to connect to the VAST Catalog. Trino enables querying the catalog from the Trino Python client, Querybook, Grafana, and other software that utilize Trino as a data source. VAST has published Docker containers with everything you need to connect to your VAST cluster.
Prerequisites
This guide assumes that you have:
Enabled the catalog
Have an identity policy granting access to it
Have a user associated with the identity policy
The user has S3 credentials
A Linux system with Docker installed and a user with permissions to start containers.
Refer to “Providing Client Access to VAST Catalog CLI” in the VAST Support documentation for details on the first four items.
Configure & start Trino container
Create a vast.properties with these contents, update with the correct endpoint and access/secret keys:
## Update these fields ##
# "endpoint" should be a load-balanced DNS entry or one of the VIPs prefixed by "http://"
# it should not contain a trailing / or anything else.
endpoint=http://x.x.x.x
# "data_endpoints" should be be a load-balanced DNS entry or one or more of the VIPs
# prefixed by "http://" it should not contain a trailing / or anything else.
# Multiple VIPs can be used with commas between them, eg: http://x.x.x.x,http://y.y.y.y
data_endpoints=http://x.x.x.x
# Access and secret keys -- make sure the user was added to an identity policy
# granting them access to the catalog.
access_key_id=xxx
secret_access_key=xxx
## Don't change these fields ##
connector.name=vast
region=us-east-1
num_of_splits=32
num_of_subsplits=8
vast.http-client.request-timeout=60m
vast.http-client.idle-timeout=60m
enable_custom_schema_separator=true
custom_schema_separator=|Start the VAST-provided Trino Docker container – they already contain the connector and the correct jvm.config updates.
ℹ️ Info
The version of the Trino container depends on your cluster version. See the full docs (Installing and Configuring the VAST Connector for Trino) for details.
In short:
4.7 - use
vastdataorg/trino-vast:3755.0 - use
vastdataorg/trino-vast:4205.1 - use
vastdataorg/trino-vast:429
docker run \
--name trino \
-p 8080:8080 -d \
-v ./vast.properties:/etc/trino/catalog/vast.properties:ro \
vastdataorg/trino-vast:429Test with the Trino client
Start the client from within the Trino container:
docker exec -it trino trinoNow you can execute queries against the server – you must start with the use command to set the context:
use vast."vast-big-catalog-bucket|vast_big_catalog_schema";
show columns from vast_big_catalog_table;
select * from vast_big_catalog_table limit 1;Optionally install Java, download the client, and use the client directly:
sudo yum install java-openjdk
wget https://repo1.maven.org/maven2/io/trino/trino-cli/429/trino-cli-429-executable.jar
java -jar trino-cli-429-executable.jarConnect with the trino Python library
You can also query the catalog with the trino python library, eg:
from trino.dbapi import connect
TRINO_HOST = "localhost"
TRINO_PORT = 8080
# establish our connection to Trino
conn = connect(
host=TRINO_HOST,
port=TRINO_PORT,
user="vast",
catalog="vast",
schema="vast-big-catalog-bucket/vast_big_catalog_schema",
)
# find all the files to tag
cur = conn.cursor()
cur.execute(
"""
SELECT *
FROM vast_big_catalog_table
LIMIT 1
"""
)
for row in cur.fetchall():
print(row)Useful starter queries
Describe the table to see all the columns available:
describe vast."vast-big-catalog-bucket/vast_big_catalog_schema".vast_big_catalog_tableExample output:
Column | Type | Extra |
---------------------+-----------------------------------------+-------+
phandle | row(clone_id integer, handle_id bigint) | |
creation_time | timestamp(9) | |
uid | integer | |
owner_sid | varchar | |
owner_name | varchar | |
gid | integer | |
group_owner_sid | varchar | |
group_owner_name | varchar | |
atime | timestamp(9) | |
mtime | timestamp(9) | |
ctime | timestamp(9) | |
nlinks | bigint | |
element_type | varchar | |
size | bigint | |
used | bigint | |
tenant_id | integer | |
name | varchar | |
extension | varchar | |
parent_path | varchar | |
symlink_path | varchar | |
major_device | integer | |
minor_device | integer | |
s3_locks_retention | row(mode tinyint, timeout time(6)) | |
nfs_mode_bits | integer | |
name_aces_exist | boolean | |
s3_locks_legal_hold | boolean | |
user_tags_count | smallint | |
user_metadata | map(varchar, varchar) | |
user_tags | map(varchar, varchar) | |
login_name | varchar | |
search_path | varchar | |Get the largest 10 files on the system last modified in the past 3 months:
select parent_path, name, size, mtime
from vast."vast-big-catalog-bucket/vast_big_catalog_schema".vast_big_catalog_table
where mtime > (current_timestamp - interval '3' month)
order by size
desc limit 10Get the largest 10 files on the system created more than 6 months ago:
select parent_path, name, size, ctime
from vast."vast-big-catalog-bucket/vast_big_catalog_schema".vast_big_catalog_table
where ctime < (current_timestamp - interval '6' month)
order by size
desc limit 10Show how much space is being used by the user, sorted by the space used:
select owner_name, sum(size) as size
from vast."vast-big-catalog-bucket/vast_big_catalog_schema".vast_big_catalog_table
group by owner_name
order by size desc
limit 10Show the 10 directories taking up the most space:
select parent_path, sum(size) as size
from vast."vast-big-catalog-bucket/vast_big_catalog_schema".vast_big_catalog_table
group by parent_path
order by size desc
limit 10Query for a specific tag (below as tagname) by value (below as tagvalue):
select parent_path, name, size
from vast."vast-big-catalog-bucket/vast_big_catalog_schema".vast_big_catalog_table
where user_tags_count > 0 and contains(map_keys(user_tags), 'tagname') and user_tags['tagname'] = 'tagvalue'
order by size desc
limit 10Stop the Trino server
To stop the Trino container after you’re done:
docker stop trino
docker rm trinoTroubleshooting
If you encounter the error Vast Server Error: Failed starting transaction, something is wrong with your access keys, or the user with those access keys has not been granted access to the catalog.