Importing Data to VAST Database

Use one of the following methods to populate your VAST database with data:

  • Run a CTAS query from your query engine's client.

  • Insert data directly into a VAST database table.

  • Import data from Parquet or CSV files.

Running CTAS Queries

Using your query engine's client, connect to the data source where the data resides and run a CREATE TABLE AS SELECT (CTAS) query. A CTAS query creates a copy of the source table in the VAST database.

The syntax is similar to the following:

CREATE TABLE <VAST database table> AS SELECT * FROM <data source table>
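As an illustration, the statement can be assembled programmatically before it is sent through whatever client your query engine provides. The following Python sketch only builds the SQL text and does not connect to anything. It assumes an engine that quotes identifiers with double quotes (as Trino does); the "hive" catalog and "source_table" names are hypothetical placeholders, and the "vast" catalog and "db-bucket/myschema" naming are borrowed from the Trino examples later on this page.

```python
def quote_ident(name: str) -> str:
    """Quote one identifier part with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified_name(*parts: str) -> str:
    """Join quoted identifier parts into a dotted table reference."""
    if not parts or any(not part for part in parts):
        raise ValueError("every identifier part must be non-empty")
    return ".".join(quote_ident(part) for part in parts)


def build_ctas(target: str, source: str) -> str:
    """Return a CTAS statement that copies all rows of source into target."""
    return f"CREATE TABLE {target} AS SELECT * FROM {source}"


# Placeholder names only; substitute your own catalogs, schemas, and tables.
target = qualified_name("vast", "db-bucket/myschema", "mytable")
source = qualified_name("hive", "default", "source_table")
statement = build_ctas(target, source)
print(statement)
```

Quoting every identifier part keeps names that contain a slash or a hyphen, such as db-bucket/myschema, valid in the generated statement.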

Inserting Data into a VAST Database Table

To insert data directly into a VAST database table:

  1. Create the VAST database table that will receive the data, using the VAST Web UI or the VAST CLI.

    • In the VAST Web UI, choose DataBase -> VAST DB, select a database and a schema in the database tree, and click the + Add Table button. Complete the fields in the dialog that opens and click Create.

      Note

      For a complete procedure, see Creating a Table via VAST Web UI.

  2. Run an INSERT command from your query engine's client.

    The syntax is similar to the following:

    INSERT INTO <VAST database table> SELECT * FROM <data source table>

Importing Files into a VAST Database

You can import data into a VAST Database from Parquet or CSV files in any of the following ways: through the VAST Web UI, through a third-party query engine connected to the VAST Database (such as Trino or Spark), or through the VAST DB SDK.

You can import data from files into existing tables in the database, or create a new table based on the column structure of the file.

Files imported using the VAST Web UI are limited to 100 MB.

You can import data columns from a Parquet file into a VAST Database table that already contains rows with keys. The keys in the Parquet file must appear in the same order as the keys in the table. If the file has missing or extra keys, the import request is rejected.
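Because a key mismatch causes the whole request to be rejected, it can help to compare the two key sequences on the client before submitting the import. The sketch below is a hypothetical pre-check, not the validation that VAST itself performs. It assumes you have already read the key values of the table and of the Parquet file into plain Python lists with a reader of your choice.

```python
from collections import Counter


def check_import_keys(table_keys, file_keys):
    """Compare key sequences; return a list of problems (empty means OK)."""
    problems = []
    table_count, file_count = Counter(table_keys), Counter(file_keys)
    # Keys present in the table but absent from the file, and vice versa.
    missing = sorted((table_count - file_count).elements())
    extra = sorted((file_count - table_count).elements())
    if missing:
        problems.append(f"missing keys: {missing}")
    if extra:
        problems.append(f"extra keys: {extra}")
    # Same keys on both sides, so any remaining difference is ordering.
    if not missing and not extra and list(table_keys) != list(file_keys):
        problems.append("keys are in a different order")
    return problems


print(check_import_keys([1, 2, 3], [1, 3, 2]))
```

An empty list means the file's keys line up with the table's keys one for one.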

Importing Files from the VAST Web UI

You can insert data into a VAST Database from Parquet or CSV files using the VAST Web UI.

Importing data from a file into an existing table in a VAST Database
  1. In the VAST Web UI, select VAST DataBase, then select the database and a schema in the database tree.

  2. Select the table in the schema, and then click Upload File. Navigate to the file on your computer, and then click Upload.

    The file is uploaded to the selected table. The table must have all the columns in the file, or the upload will fail.
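To catch a column mismatch before uploading, you could compare the file's column names with the table's. The following sketch does this for a CSV file. It assumes the first row of the CSV is a header with column names and that names are compared case-sensitively; neither assumption is confirmed here, and the sample data and table columns are made up for illustration.

```python
import csv
import io


def columns_missing_from_table(csv_stream, table_columns):
    """Return CSV header columns that the table lacks, in file order."""
    reader = csv.reader(csv_stream)
    header = next(reader, [])  # Assumes the first row holds column names.
    known = set(table_columns)
    return [column for column in header if column not in known]


# Hypothetical file content and table definition.
sample = io.StringIO("country,city,population\nFrance,Paris,2100000\n")
missing = columns_missing_from_table(sample, ["country", "city"])
print(missing)
```

A non-empty result lists the columns that would have to be added to the table (or dropped from the file) for the upload to succeed.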

Importing data from a file into a new table in a VAST Database
  1. In the VAST Web UI, select VAST DataBase, then select the database in the database tree.

  2. Click Create Table From File.

  3. Enter the name of the new table.

  4. Navigate to the file on your computer, and then click Create.

    A new table is created in the database schema with the contents of the file.

Importing Parquet Files using the VAST Database Python SDK

You can import files into a VAST Database using the VAST Database Python SDK. The SDK provides classes for importing files into existing tables and for creating new tables from files.

Importing Parquet Files with Trino

You can populate the VAST database with data from Parquet files stored in an S3 bucket on VAST, using the Trino client. The data is imported directly from the S3 bucket into the database tables, keeping Trino out of the data path.

Tip

Before importing the data, ensure that the VAST database owner user has valid S3 access keys that provide access to the S3 bucket with the Parquet files.

Use the following command on the Trino client to import partitioned Parquet files into a VAST Database:

trino> INSERT INTO vast."db-bucket/myschema"."mytable vast.import_data()" (country, city, "$parquet_file_path") VALUES ('New York', 'New York City', '/db-bucket/myparquet'), ('New York', 'Manhattan', '/db-bucket/myparquet2');

where db-bucket, myschema, and mytable are replaced with the database, schema, and table names.

This example imports Parquet files without partitions:

trino> INSERT INTO vast."db-bucket/myschema"."mytable vast.import_data" ("$parquet_file_path") VALUES ('path/to/parquet/file');
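When many Parquet files have to be imported, writing the VALUES list by hand becomes error-prone. The Python sketch below generates statements shaped like the two Trino examples above. It only builds SQL text and does not run it. The function name, its parameters, and the restriction to string partition values are assumptions made for illustration, and the table suffix is passed through exactly as it appears in the examples.

```python
def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote a string literal, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_trino_import(database, schema, table, partition_columns, rows,
                       import_suffix="vast.import_data()"):
    """Build an import INSERT; rows is a list of (partition_values, path)."""
    if not rows:
        raise ValueError("at least one Parquet file is required")
    target = ".".join([
        "vast",
        quote_ident(f"{database}/{schema}"),
        quote_ident(f"{table} {import_suffix}"),
    ])
    columns = [quote_ident(c) for c in partition_columns]
    columns.append(quote_ident("$parquet_file_path"))
    value_rows = []
    for partition_values, path in rows:
        if len(partition_values) != len(partition_columns):
            raise ValueError("each row needs one value per partition column")
        literals = [quote_literal(v) for v in partition_values]
        literals.append(quote_literal(path))
        value_rows.append("(" + ", ".join(literals) + ")")
    column_list = ", ".join(columns)
    values_list = ", ".join(value_rows)
    return f"INSERT INTO {target} ({column_list}) VALUES {values_list}"


statement = build_trino_import(
    "db-bucket", "myschema", "mytable", ["country", "city"],
    [(("New York", "New York City"), "/db-bucket/myparquet"),
     (("New York", "Manhattan"), "/db-bucket/myparquet2")],
)
print(statement)
```

For files without partitions, pass an empty partition_columns list and an empty tuple of partition values for each file.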

Importing Parquet Files with Spark

Use the following command to import data from partitioned Parquet files into a VAST database using Spark:

spark-sql> INSERT INTO ndb.`db-bucket`.myschema.`mytable vast.import_data(country, city)` (country, city, `$parquet_file_path`) VALUES ('New York', 'New York City', '/db-bucket/myparquet'), ('New York', 'Manhattan', '/db-bucket/myparquet2');

where db-bucket, myschema, and mytable are replaced with the database, schema, and table names.

This example imports Parquet files without partitions:

spark-sql> INSERT INTO ndb.`db-bucket`.myschema.`mytable vast.import_data()` (`$parquet_file_path`) VALUES ('path/to/parquet/file');

Avoiding Duplicate Imports to a VAST Database

New databases validate file imports to prevent duplicates: an import fails if a file with the same name was already imported. To disable this validation and permit duplicate imports, contact VAST Support for assistance.
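Since a repeated file name causes the import to fail, a script that submits imports in batches may want to filter out files it has already sent. The sketch below keeps that record on the client. It is a hypothetical helper, not part of any VAST tool, and it assumes that "file name" means the full path string as passed in the import request.

```python
def split_new_and_duplicate(candidate_paths, already_imported):
    """Return (to_import, skipped), preserving the order of the candidates."""
    seen = set(already_imported)
    to_import, skipped = [], []
    for path in candidate_paths:
        if path in seen:
            skipped.append(path)  # Already imported, or repeated in this batch.
        else:
            seen.add(path)
            to_import.append(path)
    return to_import, skipped


to_import, skipped = split_new_and_duplicate(
    ["/db-bucket/myparquet", "/db-bucket/myparquet2", "/db-bucket/myparquet"],
    already_imported=["/db-bucket/myparquet2"],
)
print(to_import, skipped)
```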