Sorted Tables

Prev Next

Tables in VAST Databases can be sorted, to improve performance when querying for specific rows. Sorting is done using selected columns in the table which you designate as sort columns. The table is sorted in the background according to these columns. The table is accessible for insert and query operations at all times, though performance could be affected by sorting proceeding in the background.

Table sorting can be applied to new or existing tables in a VAST Database. Once applied, the settings for a table cannot be changed, and the values in the sort columns cannot be modified.

Configuring Sorted Tables

Configure sorted tables separately on each table in a VAST Database.

Configuring Tables Sorting using the VAST Web UI

Select up to four columns in a table as sorting columns. You can select them for new tables (without any rows) or existing tables. Once selected, the columns, and the sorting order, cannot be changed for the table. The table is continuously sorted as updates are made to it.

The table is sorted according to the sorting columns, in the order these columns are selected.

  1. Navigate to the DataBase page and select the database.

  2. Select the table to be sorted, then select options-symbol.png (in the detail pane for the table, on the right), and select Enable Table Sorting.

  3. Select up to four columns in the table from the Available Columns list. The selected columns are then moved to the Selected Columns list.

  4. Tables are sorted using these selected columns, starting from the first column, and then using the other columns, in order, if necessary. Drag columns in the list to change the order, then click Save.

Configuring Table Sorting using the VAST CLI

Use the table modify VAST CLI command as follows to configure it for sorting:

admin> vcli table modify --database-name bucket_name --schema-name schema_name --name table_name --sorted-column-names column1_name,column2_nam

This sets columns column1_name and column2_name as sorting columns for the table and enables sorting on the table.

Configuring Sorted Tables using Query Engines

You can configure sorted tables using these query engines.

Python SDK

Note

Use VAST DB Python SDK version 1.3.9 or later for sorted tables.

On the VAST Python SDK, set sorting columns in the following way:

# connect to the database
session = vastdb.connect(endpoint=endpoint, access=s3_access, secret=s3_secret)

# with a transaction, lookup the table and add a sorting key
# using the 0-based column index
with session.transaction() as tx:
    table = tx.bucket(bucket_name).schema(schema_name).table(table_name)
    table.add_sorting_key([0, 1])  # sort key on the 1st and 2nd columns

This sets columns 1 and 2 as sorting columns.

Trino

Note

Use the VAST Trino Connector version 5.4 or later. Download from https://github.com/vast-data/vast-db-connectors/releases.

On Trino, set sorting in the following way:

ALTER TABLE vast."database_name/schema_name".table_name
    SET PROPERTIES sorted_by = ARRAY['column1_name', 'column2_name'];

This sets columns column1_name and column2_name as sorting columns for the table.

Spark

Table sorting cannot currently be enabled using Spark. To enable sorting on a table, use one of the methods described above (using the VAST Web UI, the VAST CLI, or the VAST Python SDK).

Sorting Status of Tables

The VAST Web UI shows an indication of the table sorting status for each table that is configured for sorting. The status is shown as a grade, on the right of the detail pane for the table. It indicates the degree to which the table is sorted. As sorting progresses (for example, if applied to an existing table, or after additional rows are inserted), the score improves.

Grades are shown as C  (not sorted), B (partially sorted), A (sorted), or A+ (super sorted).

When rows are added or changed in the table, the grade will go down as the table is resorted, and then return to a final grade when sorting is complete.

Table Sorting Guidelines

These guidelines apply to sorted tables in VAST Databases:

  • Sorting cannot be disabled on a table once it is enabled.

  • Sorting columns cannot be updated.

  • The sorting columns and their order cannot be changed once defined.

  • Sorting tables cannot be replicated (that is, they cannot be included in a protected path that is replicated). Similarly, sorting cannot be enabled for a table that is replicated.

  • Table projections cannot be added to sorted tables, and tables with projections cannot be sorted.

  • Tables that expose row_id cannot be sorted. Similarly, a table that is sorted cannot expose this column (the two features are mutually exclusive).

  • Sorting uses the first eight bytes of a sorting column. So, string-based timestamp fields of the form YYYY-MM-... will not work well as sorting columns. Use date or timestamp data types instead.

  • Table sorting works best on large tables (more than 1m rows). Tables are sorted into chunks of 512K rows, which are not effectively sorted for smaller tables.

  • During sorting, the number of rows in the table, as well as the size of the table, may increase and then decrease back to the original size, as sorting progresses and then completes. These additional rows are used in the sorting. The number of rows reported in a query (select count(*) from table) remains unchanged and correct throughout the sorting process.

  • It is a recommended best practice to do inserts on sorted tables in batches as large as possible (rather than numerous smaller insert operations). This improves performance for both insert and queries on the table.