VAST Database Row and Column Security

The VAST Database supports row and column security features to restrict access to specific rows or columns in a table, based on a user's identity.

Note
This feature is supported only with Trino Query Engines.

These features are applied using statements in Identity or Bucket Policies. These statements determine access to specific database resources (rows or columns) according to the user identity.

These features are supported:

Column allow/deny. You can allow or deny access to specific columns in queries, insertions, and renames on a table, based on the user identity.
Column masking. You can redact contents of specific columns in queries on tables, based on the user identity. This allows sensitive information to be redacted, while not failing the query.
Row filtering. You can allow or deny access to rows in a table that match a filter, based on the user identity.
Row filtering based on VAST authorization. You can apply a filter based on the VAST S3 user permission model. This uses reserved columns in the database to regulate user access to resources in the database.
For database access, this uses a column in the database labelled vastdb_s3_path_auth (reserved name), The cells in this column contain lists of comma separated S3 paths in the form "bucket_name1/object_name1", "bucket_name2/object_name2". The filter checks each of these paths, and the user's effective permissions to access them, based on applicable identity and bucket policies for the paths. Rows in the path that have permission for the get-object action are returned by the filter; rows without are not returned.
For DataEngine access, it filters access to internal/reserved tables in the database that control the access to pipelines and pipeline logs. This is used to regulate user access to pipelines and pipeline logs.

The Identity Policy includes a statement like this for row and column security features:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect": "RowColumnSecurity",

         ("Resource" | "NotResource"): ("*" | [<resource1>, <resource2>, ...])

         ("ColumnAllow"|"ColumnDeny"): ("*" | ["<column_name1>", "<column_name2>", ...]),

         "RowFilter": [{"QueryEngine":("*"|[<query_engine1>, <query_engine2>, ...]),
                        "FilterString": <row_filter_string1>},
                       {"QueryEngine":("*"|[<query_engine1>, <query_engine2>, ...]),
                        "FilterString": <row_filter_string2>}, ...]

         "ColumnMask": [{"QueryEngine":("*"|[<query_engine1>, <query_engine2>, ...]),
                         "ColumnName": <column_name1>,
                         "MaskString": <function_string1>},
                        {"QueryEngine":("*"|[<query_engine1>, <query_engine2>, ...]),
                         "ColumnName": <column_name2>,
                         "MaskString": <function_string2>}, ...]

          "RowFilterByS3PathAuth": ("True"|"False"),
      },
      ...
]
}

where Resource specifies the database table or view, or a path.

The sections below explain the use of each statement element.

Row Filtering

With row filtering, RowFilter in the policy statement filters rows in query results.

For example, the statement:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect": "RowColumnSecurity",
         "Resource": ["my_bucket/my_schema/*"],      
         "RowFilter": {"QueryEngine":"Trino", "FilterString": "COL1='abc'"}   
      }  
]
}

This statement behaves like an additional WHERE clause in the query. In the above case, WHERE COL1='abc'. The effect is to filter results of queries to tables in my_bucket/my_schema, when COL1 has value 'abc'.

Column Allow/Deny

With ColumnAllow or ColumnDeny, you can list columns that are returned or filtered out of query requests on tables.

For example, this statement:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect": "RowColumnSecurity"
         "Resource":  ["my_bucket/my_schema/*"],
         "ColumnDeny": ["my_col1", "my_col2"] 
      },
      ...
}

This statement filters out columns my_col1 and my_col2 from queries to tables in my_bucket/my_schema. All other columns are returned.

If ColumnAllow is included, all other columns are excluded from the results. If ColumnDeny is included, all other columns in the results are included. If both ColumnAllow and ColumnDeny are included in the statement, all columns are excluded except those explicitly allowed (by ColumnAllow).

Columns must be top-level columns.

Column Masking

ColumnMask replaces or redacts contents of specific columns returned by the query, according to a regexp (for example, using regexp_replace).

For example, this statement:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect": "RowColumnSecurity"
         "Resource":  ["my_bucket/my_schema/*"],
         "ColumnMask": [{"QueryEngine":["Trino", "Spark"],
                         "ColumnName": "my_col_email",
                         "MaskString": "regexp_replace(my_col_email, '.*', '***')"}]
      },
      ...
]
}

This statement replaces the contents of the my_col_email with '***' in the query results.

This example replaces all columns of type Int with '123':

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect": "RowColumnSecurity"
         "Resource":  ["my_bucket/my_schema/*"],
         "ColumnMask": [{"QueryEngine":["Trino", "Spark"],
                         "ColumnName": "my_col_email",
                         "MaskString": "CAST('123' AS INT)"}]
      },
      ...
]
}

Row Filtering Based on VAST Authorization

This is enabled with this statement in the identity policy:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "RowFilterByS3PathAuth": ("True"|"False"),
      },
      ...
]
}

When enabled, the filter checks entries in the column vastdb_s3_path_auth. This column optionally contains comma-separated lists of paths of the form "bucket_name/object_name", .... If a row contains such a value in this column, the paths listed in the cell are checked to see if the user has permissions for the S3 get-object action on them, after all applicable Identity and Bucket Policies are applied. If the user has this permission for all paths listed in the cell, the row is returned, otherwise not.

For example, consider a table with these values:

+----------------+-----------+------------------------------------------------+
| COL1           | COL2      | vastdb_s3_path_auth                            |
+----------------+-----------+------------------------------------------------+
| ...            | ...       | "my_bucket/path1"                              |
+----------------+-----------+------------------------------------------------+
| ...            | ...       | "my_bucket/path2"                              |
+----------------+-----------+------------------------------------------------+
| ...            | ...       | "my_bucket/path1","my_bucket/path2"            |
+----------------+-----------+------------------------------------------------+
| ...            | ...       | "my_bucket/path1/path3"                        |
+----------------+-----------+------------------------------------------------+

The user has get_object permissions for path1,

A query on this table with the feature enabled would return only the first row. The second and third rows are not returned since the user lacks this permission on path2, and the fourth row is not returned as the user lacks the permission on /path1/path3.

In order to use this feature, you must create this column in the table. The column can only be added to the table when the table is created, and once created, cannot be changed or deleted. Cells can be null, but inserts or updates are validated to check the values are valid pathnames.

Enabling Row and Column Security

In order to use row and column security statements in policies, end user impersonation must be enabled for tabular operations on the database by the query engine. This means the query engine passes the details of the original end user making the query, when it applies a query to the database. This allows identity policies to be applied according to the actual user submitting the query (and not according to intermediate credentials of the query engine when it applies the query).

Row and column security, and user impersonation. are applied as statements in the Identity Policy, like this:

{
  "Version": "2012-10-17",
  "Statement": [
     {
      "Sid": "Get row and column security configuration for Tabular operations",
      "Effect": "Allow",
      "Action": "s3:TabularGetRowColumnSecurity",
      "Resource":  ["arn:aws:s3:::my_bucket/my_schema/*"]
    },
    {
      "Sid": "Allow end user impersonation on Tabular operations",
      "Effect": "Allow",
      "Action": "s3:TabularEndUserImpersonation",
      "Resource":  ["arn:aws:s3:::*"]
    },
   ...
  ]
}

You can accomplish the same by adding these Custom statements to policies: Database/Advanced Security/EndUserImpersonation, and TabularGetRowColumnSecurity.