VAST on Cloud clusters in GCP are created using Terraform. Using Terraform files supplied by VAST, resources are created in a GCP project and then the cluster is installed on them.
Prerequisites
A GCP account with a GCP project into which the VAST on Cloud cluster will be deployed.
Terraform v1.5.4 or later
Google gcloud SDK
An SSH key pair
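If you do not already have an SSH key pair, you can generate one with ssh-keygen. A minimal sketch; the key file path and comment are placeholders:

# Generate a 4096-bit RSA key pair; the .pub file is used later as ssh_public_key
ssh-keygen -t rsa -b 4096 -f ~/.ssh/voc_key -C "voc-admin"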
Configuring GCP for VoC
Configure the following in the GCP project, from the GCP Console.
Enable Google Cloud APIs
Enable these Google APIs:
Compute API, in Compute/VM Instances
Cloud Functions API, in Cloud Functions API
Cloud Build API, in Cloud Build
Secret Manager API, in Security/Secret Manager API
Optionally, enable these APIs as well; they are used in many use cases and are recommended (a gcloud sketch for enabling the APIs follows these lists):
Artifact Registry API
Compute Engine API
Network Management API
Service Networking API
Network Security API
Cloud Monitoring API
Cloud Logging API
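The same APIs can also be enabled from the command line. A minimal gcloud sketch, assuming you are already authenticated; <project_id> is a placeholder for your project:

# Required APIs
gcloud services enable compute.googleapis.com cloudfunctions.googleapis.com cloudbuild.googleapis.com secretmanager.googleapis.com --project <project_id>

# Recommended optional APIs (the Compute Engine API is compute.googleapis.com, enabled above)
gcloud services enable artifactregistry.googleapis.com networkmanagement.googleapis.com servicenetworking.googleapis.com networksecurity.googleapis.com monitoring.googleapis.com logging.googleapis.com --project <project_id>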
Set up Private Networking
In the VPC Networks page, configure Private services access to your VPC by allocating IP ranges for services and creating private connections to services.
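As a hedged command-line alternative, private services access can be configured roughly as follows; the range name, prefix length, and network are placeholders that should match your environment:

# Allocate an IP range for private services access (name and prefix length are examples)
gcloud compute addresses create voc-psa-range --global --purpose=VPC_PEERING --prefix-length=16 --network=<vpc_network> --project=<project_id>

# Create the private connection to the Service Networking service using that range
gcloud services vpc-peerings connect --service=servicenetworking.googleapis.com --ranges=voc-psa-range --network=<vpc_network> --project=<project_id>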
Set up NAT per Region
In Network Services/Cloud NAT, create a Cloud NAT gateway for each region that has a VoC cluster, with these details (a gcloud sketch follows the list):
Region: the region containing the cluster
Router: Create New Router
Network Service Tier: Premium
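For reference, a hedged gcloud sketch of the same setup; the router and gateway names are placeholders:

# Create a Cloud Router in the cluster's region
gcloud compute routers create voc-router --network=<vpc_network> --region=<region> --project=<project_id>

# Create the Cloud NAT gateway on that router
gcloud compute routers nats create voc-nat --router=voc-router --region=<region> --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges --project=<project_id>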
Configure Firewall Rules
In Network Security/Firewall, configure the firewall policies as follows (a gcloud sketch of both rules follows the rule descriptions).
Create a firewall rule for cluster traffic with these details:
Direction: ingress
Action on match: allow
Target tags: add voc-internal (this tag is used by the VoC cluster)
Source tags: add voc-internal
Protocols and ports: TCP on ports 22, 80, 111, 389, 443, 445, 636, 2049, 3128, 3268, 3269, 4000, 4001, 4100, 4101, 4200, 4201, 4420, 4520, 5000, 5200, 5201, 5551, 6000, 6001, 6126, 7000, 7001, 7100, 7101, 8000, 9090, 9092, 20048, 20106, 20107, 20108, 49001, 4900; and Other: ICMP
Leave all other rule settings as the defaults.
Create another firewall rule for health checks with these details:
Direction: ingress
Action on match: allow
Target tags: add voc-health-check (this tag is created when the VoC cluster is deployed)
Source: IPv4 ranges 130.211.0.0/22 and 35.191.0.0/16 (these ranges are used by Google for health checks).
Protocols and ports: TCP on port 22
Leave all other rule settings as the defaults.
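For reference, a hedged gcloud sketch of both rules; the rule names and network are placeholders, and the port list should be checked against the rule details above:

# Internal cluster traffic rule (tags match the voc-internal tag used by the cluster)
gcloud compute firewall-rules create voc-internal-rule --network=<vpc_network> --direction=INGRESS --action=ALLOW --target-tags=voc-internal --source-tags=voc-internal --rules=tcp:22,tcp:80,tcp:111,tcp:389,tcp:443,tcp:445,tcp:636,tcp:2049,tcp:3128,tcp:3268,tcp:3269,tcp:4000,tcp:4001,tcp:4100,tcp:4101,tcp:4200,tcp:4201,tcp:4420,tcp:4520,tcp:5000,tcp:5200,tcp:5201,tcp:5551,tcp:6000,tcp:6001,tcp:6126,tcp:7000,tcp:7001,tcp:7100,tcp:7101,tcp:8000,tcp:9090,tcp:9092,tcp:20048,tcp:20106,tcp:20107,tcp:20108,tcp:49001,tcp:4900,icmp --project=<project_id>

# Health check rule (source ranges are Google's health check ranges)
gcloud compute firewall-rules create voc-health-check-rule --network=<vpc_network> --direction=INGRESS --action=ALLOW --target-tags=voc-health-check --source-ranges=130.211.0.0/22,35.191.0.0/16 --rules=tcp:22 --project=<project_id>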
Quotas and Policy Constraints
Your GCP project should have these quotas (a gcloud sketch for checking the current quotas follows the list):
Quota for Local SSD. This is set per region and must allow for at least 9 TB of local SSD per DNode. The default quota is sufficient for only three DNodes.
Note
Increasing the default quota to a level sufficient for a VAST cluster deployment can take some time; it is not applied instantly through the GCP Console UI.
Quota for n2 and n2d CPUs. These are set per region. A CNode requires 32 n2d CPUs, so a 100-CNode cluster would require 3200 n2d CPUs. A DNode requires 16 n2 CPUs, so a 100-DNode cluster would require 1600 n2 CPUs.
Quota for Static Routes per VPC Network. This is set per VPC network and should allow for any IPs you use to connect to the cluster.
Quota for static routes per peering group. This is set per peering group (for all peered projects). Peering groups contain VPCs within a common project that can be connected, and these connections require static routes. The quota should allow for all the connection routes between VPCs in the peering group.
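To review current quota limits and usage from the command line, something like the following can be used; the region and output format are illustrative:

# Regional quotas, including LOCAL_SSD_TOTAL_GB, N2_CPUS, and N2D_CPUS
gcloud compute regions describe <region> --project=<project_id> --flatten="quotas[]" --format="table(quotas.metric,quotas.limit,quotas.usage)"

# Project-level quotas, including routes
gcloud compute project-info describe --project=<project_id> --flatten="quotas[]" --format="table(quotas.metric,quotas.limit,quotas.usage)"

Quota increase requests themselves are made through the GCP Console (IAM & Admin/Quotas).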
To avoid problems when creating the cluster in GCP, ensure that organization-level policy constraints do not conflict with cluster requirements (for example, policies that restrict the creation of n2 VMs).
Installing the gcloud SDK
Download the Google Cloud CLI from https://cloud.google.com/sdk/docs/install to your client machine, and follow the instructions to install it.
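After installation, authenticate gcloud and select your project so that Terraform can use your credentials. A typical sequence, assuming Application Default Credentials are used by the Terraform Google provider; the project ID is a placeholder:

gcloud auth login
gcloud auth application-default login   # credentials picked up by Terraform
gcloud config set project <project_id>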
Installing Terraform
Download the latest version of Terraform from the HashiCorp website and follow its installation instructions.
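You can verify that the installed version meets the v1.5.4 minimum:

terraform version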
Checking the GCP Configuration
Optionally, download and run the VoC GCP checker to validate the GCP configuration. It uses Terraform to deploy a test cluster and tests connectivity to and within the cluster.
Download and extract the file https://vast-on-cloud.s3.eu-west-1.amazonaws.com/public_assets/voc-gcp-checker-1.0.4.zip into a folder.
Create and configure a Terraform variables file with these details:
## required variables
zone = "<zone>"
subnetwork = "<subnetwork>"
project_id = "<project_id>"

## variables with defaults, when not provided, these defaults will be used
# onprem_ip = ""
# network_tags = []
where zone, subnetwork, and project_id are from the GCP Project.
Optionally, set these variables (a filled-in example follows the table):
| Variable | Description |
|---|---|
| network_tags | The network tags used with the cluster |
| onprem_ip | An on-prem IP address. If supplied, connectivity from the cluster to this address is tested. |
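For illustration, a filled-in checker variables file might look like this; all values are examples and must be replaced with your own:

## required variables
zone       = "us-central1-b"
subnetwork = "default"
project_id = "voc-test"

## optional variables
# onprem_ip    = "192.168.10.20"    # example on-prem address to test connectivity to
# network_tags = ["voc-internal"]   # example network tag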
Run this command:
~/voc/gcp/gcp-checker > terraform init
Run this command:
~/voc/gcp/gcp-checker > terraform apply
The Terraform deployment starts and takes about 5 minutes to complete. When it is done, output like this is shown:
Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

full_name = "checker-e516c43050c7ff71d929d4ab02c900d8"
private_ip = "10.120.7.14"
project_id = "voc-test"
serial_console = "https://console.cloud.google.com/compute/instancesDetail/zones/us-central1-b/instances/checker-e516c43050c7ff71d929d4ab02c900d8-instance-8z5h/console?port=4&project=voc-test"
subnetwork = "default"
zone = "us-central1-b"
If firewall issues are discovered, they are shown, as in this example:
│ Error: Resource precondition failed
│
│   on data.tf line 14, in data "google_compute_instance" "checker_instance":
│   14:     condition = var.disable_instances_output || length(local.created_instances) == local.nodes_count
│     ├────────────────
│     │ local.created_instances is empty set of string
│     │ local.nodes_count is 1
│     │ var.disable_instances_output is false
│
│ Only 0/1 instances where created, please retry again later, or run with 'TF_VAR_disable_instances_output=true' to ignore.
╵
╷
│ Error: Resource postcondition failed
│
│   on data.tf line 25, in data "google_compute_instance_group_manager" "checker_instance_group_manager":
│   25:     condition = alltrue(self.status[*].is_stable)
│     ├────────────────
│     │ self.status is list of object with 1 element
│
│ Health check did not become stable after 5 minutes, this usually means that the firewall was not configured properly for health checks.
│ The voc cluster needs to have a firewall rule that allows traffic from the health check IPs ranges (130.211.0.0/22, 35.191.0.0/16) to port 22 on the instances.
│ This is usually done by setting such a rule for the default voc 'voc-health-check' network tag. but can also be done by setting another network tag
│ and passing that tag to the checker (and to the voc cluster creation script later on), as one of the tags in the optional "network_tags" variable.
Click the serial_console link in the output to see details of the results. The results appear in the Google Cloud console, in the serial port output, and look like this:
Waiting for up to 10 minutes for base connectivity before starting connectivity checks:
connectivity to meta-data service (http://metadata.google.internal/computeMetadata/v1/instance/zone) ......
connectivity to meta-data service (http://metadata.google.internal/computeMetadata/v1/instance/zone) ok
connectivity to Compute Engine API (compute.googleapis.com) ......
connectivity to Compute Engine API (compute.googleapis.com) ok
connectivity to internal cluster instance (ping) (10.120.7.5) ......
connectivity to internal cluster instance (ping) (10.120.7.5) ok
connectivity to internal cluster instance (port 22) (10.120.7.5) ......
connectivity to internal cluster instance (port 22) (10.120.7.5) ok
connectivity to internal cluster instance (port 80) (10.120.7.5) ......
connectivity to internal cluster instance (port 80) (10.120.7.5) ok
...
Connectivity check completed successfully
Configuring the VAST GCP Cluster in Terraform
You will receive a zip file from VAST that contains Terraform files that are used to create the VAST Cluster.
Extract the contents of the file into a folder. If you are creating more than one cluster, extract the contents of each zip file into a separate folder.
Create a file voc.auto.tfvars (use the file example.tfvars, from the zip file, as an example) with this content:
## required variables
name = "<name>"
zone = "<zone>"
subnetwork = "<subnetwork>"
project_id = "<project_id>"
nodes_count = 8 # Minimum 8 - Maximum 14
ssh_public_key = "<public ssh key content>"
customer_name = "<customer_name>"
## variables with defaults, when not provided, these defaults will be used
# network_tags = []
# labels = {}
# ignore_nfs_permissions = false
# enable_similarity = false
# enable_callhome = false
where name is the name of the cluster, and zone, subnetwork, and project_id are from the GCP project.
In ssh_public_key, enter your SSH public key, similar to this:
ssh-rsa AAAAB*****************************************************************************zrysUvp0EkI5YWm+lmiQP4edfNKo0G3udxeAGdrD9dZSlzqmtdvo7CTW7Qhh3v2T3t3tvTEQnnNx8CkQOFDuU3Eje7NiN1XTp5C14dcGfaZeJnRnwaKhyD710ZHTeRyzjoXhNoAOuPT4qrT4MZ4jUUjr8Fx3ozByPlLco7qHsXurZHdTFWmdR52PlWRZA++9uyjz/sPYO+HcHxtIT5yS7DVfQz8zFQTyL0Rk82v6S0HNlG31mMlA2cPt0/r2vpY0U2zfijHdZEGxu+XeR/xRmVhPFImxN0rl
Optionally, set these variables, or use the default settings (a filled-in example follows the table):
| Variable | Description |
|---|---|
| network_tags | (Optional) Add GCP network tags to the cluster. |
| labels | (Optional) Add GCP labels to the cluster. |
| ignore_nfs_permissions | If enabled, the VoC cluster will ignore file permissions and allow NFS and S3 clients to access data without checking permissions. Default: disabled. |
| enable_similarity | Enable this setting to enable similarity-based data reduction on the cluster. Default: disabled. |
| enable_callhome | Enable this setting to enable the sending of callhome logs on the cluster. Default: disabled. |
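For illustration only, a complete voc.auto.tfvars might look like the following; every value is an example that must be replaced with your own, and the SSH key is truncated:

name           = "voc-cluster1"
zone           = "us-central1-b"
subnetwork     = "default"
project_id     = "voc-test"
nodes_count    = 8
ssh_public_key = "ssh-rsa AAAAB3... user@host"
customer_name  = "example-customer"

# network_tags           = ["voc-internal"]
# labels                 = { env = "prod" }
# ignore_nfs_permissions = false
# enable_similarity      = false
# enable_callhome        = false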
Configuring VIPs for the Cluster
Allocate VIPs for the cluster on GCP in the VAST Web UI, in the VIP Pools section of the Network Access page. The VIPs added here must be routable to GCP, and must not be in any GCP subnet or belong to any CIDR assigned to any of the GCP subnets.
Creating the VAST GCP Cluster using Terraform
Run the following command in the folder into which the zip file was extracted. This initializes Terraform for the deployment.
~/voc/gcp/gcp-new-deploy > terraform init
When complete, the following is shown:
Terraform has been successfully initialized!
Run the following command to deploy the VAST on Cloud cluster.
~/voc/gcp/gcp-new-deploy > terraform apply
When the Terraform action is complete, something similar to the following is shown:
Apply complete! Resources: 2 added, 0 changed, 9 destroyed.

Outputs:

availability_zone = "us-central1-a"
cloud_logging = "https://console.cloud.google.com/logs/viewer?project=voc-test&advancedFilter=resource.type%3D%22gce_instance%22%0Alabels.cluster_id%3Ddc66387e-c8bb-5bd8-97db-469392f6bdba"
cluster_mgmt = "https://10.120.9.243:443"
instance_group_manager_id = "test-manager"
instance_ids = tolist([
  "1315258176142165158",
  "5926902481847174310",
  "4477224983631873190",
])
instance_type = "n2-highmem-48"
private_ips = tolist([
  "10.120.7.254",
  "10.120.8.0",
  "10.120.8.2",
])
protocol_vips = tolist([
  "10.120.9.231",
  "10.120.9.232",
  "10.120.9.233",
  "10.120.9.234",
  "10.120.9.235",
  "10.120.9.236",
])
replication_vips = tolist([
  "10.120.9.237",
  "10.120.9.238",
  "10.120.9.239",
  "10.120.9.240",
  "10.120.9.241",
  "10.120.9.242",
])
serial_consoles = [
  "https://console.cloud.google.com/compute/instancesDetail/zones/us-central1-b/instances/test-instance-6873/console?port=1&project=voc-test",
  "https://console.cloud.google.com/compute/instancesDetail/zones/us-central1-b/instances/test-instance-955w/console?port=1&project=voc-test",
  "https://console.cloud.google.com/compute/instancesDetail/zones/us-central1-b/instances/test-instance-trl6/console?port=1&project=voc-test",
]
vms_ip = "10.120.9.243"
vms_monitor = "http://10.120.7.254:5551"
voc_cluster_id = "dc66387e-c8bb-5bd8-97db-469392f6bdba"
vpc_network = "voc-test"
At this point, the cluster installation starts on the resources created by Terraform in the GCP project.
Monitor the progress of the installation at the vms_monitor URL. The installation can take several minutes.
Accessing the Cluster in GCP
Access the VAST on Cloud VMS Web UI from a browser, using the cluster_mgmt URL (from the terraform apply step, above).
The cluster is built in private subnets, so you will need to access the VMS from within your own address space with a route to the GCP subnets.
Destroying or Changing the Cluster Configuration
To destroy the cluster, run this command:
terraform destroy
If you want to change the settings in the voc.auto.tfvars file, you must first destroy and then rebuild the cluster using Terraform. Do not run terraform apply after making changes to the file; doing so will corrupt the cluster.
Warning
Data in the cluster is not preserved when the cluster is destroyed using Terraform (including when destroying it in order to rebuild it).
Run the following commands to rebuild the cluster after making changes to the file.
terraform destroy
terraform apply
Best Practices for Terraform Files
The Terraform files in the zip file contain important information that Terraform uses to create and manage your cluster. Take care that these files are not deleted or corrupted.
It is also recommended to back up these files, as sketched below.
You must use separate folders and files for each cluster that you provision on GCP using Terraform.
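As one possible backup approach (the archive name and paths are examples only), the whole deployment folder, including the Terraform state files, can be archived after each apply:

# Archive this cluster's deployment folder, including terraform.tfstate
tar czf voc-cluster1-terraform-backup.tar.gz -C ~/voc/gcp gcp-new-deploy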