Event-driven processing with VAST, AWS SNS, and AWS Lambda


Complex workflows require orchestrating many different systems. For instance, after processing some data, multiple downstream systems may need to be notified to begin their own work. Amazon Simple Notification Service (SNS) is often used for this kind of one-to-many notification. With an AWS Lambda function, we can extend that processing to data stored on a VAST on Cloud system.

 

For this example, we'll trigger an AWS Lambda function from SNS process-complete notifications for data that exists on the VAST cluster. The Lambda function will generate a sidecar file on the VAST system containing additional metadata about the file, along with a results message included in the SNS notification.

Configure VAST Cluster

On the VAST Cluster, we'll need to create an NFS view and enable a setting to allow NFS connections from ports above 1024.

View

Create a View that can be accessed over NFSv3. Log in to the VAST UI as a privileged user, such as an admin.

 

Under Element Store -> Views, click Create View. Set:

  • Path to /processingresults

  • Protocols to NFSv3

  • Policy name to default

  • Select "Create Directory"

 

(Screenshot: the Create View dialog with the settings above.)

vtool Setting

As a security feature, VAST disallows NFS mounts from unprivileged ports – that is, those above 1024 – by default. However, the AWS Lambda function will run as a regular user and can therefore only open ports above 1024. To allow this to work, we need to change a cluster setting using vtool.
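As a quick illustration of why this matters, an unprivileged process that opens a client socket is assigned an ephemeral source port, which falls well above 1024:

```python
import socket

# Bind a socket without requesting a specific port; the kernel assigns
# an ephemeral port from the unprivileged range (above 1024).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
port = sock.getsockname()[1]
print(port)
sock.close()
```

The Lambda's NFS client will source its traffic from a port like this one, which is why the cluster must be told to accept "insecure" ports.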

 

SSH into one of the CNodes and run the following command:

vtool vsettings set NFS_ALLOW_INSECURE_PORTS=true

 

Configuring AWS Resources

AWS SNS

Start by creating an SNS topic for upstream processes to notify downstream consumers that processing is complete. Since we want to tie this into a Lambda function, make it a Standard topic.

(Screenshot: creating a Standard SNS topic named "Process-Complete-Notifications" with the optional display name "My Topic".)

AWS Lambda

Create an AWS Lambda function that will process these events, connect to the VAST cluster, and create a sidecar file. Select a Python runtime version that matches the version you'll use to build the Lambda package locally.

(Screenshot: the Lambda Create function page, authoring from scratch with the Python 3.9 runtime, x86_64 architecture, and the default execution role granting access to CloudWatch Logs.)

Create a Python file called lambda_function.py with the contents below, updating these global variables for your environment:

  • CLUSTER_IP - DNS name or IP address of your VAST on Cloud data VIP pool

  • NFS_EXPORT - name of the NFS export to mount on the VAST on Cloud system

#!/usr/bin/env python3

"""
Process an SNS message for a file on the VAST and create
a sidecar file over NFS.
"""

from datetime import datetime
import json
import os
import libnfs


CLUSTER_IP = "vast.cluster.ip.address"
NFS_EXPORT = "processingresults"


def create_sidecar(filename, results):

    # Create NFS connection
    nfs = libnfs.NFS(f"nfs://{CLUSTER_IP}/{NFS_EXPORT}")

    # Get creation time and size
    file_stat = nfs.stat(filename)

    # Build our sidecar
    sidecar_contents = {
        "filename": filename,
        "size": file_stat["size"],
        "created": datetime.fromtimestamp(file_stat["ctime"]["sec"]).isoformat(),
        "results": results,
    }

 

    # Write out sidecar file
    (base, ext) = os.path.splitext(filename)
    sidecar_filename = f"{base}-sidecar.json"
    sidecar_fh = nfs.open(sidecar_filename, mode="wt")
    sidecar_fh.write(json.dumps(sidecar_contents, indent=2))
    sidecar_fh.close()

def process_message(message):
    """
    This function assumes that the content of the SNS message is a JSON object
    in the form:

    {
      "filename": "filename.ext",
      "results": "some string about processing results"
    }

    Where "filename" is an existing file path on the VAST cluster rooted
    at the export specified in NFS_EXPORT.
    """

    contents = json.loads(message)
    create_sidecar(contents["filename"], contents["results"])

def lambda_handler(event, context):

    if event:
        batch_item_failures = []
        batch_response = {}

        # Process each SNS record, collecting the IDs of any messages
        # that fail so they can be reported in the response.
        for record in event["Records"]:
            try:
                process_message(record["Sns"]["Message"])
            except Exception:
                batch_item_failures.append(
                    {"itemIdentifier": record["Sns"]["MessageId"]}
                )

        batch_response["batchItemFailures"] = batch_item_failures
        return batch_response
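Before deploying, you can sanity-check the message-parsing logic locally with a hand-built event in the shape Lambda receives from SNS (the MessageId, filename, and results values below are placeholders):

```python
import json

# A minimal SNS event as Lambda would deliver it; only the fields the
# handler actually reads are filled in with meaningful values.
sample_event = {
    "Records": [
        {
            "Sns": {
                "MessageId": "00000000-0000-0000-0000-000000000000",
                "Message": json.dumps(
                    {
                        "filename": "results/run42.dat",
                        "results": "processing succeeded",
                    }
                ),
            }
        }
    ]
}

# The handler extracts and decodes the inner JSON message like this:
contents = json.loads(sample_event["Records"][0]["Sns"]["Message"])
print(contents["filename"], contents["results"])
```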

Next, download the Python package dependencies and bundle them up with the code.

Note that to install the libnfs Python package you'll need the libnfs C library installed on your system along with its header files. For Debian-based systems these are provided by libnfs and libnfs-dev. For RPM-based systems these are provided by libnfs and libnfs-devel.

 

$ mkdir ./packages
$ pip3 install --target ./packages libnfs
$ mkdir ./packages/lib
$ cp /usr/lib64/libnfs.* ./packages/lib
$ cd packages
$ zip -r ../lambda_function.zip .
$ cd ..
$ zip lambda_function.zip lambda_function.py
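The resulting bundle should have lambda_function.py and the Python packages at the zip root, with the native libnfs shared objects under lib/ (the Lambda runtime includes the function's lib/ directory on LD_LIBRARY_PATH). A quick way to verify the layout, sketched here against an in-memory dummy bundle with placeholder entries:

```python
import io
import zipfile

# Build a dummy bundle in memory that mimics the expected layout:
# handler code and packages at the zip root, native libraries under lib/.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("lambda_function.py", "# handler code")
    zf.writestr("libnfs/__init__.py", "# python binding")
    zf.writestr("lib/libnfs.so.13", b"")

names = zipfile.ZipFile(buf).namelist()
print(names)
```

Running `zipinfo lambda_function.zip` against your real bundle should show the same shape.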

 

Next, we upload the bundle to the Lambda we created. The following assumes that you have the AWS CLI installed and configured:

$ aws lambda update-function-code --function-name Create-Sidecar --zip-file fileb://lambda_function.zip

 

Now add the SNS trigger to the Lambda function.

(Screenshot: adding an SNS trigger to the Lambda function, subscribed to the topic "arn:aws:sns:us-east-1::Process-Complete-Notifications"; Lambda automatically adds the permissions needed for SNS to invoke the function.)

Finally, update the Lambda config to:

  • Increase the timeout from 3 seconds to 30 seconds or more

  • Connect the Lambda to the VPC where the VAST on Cloud instance resides

(Screenshot: the Lambda Configuration tab, General configuration section, showing Timeout set to 0 min 30 sec, Memory at 128 MB, and SnapStart set to None.)

Publish a message

We can now publish a message to the SNS topic containing the filename and the processing results as a JSON structure. Ensure that a file with the specified filename already exists on the cluster in the NFS export created above.
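The same publish can be scripted. The sketch below builds the JSON payload the Lambda expects and, if run where AWS credentials are configured, publishes it via boto3 (the topic ARN, filename, and results values are placeholders for your environment):

```python
import json


def make_payload(filename, results):
    """Build the JSON message body the Lambda's process_message expects."""
    return json.dumps({"filename": filename, "results": results})


def publish_completion(topic_arn, filename, results):
    """Publish a process-complete notification to the SNS topic."""
    import boto3  # imported lazily so make_payload is usable without AWS

    sns = boto3.client("sns")
    return sns.publish(
        TopicArn=topic_arn,
        Message=make_payload(filename, results),
    )


payload = make_payload("results/run42.dat", "processing succeeded")
print(payload)
```

Alternatively, the AWS console's Publish message page (shown below) accomplishes the same thing interactively.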

(Screenshot: the SNS Publish message page, where the message body, optional subject, and Time-to-Live (TTL) can be set.)

Conclusion

By using an AWS Lambda function and SNS, we can pull a VAST on Cloud system into a downstream automated processing pipeline alongside other consumers.