Configuring Alluxio with VAST S3 and NFS


Intro

Alluxio ( https://www.alluxio.io/ ) is a data orchestration platform that fundamentally enables the separation of storage and compute. It brings speed and agility to big data and AI workloads, reduces costs by eliminating data duplication, and enables users to transition to newer storage solutions, such as object stores.

 

Note that you can use either VAST-S3 or VAST-NFS (with or without multipath) as a data source for Alluxio. This article focuses mainly on VAST-S3; a basic VAST-NFS example appears at the end.

 

Set up VAST-S3 credentials

For Alluxio (or any application) to read and write data via VAST-S3, you must first create a set of credentials. For details, refer to "Managing S3 User Permissions."

In short, you will need to:

  1. Create a new user on VAST (alternatively, you can reuse an existing user, including one that exists in your AD/LDAP/NIS environment).

  2. (Optional, but recommended) Grant the 'Allow Create Bucket' permission to the user.

  3. Create an access/secret key pair for the user.

  

Create an S3 Bucket on VAST

Next, create a bucket for use with Alluxio. There are several ways to do this; refer to "S3 Object Storage Protocol" for some of them.

It is recommended to:

  1. Use the same access/secret key obtained in the previous steps.

  2. Choose a bucket name that does not collide with any existing bucket name AND does not collide with any root-level directory on the VAST filesystem.
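The naming recommendations above can be checked before creating the bucket. This is a minimal sketch, not a VAST tool: it assumes VAST enforces the common S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; a letter or digit at each end), and the helper names is_valid_s3_name and name_collides are hypothetical. You supply the existing bucket names and root-level directory names yourself.

```shell
#!/bin/sh
# Hypothetical helpers for vetting a proposed bucket name.
# Assumption: VAST follows the common S3 naming rules checked below.

is_valid_s3_name() {
    name=$1
    len=${#name}
    # Length must be between 3 and 63 characters.
    [ "$len" -ge 3 ] && [ "$len" -le 63 ] || return 1
    # Lowercase letters, digits, dots and hyphens only;
    # the first and last characters must be a letter or digit.
    printf '%s\n' "$name" | grep -Eq '^[a-z0-9][a-z0-9.-]*[a-z0-9]$'
}

name_collides() {
    # $1 = proposed name; remaining arguments = existing bucket names
    # and root-level directory names gathered by the caller.
    candidate=$1
    shift
    for existing in "$@"; do
        [ "$candidate" = "$existing" ] && return 0
    done
    return 1
}
```

For example, is_valid_s3_name alluxiovasts3 succeeds while is_valid_s3_name Alluxio_Bucket fails. The list of existing names could come from s3cmd ls output plus a directory listing of the filesystem root, if you have access to one.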

 

Here is an example using s3cmd:

  1. Create an "s3cfg" file that contains the relevant parameters. In this example, SSL/HTTPS is disabled for simplicity; disabling it is not required.

# cat alluxio.s3cfg
[default]
access_key = V1EER4P8Q5CS5FM3Y28F
secret_key = zKs0I+b/Mbrt/dl1RYrVvJsoYngOB+xGMEEszD6s
host_base = main.selab-avnet203.sli.vastdata.com:80
host_bucket = main.selab-avnet203.sli.vastdata.com:80
use_https = False
server_side_encryption = False
signature_v2 = False
signurl_use_https = False
encrypt = False

 

  2. Create the bucket:

s3cmd -c alluxio.s3cfg mb s3://alluxiovasts3
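Step 1 can also be scripted so that the keys come from the environment instead of being typed into the file by hand. This is a minimal sketch: the function write_s3cfg and the variables VAST_ACCESS_KEY, VAST_SECRET_KEY and VAST_S3_HOST are hypothetical names, and the settings simply mirror the example file above.

```shell
#!/bin/sh
# Hypothetical helper: write an s3cmd config equivalent to the example above.
# VAST_S3_HOST is expected as host:port, e.g. main.example.com:80.

write_s3cfg() {
    cfg=$1
    if [ -z "${VAST_ACCESS_KEY:-}" ] || [ -z "${VAST_SECRET_KEY:-}" ] || [ -z "${VAST_S3_HOST:-}" ]; then
        echo "write_s3cfg: set VAST_ACCESS_KEY, VAST_SECRET_KEY and VAST_S3_HOST" >&2
        return 1
    fi
    # umask 077 inside a subshell: a newly created file is readable only by
    # its owner, and the caller's umask is left unchanged.
    (
        umask 077
        cat > "$cfg" <<EOF
[default]
access_key = ${VAST_ACCESS_KEY}
secret_key = ${VAST_SECRET_KEY}
host_base = ${VAST_S3_HOST}
host_bucket = ${VAST_S3_HOST}
use_https = False
server_side_encryption = False
signature_v2 = False
signurl_use_https = False
encrypt = False
EOF
    )
}
```

Usage: export the three variables, run write_s3cfg alluxio.s3cfg, then create the bucket with the s3cmd mb command shown above.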

 

Note: In more advanced scenarios, you may also want to adjust the bucket and object ACLs to allow other users to access this data. This article assumes that a single user/access key is used for all steps.
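Before configuring Alluxio, it can help to confirm that the new bucket accepts writes and reads with the same key pair Alluxio will use. This sketch uses only standard s3cmd subcommands (put, get, del); the function name s3_roundtrip and the object key alluxio-smoke-test.txt are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical smoke test: upload a small object, download it, compare, delete.

s3_roundtrip() {
    cfg=$1
    bucket=$2
    key=alluxio-smoke-test.txt
    src=$(mktemp) || return 1
    dst=$(mktemp) || return 1
    echo "alluxio s3 smoke test" > "$src"
    # --force lets s3cmd overwrite the (empty) temporary download target.
    s3cmd -c "$cfg" put "$src" "s3://$bucket/$key" &&
        s3cmd -c "$cfg" --force get "s3://$bucket/$key" "$dst" &&
        cmp -s "$src" "$dst"
    rc=$?
    # Remove the test object and temporary files whatever the result.
    s3cmd -c "$cfg" del "s3://$bucket/$key" >/dev/null 2>&1
    rm -f "$src" "$dst"
    return $rc
}
```

For example: s3_roundtrip alluxio.s3cfg alluxiovasts3 && echo "bucket OK".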

 

Configure Alluxio

VAST S3 as the root file system for Alluxio

# Download Alluxio. Check https://www.alluxio.io/download/ for the latest version.
wget https://downloads.alluxio.io/downloads/files/2.6.2/alluxio-2.6.2-bin.tar.gz

# Extract the archive
tar -zxvf alluxio-2.6.2-bin.tar.gz

# Create alluxio-site.properties from the template
cd alluxio-2.6.2/conf
cp alluxio-site.properties.template alluxio-site.properties

# Append the following to alluxio-site.properties.
# Replace the bucket, endpoint, access key and secret key with your own values.
# The quoted 'EOF' stops the shell from expanding ${alluxio.work.dir}.
cat >> alluxio-site.properties <<'EOF'
# Common properties
alluxio.master.hostname=localhost
# alluxio.master.mount.table.root.ufs=${alluxio.work.dir}/underFSStorage
alluxio.master.mount.table.root.ufs=s3://alluxiovasts3/
alluxio.underfs.s3.endpoint=http://main.selab-avnet203.sli.vastdata.com
alluxio.underfs.s3.disable.dns.buckets=true
alluxio.underfs.s3.inherit.acl=false
alluxio.underfs.s3.secure.http.enabled=false
alluxio.underfs.s3.list.objects.v1=true
aws.accessKeyId=V1EER4P8Q5CS5FM3Y28F
aws.secretKey=zKs0I+b/Mbrt/dl1RYrVvJsoYngOB+xGMEEszD6s
EOF

# Format the cluster (adjust the path to your installation directory)
cd /home/vastdata/alluxio-2.6.2/
./bin/alluxio format
# Start the cluster
./bin/alluxio-start.sh local
# Run the built-in tests
./bin/alluxio runTests

# Stop the cluster if needed
./bin/alluxio-stop.sh local

The Alluxio web UI is available at http://<hostname>:19999/overview.
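Because a missing or commented-out key in alluxio-site.properties is easy to overlook, a quick check before running alluxio format may save a restart. This is a sketch under stated assumptions: the function check_site_properties is hypothetical, and REQUIRED_KEYS simply mirrors the keys set in the example above; adjust it if your configuration differs.

```shell
#!/bin/sh
# Hypothetical pre-format check: report any expected key that is not set.

REQUIRED_KEYS="alluxio.master.hostname
alluxio.master.mount.table.root.ufs
alluxio.underfs.s3.endpoint
alluxio.underfs.s3.disable.dns.buckets
alluxio.underfs.s3.inherit.acl
alluxio.underfs.s3.secure.http.enabled
alluxio.underfs.s3.list.objects.v1
aws.accessKeyId
aws.secretKey"

check_site_properties() {
    props=$1
    missing=0
    for key in $REQUIRED_KEYS; do
        # Exact match on the text before the first '=', trimmed of spaces.
        # Commented-out lines start with '#' and therefore never match.
        if ! awk -F= -v k="$key" '{ gsub(/^[ \t]+|[ \t]+$/, "", $1) } $1 == k { found = 1 } END { exit !found }' "$props"; then
            echo "missing: $key" >&2
            missing=1
        fi
    done
    return $missing
}
```

For example, run check_site_properties conf/alluxio-site.properties from the installation directory; it prints each missing key and returns non-zero if any are absent.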

VAST S3 added to an existing Alluxio cluster

# Replace the bucket, endpoint, access key and secret key with your own values.

cd /home/vastdata/alluxio-2.6.2/
bin/alluxio fs mount \
--option alluxio.underfs.s3.endpoint=http://main.selab-avnet203.sli.vastdata.com \
--option alluxio.underfs.s3.disable.dns.buckets=true \
--option alluxio.underfs.s3.inherit.acl=false \
--option alluxio.underfs.s3.secure.http.enabled=false \
--option alluxio.underfs.s3.list.objects.v1=true \
--option aws.accessKeyId=V1EER4P8Q5CS5FM3Y28F \
--option aws.secretKey=zKs0I+b/Mbrt/dl1RYrVvJsoYngOB+xGMEEszD6s \
/s3 s3://alluxiovasts3/

Test the mount with:
./bin/alluxio runTests --directory /s3
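The mount command repeats the same S3 settings used in alluxio-site.properties. One way to avoid retyping them is to keep the UFS options in a small properties file and expand them into --option pairs. This is a sketch: the file name vast-s3-mount.properties and the function props_to_mount_options are hypothetical, and the expansion relies on shell word splitting, so it assumes no value contains whitespace or glob characters.

```shell
#!/bin/sh
# Hypothetical helper: turn "key=value" lines into "--option key=value" lines,
# skipping comments and blank lines and trimming surrounding whitespace.

props_to_mount_options() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
        grep -v -e '^#' -e '^$' |
        sed 's/^/--option /'
}
```

Usage, with vast-s3-mount.properties holding the endpoint, ACL, list and key settings shown above:
bin/alluxio fs mount $(props_to_mount_options vast-s3-mount.properties) /s3 s3://alluxiovasts3/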

VAST NFS added to an existing Alluxio cluster

Please note that the NFS path must be identical on all Alluxio hosts.

Also, do not install an Alluxio worker on the master node. Due to an issue in Alluxio, doing so routes all traffic exclusively through the master node.

This example uses a basic NFS mount. For higher per-client throughput, approaching line rate, enable nconnect, multipath, or RDMA.
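nconnect is a standard Linux NFS client mount option (kernel 5.3 or later) that opens several TCP connections for a single mount. The sketch below only prints a mount command for review; the function name nfs_mount_cmd, the NFS version (vers=3) and the default of 8 connections are illustrative assumptions, not VAST recommendations. Multipath and RDMA setups are not covered here.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an NFS mount command using nconnect.
# Arguments: server, export path, local mount point, optional connection count.

nfs_mount_cmd() {
    server=$1
    export_path=$2
    mountpoint=$3
    nconnect=${4:-8}
    printf 'mount -t nfs -o vers=3,nconnect=%s %s:%s %s\n' \
        "$nconnect" "$server" "$export_path" "$mountpoint"
}
```

For example, nfs_mount_cmd main.selab-avnet203.sli.vastdata.com /alluxionfs /mnt/alluxionfs prints a command that you can review and then run with sudo on every Alluxio host.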

cd /home/vastdata/alluxio-2.6.2/
sudo mkdir /mnt/alluxionfs
sudo mount main.selab-avnet203.sli.vastdata.com:/alluxionfs /mnt/alluxionfs/
bin/alluxio fs mount /nfs /mnt/alluxionfs/
# Output: Mounted /mnt/alluxionfs at /nfs
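Because the NFS path must be identical on every Alluxio host, it is worth confirming that each host actually has the export mounted at that path. This sketch reads host names from the Alluxio conf/masters and conf/workers files; it assumes passwordless SSH to each host and the util-linux mountpoint command on the remote side. The function names list_hosts and check_nfs_mounts are hypothetical.

```shell
#!/bin/sh
# Hypothetical check that a path is a mount point on every Alluxio host.

list_hosts() {
    # Print unique host names from the given files, dropping comments and blanks.
    cat "$@" | sed -e 's/#.*//' -e 's/[[:space:]]//g' | grep -v '^$' | sort -u
}

check_nfs_mounts() {
    path=$1
    shift
    failed=0
    for host in $(list_hosts "$@"); do
        # -n keeps ssh from reading the local stdin.
        if ssh -n "$host" mountpoint -q "$path"; then
            echo "ok:      $host $path"
        else
            echo "MISSING: $host $path" >&2
            failed=1
        fi
    done
    return $failed
}
```

For example, from the installation directory: check_nfs_mounts /mnt/alluxionfs conf/masters conf/workers.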

Test the mount with:
./bin/alluxio runTests --directory /nfs

Related Alluxio documentation:
List of Configuration Properties - Alluxio v2.6.2 (stable)
NFS - Alluxio v2.6.2 (stable)