Synchronizing with AWS S3 Storage

Aspera Sync can synchronize files when the source or destination is AWS S3 Cloud Object Storage. Each endpoint (HST Server) of the async session must be configured to support Aspera Sync, and the async command must include certain file system-related options.

Capabilities:

  • Non-continuous PUSH, PULL, and BIDI synchronization between a local disk and AWS S3, as well as between S3 buckets.
  • Continuous PUSH from local disk to S3.
  • Continuous PULL and BIDI when S3 is the content source. These modes require the --scan-interval option.
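
As a rough illustration of the last capability, a continuous pull from an S3-backed server might be assembled as in the sketch below. This is not a tested invocation: the -C (continuous) flag, the session name, the host, the paths, and the 10m duration are assumptions made for illustration, so confirm the flags and the duration syntax against the async help output for your version. The script only prints the command and does not run async.

```shell
#!/bin/sh
# Sketch: assemble (but do not run) a continuous pull from an S3-backed server.
# Every value below is a hypothetical placeholder.
scan_interval="10m"               # assumed duration syntax for --scan-interval
remote="ec2_user@s3host:/data"    # hypothetical remote endpoint (S3 docroot)
local_dir="/data/mirror"          # hypothetical local target directory

# -C (continuous) is an assumption; --scan-interval is the option named above.
cmd="async -N s3-continuous-pull -C -d $local_dir -r $remote -K pull --scan-interval=$scan_interval"

# Print the command for review instead of executing it.
echo "$cmd"
```

Keeping the interval in a variable makes it easy to adjust how often the S3 side is rescanned without editing the rest of the command.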

Requirements:

  • An IBM Aspera On Demand instance in AWS S3, or HST Server for Linux or Windows version 3.7.3 or later installed on a virtual machine instance in AWS with Trapd enabled. For instructions on setting up an HST Server in the cloud, see the High-Speed Transfer Server Admin Guide for Linux: Enabling AWS EC2/AWS S3 Using the Command Line.
  • The S3 instance must have an On Demand entitlement and an Aspera Sync-enabled license.
  • The async binary must be installed on both the source and destination servers.
  • Configure the S3 instance, or both S3 endpoints if you are running an S3-to-S3 synchronization, as described in the following steps.
  1. SSH into your instance as ec2-user by running the following command.
    The command is for Linux but also works on Mac. Windows users must use an SSH tool, such as PuTTY.
    # ssh -i identity_file -p 33001 ec2-user@ec2_host_ip
  2. Elevate to root privileges by running the following command:
    # sudo su -
  3. Set an S3 docroot for the system account user that will be used to run async.
    # asconfigurator -x "set_user_data;user_name,username;absolute,s3://s3.amazonaws.com/bucketname"

    If you are not using IAM roles, then you must also specify the S3 credentials in your docroot:

    s3://access_id:secret_key@s3.amazonaws.com/my_bucket

    By setting the docroot for the system user, the account becomes an Aspera transfer user.

  4. Set database and log directories for async.
    These directories must be located under /mnt/ephemeral/data. The /mnt/ephemeral/ directory is no-cost ephemeral storage that is associated with your instance. Aspera recommends creating a directory named for the transfer user and giving that user write access. For example, if the transfer user is ec2_user, run the following commands to create the directory /mnt/ephemeral/data/ec2_user, create the database and log subdirectories, give ec2_user write access, and set the directories as the location for the database and logs:
    # mkdir /mnt/ephemeral/data/ec2_user
    # mkdir /mnt/ephemeral/data/ec2_user/db
    # mkdir /mnt/ephemeral/data/ec2_user/log
    # chown -R ec2_user /mnt/ephemeral/data/ec2_user
    # asconfigurator -x "set_node_data;async_db_dir,/mnt/ephemeral/data/ec2_user/db"
    # asconfigurator -x "set_node_data;async_log_dir,/mnt/ephemeral/data/ec2_user/log"
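
The commands in step 4 can be collected into a small script so that they are easy to repeat for a different transfer user. The following is a minimal sketch, assuming a POSIX shell: BASE_DIR, XFER_USER, DRY_RUN, and the run helper are illustrative names introduced here, not Aspera settings. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of step 4: per-user database and log directories for async.
# BASE_DIR, XFER_USER, and DRY_RUN are illustrative variables, not Aspera settings.
BASE_DIR="${BASE_DIR:-/mnt/ephemeral/data}"
XFER_USER="${XFER_USER:-ec2_user}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute them

user_dir="$BASE_DIR/$XFER_USER"

# run: print the command in dry-run mode, otherwise execute it as given.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Create the db and log subdirectories and hand them to the transfer user.
run mkdir -p "$user_dir/db" "$user_dir/log"
run chown -R "$XFER_USER" "$user_dir"

# Point async at the new database and log locations.
run asconfigurator -x "set_node_data;async_db_dir,$user_dir/db"
run asconfigurator -x "set_node_data;async_log_dir,$user_dir/log"
```

Review the printed commands first, then run the script as root with DRY_RUN=0 to apply them.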

Examples of Sync to or from S3

Note: If the client is on the cloud storage host, the following options are required:
  • The log directory and local database directory must be specified by using the -L and -b options.
  • The --apply-local-docroot option must be used to transfer content into the object storage rather than onto the local disk.
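
Because the -L and -b directories are required when the client runs on the cloud storage host, it can help to confirm that they exist and are writable before starting a session. The check below is a sketch under assumptions: check_dir, LOG_DIR, and DB_DIR are hypothetical names, and the default paths are examples only. It does not call async.

```shell
#!/bin/sh
# Sketch: confirm that the directories intended for -L (log) and -b (database)
# exist and are writable before starting async on the cloud storage host.
check_dir() {
    # $1 = label used in messages, $2 = directory path to test
    if [ -d "$2" ] && [ -w "$2" ]; then
        echo "ok: $1 directory $2"
        return 0
    fi
    echo "error: $1 directory $2 is missing or not writable" >&2
    return 1
}

LOG_DIR="${LOG_DIR:-/mnt/ephemeral/data/log}"   # example value for -L
DB_DIR="${DB_DIR:-/mnt/ephemeral/data/db}"      # example value for -b

if check_dir log "$LOG_DIR" && check_dir database "$DB_DIR"; then
    echo "preflight passed"
else
    echo "preflight failed"
fi
```

Set LOG_DIR and DB_DIR to the same values that will be passed to -L and -b so that the check matches the actual session.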

The following examples include the optional arguments --transfer-threads, --local-fs-threads, and --remote-fs-threads, which improve performance when one or both endpoints are in cloud storage.

One-time push from local disk to S3:

A one-time (non-continuous) push from a local disk to an S3 bucket, authenticating with SSH keys, where ec2_user is the transfer user. For more information on using SSH keys, see Creating SSH Keys.

# async -N sync-to-s3 -d /data/data-2017-01 -r ec2_user@192.0.4.24:/data -i /bobcat/.ssh/private_key -K push -B /mnt/ephemeral/data/db --transfer-threads=8 --remote-fs-threads=16

One-time bidi from S3 to local disk:

A one-time bidirectional sync that is run from the S3 client to a local disk:

# async -L /mnt/ephemeral/data/log --apply-local-docroot -N bidi_london -d /data -r bear@192.0.12.442:/data -K bidi -b /mnt/ephemeral/data/db -B /async/log --transfer-threads=8 --local-fs-threads=16

One-time pull from S3 to S3:

A one-time pull by ec2_user from s3host to /data/2017 in the client S3 storage:

# async -L /mnt/ephemeral/data/log --apply-local-docroot -N s3sync -d /data/2017 -r ec2_user@s3host:/data/2017-01 -K pull -b /tmp --transfer-threads=8 --local-fs-threads=16 --remote-fs-threads=16