Setting Docroots for Object Storage and HDFS
The docroot path for an object storage-based HST Server typically consists of a protocol prefix followed by URL-encoded storage account access credentials and a path within that storage. Some storage configuration properties can also be set in the docroot or in the protocol-specific Trapd .properties configuration file.
General Docroot Syntax
protocol://user:password@object_storage_URL/path/[?storage_configuration]
Docroot paths for cloud or on-premises object storage can be set in the HST Server GUI or by editing aspera.conf with asconfigurator.
To set the docroot for a user with asconfigurator, run the following command:
# asconfigurator -x "set_user_data;user_name,username;absolute,docroot"
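For example, to point the docroot for a hypothetical transfer user named xfer_user at an S3 bucket (my_bucket is a placeholder), you might run:
# asconfigurator -x "set_user_data;user_name,xfer_user;absolute,s3://s3.amazonaws.com/my_bucket/"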
The docroot can also be configured manually by adding the following text to /opt/aspera/etc/aspera.conf:
<user>
<name>username</name>
...
<file_system>
<access><paths><path>
<absolute>docroot</absolute>
</path></paths></access>
</file_system>
</user>
Then restart asperanoded to activate the change:
# /etc/init.d/asperanoded restart
Object Storage Docroot Formats
Docroot Formatting Requirements:
- The protocol prefixes for cloud-based docroot paths are case sensitive. For example, "s3://" is the correct prefix for S3 storage and "S3://" does not work.
- The variable components of URI docroots must be URL encoded, unless you are entering them in the HST Server GUI. For more information, see URL Encoding.
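As a quick sketch of the encoding step, assuming Python 3 is available on the server (the credential value below is invented), you can generate the percent-encoded form of a credential on the command line before placing it in the docroot:
# python3 -c 'import urllib.parse; print(urllib.parse.quote("ab/cd+ef==", safe=""))'
ab%2Fcd%2Bef%3D%3D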
- Alibaba Cloud
-
oss://access_key:secret_key@endpoint/path
- Amazon S3
- Aspera recommends using IAM assumed roles, in which case the docroot has the
format:
s3://s3.amazonaws.com/my_bucket/
For more information on the IAM roles required for Aspera, see the following knowledge base article:
https://www.ibm.com/support/pages/iam-role-permissions-s3-buckets
Without IAM roles, you must specify your access_id and secret_key. You can find these values in the AWS Management Console by clicking your login name and selecting Security Credentials from the drop-down menu. The docroot includes this information with the following format:
s3://access_id:secret_key@s3.amazonaws.com/my_bucket
The docroot can also be used to set storage configuration properties, including the AWS storage class (reduced redundancy or infrequent access) and server-side encryption (AES256 or AWS KMS), by adding the appropriate option:
s3://s3.amazonaws.com/my_bucket/?storage-class=REDUCED_REDUNDANCY
s3://s3.amazonaws.com/my_bucket/?storage-class=INFREQUENT_ACCESS
s3://s3.amazonaws.com/my_bucket/?server-side-encryption=AES256
s3://s3.amazonaws.com/my_bucket/?server-side-encryption=AWS_KMS
These options can be combined, as in the following example, where the & that combines the queries must be URI encoded:
s3://s3.amazonaws.com/my_bucket/?storage-class=REDUCED_REDUNDANCY&server-side-encryption=AES256
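Any of these docroots can be set with asconfigurator in the same way as shown above; for example, for a hypothetical transfer user named xfer_user (the bucket name is a placeholder):
# asconfigurator -x "set_user_data;user_name,xfer_user;absolute,s3://s3.amazonaws.com/my_bucket/?storage-class=INFREQUENT_ACCESS"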
- Azure blob
-
azu://storage_account:storage_access_key@blob.core.windows.net/path_to_blob
- Azure Files
-
azure-files://storage_account:storage_access_key@file.core.windows.net/share
- Azure Data Lake Storage
-
On one line:
adl://trap_stage.azuredatalakestore.net/folder/path?dfs.adls.oauth2.access.token.provider.type=ClientCredential&dfs.adls.oauth2.client.id=client_application_id&dfs.adls.oauth2.refresh.url=https://login.windows.net/tenant_id/oauth2/token&dfs.adls.oauth2.credential=client_application_key
Where trap_stage is the name of the Data Lake Store. The token credentials can be specified in the configuration file (/opt/aspera/etc/trapd/adl.properties) instead of the docroot.
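The exact contents depend on your deployment; as a minimal sketch, assuming adl.properties uses standard Java properties syntax with the same property names shown in the docroot above, /opt/aspera/etc/trapd/adl.properties could contain:
dfs.adls.oauth2.access.token.provider.type=ClientCredential
dfs.adls.oauth2.client.id=client_application_id
dfs.adls.oauth2.refresh.url=https://login.windows.net/tenant_id/oauth2/token
dfs.adls.oauth2.credential=client_application_key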
- Google Cloud Storage
- If the instance was set up with a Google service account, the docroot is set
as:
gs:///my_bucket/my_path
Without a Google service account, obtain the .p12 private key for your storage. For instructions on generating a private key, see the Google Cloud Platform documentation:
https://cloud.google.com/storage/docs/authentication#generating-a-private-key
Save the .p12 file in /opt/aspera/etc/trapd. You can specify the project ID and the path to the private key as part of the docroot URI, as in the following example:
gs://email_address@storage.googleapis.com/my_bucket/?aspera.gssession.projectId=project_ID&aspera.gssession.pk12=path_to_private_key_pk12_file
Note: The email_address is the service account ID associated with the storage. You must URL encode the "@" when entering the email address in the docroot. For example, if the service account ID is test@developer.gserviceaccount.com, then it is entered in the docroot as:
test%40developer.gserviceaccount.com
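Putting it together, a docroot for that service account, with a hypothetical project ID my_project and a key file saved as /opt/aspera/etc/trapd/my_key.p12, might look like:
gs://test%40developer.gserviceaccount.com@storage.googleapis.com/my_bucket/?aspera.gssession.projectId=my_project&aspera.gssession.pk12=/opt/aspera/etc/trapd/my_key.p12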
- Hadoop Distributed File System (HDFS)
-
hdfs://username@name_node_address:IPC_port/path_to_folder
Where username is that of an HST Server transfer user. You can use any transfer user on the HST Server because the HDFS URI indicates which user is connecting to HDFS.
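For example, a docroot for a transfer user xfer_user connecting to a hypothetical NameNode at namenode.example.com might look like the following (8020 is a common NameNode IPC port; use the port configured for your cluster):
hdfs://xfer_user@namenode.example.com:8020/projects/incoming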
- IBM Cloud Object Storage (COS) - S3
-
s3://access_id:secret_key@accessor_endpoint/vault_name