Server Setup in Hadoop Distributed File System (HDFS) Storage

An Aspera server can be installed on an instance in HDFS and run as a self-managed server that enables high-speed transfers with your HDFS. Settings must be changed in both the HDFS and High-Speed Transfer Server (HST Server) configuration files.

  1. Log in to any HDFS node.
  2. Record the NameNode address and IPC port to use in configuring HST Server.
    Open /etc/hadoop/conf/core-site.xml and look for the line <name>fs.defaultFS</name>. The <value> setting below it specifies the NameNode address and IPC port. For example, if the NameNode address is hadoop-node.aspera.us and the IPC port is 8020, then the entry looks like the following:
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-node.aspera.us:8020</value>
    </property>

    Keep the file open for the next step.

  3. Configure HDFS to allow HST Server to connect to it.
    Open /etc/hadoop/conf/core-site.xml if it is not already open. Enable the hdfs superuser to impersonate users and add the HST Server host, or hosts in the case of clusters, to the list of hosts allowed to connect to the NameNode. The host value can be a comma-delimited list of IP addresses, IP address ranges in CIDR format, or host names:
    <property>
        <name>hadoop.proxyuser.hdfs.groups</name>
        <value>*</value>
    </property>
       
    <property>
        <name>hadoop.proxyuser.hdfs.hosts</name>
        <value>host</value>
    </property>
    Note: If you are using Kerberos, Aspera recommends specifying a user other than the superuser as the user allowed to impersonate other users. For more information, see Configuring Kerberos for Hadoop Distributed File System (HDFS) Transfers.

    Save and close the file.

  4. Restart HDFS to activate your changes.
    If you are using an Amazon EMR cluster, run the following commands:
    # stop hadoop-hdfs-namenode
    # start hadoop-hdfs-namenode
  5. Configure the docroot in HST Server's aspera.conf.
    Run the following asconfigurator command to set the HDFS docroot for the HDFS user:
    # asconfigurator -x "set_user_data;user_name,username;absolute,hdfs://username@name_node_address:IPC_port/path_to_folder"

    In this command, username is the name of a High-Speed Transfer Server transfer user. You can use any transfer user on the HST Server because the HDFS URI indicates which user is connecting to HDFS.

    For example, if the HDFS user is xfer, the NameNode address is hadoop-node.aspera.us, and the IPC port is 8020, then the command is the following:

    # asconfigurator -x "set_user_data;user_name,xfer;absolute,hdfs://xfer@hadoop-node.aspera.us:8020/user/xfer"
  6. Ensure that the HDFS superuser matches the name of the user running the NameNode service.
    The HDFS superuser is specified in /opt/aspera/etc/trapd/hdfs.properties in the following line, in which hdfs is the default:
    #aspera.hdfs.superuser.name = hdfs

    The HDFS superuser specified in hdfs.properties must match the name of the user who actually runs the NameNode service on the NameNode host. If the user running the NameNode service is not hdfs, uncomment the line and enter the correct superuser name. Save and close the file.

  7. Restart Trapd to activate your changes.
    # systemctl restart asperatrapd
HST Server can be configured to use Kerberos for HDFS transfers. For instructions, see Configuring Kerberos for Hadoop Distributed File System (HDFS) Transfers.
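
Steps 2 and 5 above depend on copying the NameNode address and IPC port from core-site.xml into the docroot URI by hand. The following is a minimal sketch of how that lookup could be scripted. The helper names (read_property, namenode_endpoint, docroot_command) are hypothetical and are not part of Aspera or Hadoop tooling, and the embedded XML is sample content that mirrors the example in step 2. The script only prints the asconfigurator command; it does not run it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Sample core-site.xml content matching the example in step 2 (assumption:
# a real file has the same <configuration>/<property> layout).
CORE_SITE = """<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-node.aspera.us:8020</value>
    </property>
</configuration>"""


def read_property(xml_text, name):
    """Return the <value> of the named Hadoop property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def namenode_endpoint(xml_text):
    """Split fs.defaultFS into (NameNode address, IPC port)."""
    value = read_property(xml_text, "fs.defaultFS")
    if value is None:
        raise ValueError("fs.defaultFS is not set")
    parsed = urlparse(value)
    if parsed.scheme != "hdfs" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"unexpected fs.defaultFS value: {value}")
    return parsed.hostname, parsed.port


def docroot_command(user, host, port, path):
    """Build the asconfigurator command from step 5 as a string (not executed)."""
    uri = f"hdfs://{user}@{host}:{port}{path}"
    return f'asconfigurator -x "set_user_data;user_name,{user};absolute,{uri}"'


host, port = namenode_endpoint(CORE_SITE)
print(docroot_command("xfer", host, port, "/user/xfer"))
```

To use this against a live node, the sample string could be replaced with the contents of /etc/hadoop/conf/core-site.xml. Review the printed command before running it as root on the HST Server.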