Sizing Guidelines and Suggestions

The sizing requirements for a high availability environment differ depending on the use case and security requirements of each environment. The guidelines in this article do not fit every setup; they are meant as a starting point for determining the requirements of your particular setup.

There are two types of clients that can connect to a Shares HTTP endpoint: users and machines. Shares runs multiple Mongrel processes to handle these requests, whether a user interacting with the web application or a machine (such as IBM Aspera Cargo) calling the API to download the latest packages. To handle more requests, you need to configure Shares to run more Mongrel processes.

HTTP Connections from Users

Here are some sizing estimates to keep in mind:
  • 1 CPU core supports 3 Mongrel processes
  • 1 Mongrel process handles 3 concurrent, active users
  • 1 Mongrel process typically requires 500 MB of memory
    Tip: A Mongrel process uses only 150-200 MB when first started, but usage grows to around 500 MB as clients interact with the application.
In other words, the rule of thumb is:
1 CPU core ~ 3 Mongrel processes * 3 concurrent, active users ~ 10 concurrent, active users
1 CPU core ~ 3 Mongrel processes * 500 MB of memory ~ 1.5 GB of memory
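As a rough illustration only, the rule of thumb can be expressed as a small calculation. The following Python sketch is not part of Shares; the constants and the function name are placeholders that simply restate the estimates above.

    import math

    # Estimates from the rule of thumb above (assumptions, not Shares defaults)
    USERS_PER_CORE = 10      # 3 Mongrel processes * 3 users, rounded up to 10
    USERS_PER_MONGREL = 3    # 1 Mongrel process handles 3 concurrent, active users
    MB_PER_MONGREL = 500     # steady-state memory per Mongrel process

    def size_for_users(concurrent_active_users):
        """Return (cores, Mongrel processes, memory in GB) for a peak user count."""
        mongrels = math.ceil(concurrent_active_users / USERS_PER_MONGREL)
        cores = math.ceil(concurrent_active_users / USERS_PER_CORE)
        memory_gb = mongrels * MB_PER_MONGREL / 1000
        return cores, mongrels, memory_gb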

Sizing Example

Your environment must support your worst-case scenario: peak hours, when you have the highest number of concurrent, active users.

Note: The maximum number of concurrent, active users determines the number of cores and the amount of memory needed for your HA environment. You must determine this metric for your own environment.
In our example, we have 1,000 clients in our user pool, but the maximum number of concurrent, active users is 200. Apply the rule of thumb to that number:
200 concurrent, active users / 10 users per core = 20 cores
200 concurrent, active users / 3 users per Mongrel process ~ 67 Mongrel processes * 500 MB ~ 34 GB of memory
Note: You should also allocate 4 GB of memory for the OS and the MySQL database.
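Running the earlier sketch against this example reproduces these figures:

    cores, mongrels, memory_gb = size_for_users(200)
    # cores == 20, mongrels == 67, memory_gb == 33.5 (~34 GB)
    total_memory_gb = memory_gb + 4   # plus ~4 GB for the OS and MySQL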

HTTP Connections from Machines

Tip: These guidelines use IBM Aspera Cargo as the client machine, since that is the primary use case.

The CPU core to client ratio is higher for Cargo clients because they connect more frequently (for example, every five minutes). Aspera recommends a conservative ratio of one CPU core to three Cargo clients. Although Cargo clients place a heavier load on the system, the peak number of connections is easier to determine, and even control, when you manage the machines running Cargo.

Make sure to stagger the schedules of your Cargo machines so that they do not all call Shares at the same time. Some situations can trigger a wave of connections that overloads the HA environment.

For example, a company performs a system-wide update that requires rebooting all of its Windows machines. The machines restart at the same time, each Cargo application on those machines starts at the same time, and the Cargo schedules are now synchronized, triggering a massive spike in connections every 15 minutes.
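One simple way to avoid synchronized schedules is to assign each Cargo machine a fixed offset within the polling interval. The Python sketch below is illustrative only; it is not a Cargo feature, and the machine names and interval are assumptions.

    # Hypothetical helper: spread Cargo machines evenly across a polling interval
    # so that their calls to Shares do not all land at the same moment.
    def stagger_offsets(machine_ids, interval_minutes=15):
        n = len(machine_ids)
        return {m: i * interval_minutes / n for i, m in enumerate(machine_ids)}

    offsets = stagger_offsets(["cargo-01", "cargo-02", "cargo-03"])
    # {'cargo-01': 0.0, 'cargo-02': 5.0, 'cargo-03': 10.0}
    # Each machine starts its Cargo schedule that many minutes into the interval.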

You can ensure that Shares has enough Mongrel processes to handle Cargo client connections by partitioning processes between Shares and Cargo. For more information, see Partitioning Mongrel Processes between Shares and Cargo.
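When planning the partition, you can size the two pools independently: the user pool from the rule of thumb above and the Cargo pool from the one-core-to-three-clients ratio. A minimal sketch, assuming the peak counts below are known:

    import math

    CARGO_CLIENTS_PER_CORE = 3   # conservative ratio from the guideline above

    def size_pools(peak_users, cargo_clients):
        """Rough split of CPU cores between the user pool and the Cargo pool."""
        user_cores = math.ceil(peak_users / 10)   # 10 users per core
        cargo_cores = math.ceil(cargo_clients / CARGO_CLIENTS_PER_CORE)
        return user_cores, cargo_cores

    # Example: 200 peak users and 30 Cargo machines -> (20, 10), or 30 cores total.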

Network Storage

The network storage requirement applies mainly to the MySQL database, which must persist data. If you are using a standalone server, Aspera recommends using 10,000-15,000 RPM spinning disk drives or SSD drives.

If using shared storage, apply the same principles to keep latency low. In addition, the shared storage must be dedicated to Shares. At a minimum, make sure you have dedicated IOPS (I/O operations per second) for Shares; sharing the storage with other applications will likely degrade performance.
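If you want a quick sanity check of a volume's latency before placing the MySQL data directory on it, a simple probe like the following can help. This is a rough sketch; the path and sample count are placeholders, and a dedicated benchmarking tool will give more reliable numbers.

    import os
    import time

    def fsync_latency_ms(path="/mnt/shared/latency_probe.tmp", samples=100):
        """Average write+fsync latency in ms, a rough proxy for database I/O cost."""
        latencies = []
        with open(path, "wb") as f:
            for _ in range(samples):
                start = time.perf_counter()
                f.write(b"x" * 4096)    # one 4 KB block, like a small DB write
                f.flush()
                os.fsync(f.fileno())    # force the write to stable storage
                latencies.append((time.perf_counter() - start) * 1000)
        os.remove(path)
        return sum(latencies) / len(latencies)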