Manually Failing Over to the Passive Node

To force a passive node to assume the active role, disable ACM on the active node and stop all Faspex services on that node.

  1. Determine the active node with the acmctl command.
    # /opt/aspera/acm/bin/acmctl -i
    Checking current ACM status...
    
    Aspera Cluster Manager status
    -----------------------------
    Local hostname:         faspex-ha2 
    Active node:            faspex-ha2  (me)
    Status of this node:    active
    ...
  2. Disable ACM locally.
    # /opt/aspera/acm/bin/acmctl -d
    ACM is disabled locally
  3. Confirm that no ACM instances are running. Only the grep process itself should appear in the output.
    # ps aux | grep acm
    root      1248  0.0  0.0 103252   824 pts/0    S+   17:18   0:00 grep acm
  4. Stop the Faspex services.
    # asctl all:stop
    Faspex Mongrels: Stop... done
    Faspex Background: Stop... done
    Faspex DS Background: Stop... done
    Faspex DB Background: Stop... done
    Faspex NP Background: Stop... done
    MySQL: Stop... done
    Apache: Stop... done
  5. Run the acmctl -i command to verify that all Faspex services have been stopped.
    # /opt/aspera/acm/bin/acmctl -i
    Checking current ACM status...
    
    ...
    
    Faspex active/active services status
    ------------------------------------
    Apache:               stopped
    Faspex Mongrels:      stopped
    
    Faspex active/passive services status
    -------------------------------------
    MySQL:                stopped
    Faspex Background:    stopped
    Faspex NP Background: stopped
    Faspex DS Background: stopped
    Faspex DB Background: stopped
    
  6. Check the ACM logs to observe the other node taking over (this can take several minutes):
    2013-07-11 18:16:01 (-0700) acm faspex-ha2 (28736): ACM is disabled locally on this host: aborting
    2013-07-11 18:16:01 (-0700) acm faspex-ha1 (24404): ACM START (1.97)
    2013-07-11 18:16:06 (-0700) acm faspex-ha1 (24404): Lock acquired
    2013-07-11 18:16:06 (-0700) acm faspex-ha1 (24404): Checking if this node is active or passive
    2013-07-11 18:16:06 (-0700) acm faspex-ha1 (24404): Status file found
    2013-07-11 18:16:06 (-0700) acm faspex-ha1 (24404): From status file: active host is faspex-ha2
    2013-07-11 18:16:06 (-0700) acm faspex-ha1 (24404): This node is passive
    2013-07-11 18:16:06 (-0700) acm faspex-ha1 (24404): Checking if the status file is current
    2013-07-11 18:16:06 (-0700) acm faspex-ha1 (24404): Status file acm.status is too old (diff: 123)
    2013-07-11 18:16:07 (-0700) acm faspex-ha1 (24404): Status file acm.status is too old (diff: 124)
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Status file acm.status is too old (diff: 125)
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Failover scenario
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Stopping MySQL on this node (if running)
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Checking if MySQL is still active on the active node
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Trying to establish a connection to MySQL on host 10.0.115.102
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): The connection to mysql failed, testing TCP port 4406 on host 10.0.115.102
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Connection to 10.0.115.102 on port TCP/4406 failed, MySQL is likely to be down
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Becoming the active node
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): ACM RESET
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Deleting status file...
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Status file deleted
    2013-07-11 18:16:08 (-0700) acm faspex-ha1 (24404): Stopping Faspex services
    2013-07-11 18:16:12 (-0700) acm faspex-ha1 (24404): Updating file /opt/aspera/acm/config/database.yml
    2013-07-11 18:16:12 (-0700) acm faspex-ha1 (24404): Active processing BEGIN
    2013-07-11 18:16:24 (-0700) acm faspex-ha1 (24404): Active processing END (12 seconds)
    2013-07-11 18:16:29 (-0700) acm faspex-ha1 (24404): Updating status files (hostname=faspex-ha1)
    2013-07-11 18:16:29 (-0700) acm faspex-ha1 (24404): ACM STOP
  7. On the previously active node, check that it is no longer active.
    # /opt/aspera/acm/bin/acmctl -i
    Checking current ACM status...
    
    
    Aspera Cluster Manager status
    -----------------------------
    Local hostname:         faspex-ha2
    Active node:            faspex-ha1
    Status of this node:    passive
    Status file:            current
    Disabled globally:      no
    Disabled on this node:  yes
    ...
    Then, on the other node, check that it is now the active one:
    # /opt/aspera/acm/bin/acmctl -i
    Checking current ACM status...
    
    Aspera Cluster Manager status
    -----------------------------
    Local hostname:         faspex-ha1
    Active node:            faspex-ha1 (me)
    Status of this node:    active
    Status file:            current
    Disabled globally:      no
    Disabled on this node:  no
    Database host:          10.0.143.6
    
    IBM Aspera Faspex active/passive services status
    ------------------------------------------------
    Apache:               running
    MySQL:                running
    IBM Aspera Faspex:    running
    
    ...
    Note: If the node does not become active, copy the keystore.jks file (/opt/aspera/faspex/lib/daemons/np/etc/keystore.jks) from one node to the other so that both nodes have identical copies.
  8. Re-enable ACM on the node that recently became passive to let it start the active/active Faspex services.
    # /opt/aspera/acm/bin/acmctl -e
    ACM is enabled locally
  9. After several minutes, verify that the active/active services have started on the passive node:
    # /opt/aspera/acm/bin/acmctl -i
    Checking current ACM status...
    
    Aspera Cluster Manager status
    -----------------------------
    Local hostname:         faspex-ha2 
    Active node:            faspex-ha1 
    Status of this node:    passive
    Status file:            current
    Disabled globally:      no
    Disabled on this node:  no
    
    Database configuration file
    ---------------------------
    Database host:        10.0.115.101
    
    Faspex active/active services status
    ------------------------------------
    Apache:               running
    Faspex Mongrels:      running
    
    
    ...
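
The status checks in steps 1, 5, 7, and 9 and the log check in step 6 are done by reading output by eye. They could also be scripted; the following is a minimal sketch, not part of ACM. The function names acm_field, acm_has_role, and acm_takeover_host are hypothetical helpers introduced here for illustration. They assume only the "Label: value" layout of the acmctl -i report and the log line layout shown above, and they read their input from standard input, so no log file location is assumed.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with ACM). Each one reads text on stdin.

# Print the value of one "Label: value" line from an acmctl -i report.
# $1 is the label exactly as it appears in the report, e.g. "Active node".
acm_field() {
    awk -F':' -v label="$1" '
        {
            key = $1
            sub(/^[ \t]+/, "", key)              # ignore leading indentation
        }
        key == label {
            sub(/^[^:]*:[ \t]*/, "")             # drop the label and its padding
            sub(/[ \t]*\(me\)[ \t]*$/, "")       # drop the "(me)" marker if present
            sub(/[ \t]+$/, "")                   # drop trailing whitespace
            print
            exit
        }'
}

# Exit 0 if the report on stdin gives the local node the role in $1
# ("active" or "passive"), non-zero otherwise.
acm_has_role() {
    [ "$(acm_field 'Status of this node')" = "$1" ]
}

# Print the hostname from the most recent "Becoming the active node"
# log line on stdin. Prints nothing if no such line is present.
acm_takeover_host() {
    awk '/Becoming the active node/ { host = $5 }
         END { if (host != "") print host }'
}
```

For example, on either node:

    # /opt/aspera/acm/bin/acmctl -i | acm_field "Active node"
    # /opt/aspera/acm/bin/acmctl -i | acm_has_role passive && echo "this node is passive"

To use acm_takeover_host, pipe the contents of the ACM log into it. Matching on the literal label text keeps the sketch simple, but it will need adjusting if a different ACM version words the report or the log lines differently.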