Distributed LIBLINEAR: A Practical Guide on Amazon Elastic Compute Cloud (EC2)

1. Introduction

Extended from LIBLINEAR, Distributed LIBLINEAR is an open source library for large-scale linear classification in distributed environments. Amazon Elastic Compute Cloud (EC2) is a popular distributed environment. This guide describes a simple procedure for running Distributed LIBLINEAR on Amazon EC2, aimed at EC2 beginners.

1.1 Usage Procedure

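We assume that you already have an Amazon EC2 account. You can use the procedure in the following sections to manage Distributed LIBLINEAR on Amazon EC2: Section 2 prepares a reusable machine image, and Section 3 uses it to run distributed training and prediction.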


2. Environment Preparation

We use Boto, a Python interface to Amazon Web Services (AWS), to manage all the settings.
For details on installing and configuring Boto, please refer to the official Boto guide.
If you have already prepared an Amazon Machine Image (AMI) with the proper settings, go directly to Section 3.

2.1 Making Connections

>>> import boto.ec2

If you have stored your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in "~/.boto" in your home directory as follows:

[Credentials]
aws_access_key_id = AWS_ACCESS_KEY_ID
aws_secret_access_key = AWS_SECRET_ACCESS_KEY

then type:
>>> conn = boto.ec2.connect_to_region("us-east-1")

Otherwise, pass the keys explicitly:
>>> conn = boto.ec2.connect_to_region("us-east-1", aws_access_key_id="AWS_ACCESS_KEY_ID",
... aws_secret_access_key="AWS_SECRET_ACCESS_KEY")
NOTICE: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY can be obtained from the AWS homepage by clicking Account > Security Credentials > Access Credentials.
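If you prefer to create "~/.boto" from Python rather than by hand, the following is a minimal sketch for Python 3 using the standard "configparser" module. The functions "write_boto_config" and "read_boto_credentials" are hypothetical helpers of our own; they only assume the [Credentials] layout shown above and do not call Boto.

```python
import configparser
import os


def write_boto_config(path, access_key_id, secret_access_key):
    """Write the [Credentials] section shown above to `path` (INI format)."""
    # interpolation=None keeps characters such as "%" in keys literal.
    config = configparser.ConfigParser(interpolation=None)
    config["Credentials"] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    path = os.path.expanduser(path)
    # A newly created file gets owner-only permissions (0600), since it holds secrets.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        config.write(f)
    return path


def read_boto_credentials(path):
    """Return (aws_access_key_id, aws_secret_access_key) read from `path`."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(os.path.expanduser(path))
    section = config["Credentials"]
    return section["aws_access_key_id"], section["aws_secret_access_key"]
```

For example, "write_boto_config("~/.boto", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")" with your real keys, after which the shorter "connect_to_region" call above should find the credentials.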

2.2 Defining Security Groups

>>> sg = conn.create_security_group("distributed_tron", "The description")
>>> sg.authorize("tcp", 0, 65535, "0.0.0.0/0")
True
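The rule above opens every TCP port to every address, which is the simplest way to let the instances talk to each other over MPI. A tighter alternative, sketched below under the assumption that you only need SSH access from your own network, uses the same Boto 2 "authorize" method with its "src_group" argument, so that all TCP ports are open only between members of "distributed_tron". The helper name "authorize_cluster_rules" and the "admin_cidr" value are hypothetical.

```python
def authorize_cluster_rules(sg, admin_cidr):
    """Open SSH to admin_cidr and all TCP ports among members of sg.

    `sg` is a Boto 2 SecurityGroup; `admin_cidr` (e.g. "203.0.113.7/32")
    is a hypothetical placeholder for the address you connect from.
    """
    # SSH access for you, from your own address range only.
    sg.authorize("tcp", 22, 22, admin_cidr)
    # Unrestricted TCP between instances in the same group (for MPI).
    sg.authorize("tcp", 0, 65535, src_group=sg)
```

Call "authorize_cluster_rules(sg, "203.0.113.7/32")" (with your own address) instead of the "0.0.0.0/0" rule; it is worth checking the resulting rules in the AWS Console afterwards.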

2.3 Importing Key Pairs

>>> conn.import_key_pair("dtron_key_pair", "Please fill your public key as specified in RFC4716 format")
KeyPair:dtron_key_pair
NOTICE: You can pass the first line of your "~/.ssh/id_rsa.pub" as the second argument.
If you do not have this file, run "ssh-keygen" to generate a key pair.
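Instead of pasting the key by hand, you can read it from the file. A minimal sketch for Python 3, assuming the key is in the OpenSSH one-line format used by "~/.ssh/id_rsa.pub" (it rejects other formats); "read_public_key" is a hypothetical helper, not part of Boto.

```python
import os


def read_public_key(path="~/.ssh/id_rsa.pub"):
    """Return the first line of an OpenSSH public key file, stripped."""
    with open(os.path.expanduser(path)) as f:
        key = f.readline().strip()
    # OpenSSH public keys start with the key type, e.g. "ssh-rsa".
    if not key.startswith(("ssh-", "ecdsa-")):
        raise ValueError("not an OpenSSH public key: %r" % key[:20])
    return key
```

Then "conn.import_key_pair("dtron_key_pair", read_public_key())" imports it.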

2.4 Opening an Instance and Getting Its Public IP

Open an instance,
>>> res = conn.run_instances("ami-35dbde5c", key_name="dtron_key_pair", instance_type="m1.small", security_groups=["distributed_tron"])
wait until it is running (repeat the check until it prints ['running']),
>>> print([ins.state for ins in conn.get_only_instances() if ins.id in set([i.id for i in res.instances])])
['running']
and then get its public IP.
>>> instances_list = [ins for ins in conn.get_only_instances() if ins.id in set([i.id for i in res.instances])]
>>> print([instance.ip_address for instance in instances_list])
['54.234.133.223']
NOTICE: "ami-35dbde5c" is the image ID of Ubuntu Server 13.10.
You can find a detailed list of image IDs in the AWS Marketplace, or the IDs of free-tier-eligible images in the AWS Console by clicking Services > EC2 > Launch Instance.
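Rather than re-running the state check by hand, you can poll until every instance reports "running". The sketch below is a generic polling loop for Python 3; "wait_for_state" and its parameters are hypothetical names of our own. It accepts any callable that returns a list of state strings, so it does not depend on a particular Boto version.

```python
import time


def wait_for_state(fetch_states, expected="running", timeout=600,
                   interval=5, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_states() until every returned state equals `expected`.

    fetch_states is a zero-argument callable returning a list of state
    strings; an empty list never counts as success.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        states = fetch_states()
        if states and all(s == expected for s in states):
            return states
        if clock() >= deadline:
            raise TimeoutError("states still %r after %ss" % (states, timeout))
        sleep(interval)
```

With the session above, "ids = set(i.id for i in res.instances)" followed by "wait_for_state(lambda: [ins.state for ins in conn.get_only_instances() if ins.id in ids])" waits for the instance; the same loop can wait for an AMI with "wait_for_state(lambda: [conn.get_image(image_id).state], expected="available")".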

2.5 Logging in to the Instance and Installing the Dependencies

Log in to the instance, then run the remaining commands on it.
$ ssh ubuntu@54.234.133.223

$ sudo apt-get update
$ sudo apt-get install build-essential
$ sudo apt-get install openmpi-bin libopenmpi-dev
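To confirm that the packages installed the commands used later, a small check with Python 3's "shutil.which" can help. The default list of command names is an assumption about what "build-essential", "openmpi-bin" and "libopenmpi-dev" provide on Ubuntu, and "missing_tools" is a hypothetical helper.

```python
import shutil


def missing_tools(names=("g++", "make", "mpicc", "mpirun")):
    """Return the commands in `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

On the instance, "print(missing_tools())" should print "[]" once the installation has succeeded.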

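2.6 Generating SSH Keys

Still on the instance, generate a key pair and authorize it for logins to the instance itself. Because every instance launched later from this AMI carries the same key pair, the instances can then log in to each other over SSH without a password, which mpirun needs in order to start processes on the other machines.
$ ssh-keygen
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys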
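Running the "cat ... >>" command twice adds the key twice. If you script this step, a minimal idempotent version for Python 3 might look like the following. "add_authorized_key" is a hypothetical helper, and setting the file mode to 0600 reflects the usual sshd requirement that "authorized_keys" not be writable by others.

```python
import os


def add_authorized_key(pub_key_path="~/.ssh/id_rsa.pub",
                       auth_keys_path="~/.ssh/authorized_keys"):
    """Append the public key to authorized_keys unless it is already listed.

    Returns True if the key was appended and False if it was present.
    """
    with open(os.path.expanduser(pub_key_path)) as f:
        key = f.readline().strip()
    auth_keys_path = os.path.expanduser(auth_keys_path)
    content = ""
    if os.path.exists(auth_keys_path):
        with open(auth_keys_path) as f:
            content = f.read()
    if key in (line.strip() for line in content.splitlines()):
        return False
    with open(auth_keys_path, "a") as f:
        # Start on a fresh line if the file does not end with a newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(key + "\n")
    # sshd (StrictModes) may ignore an authorized_keys file others can write.
    os.chmod(auth_keys_path, 0o600)
    return True
```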

2.7 Saving the Instance to an AMI

Get the ID of the instance that you will save to an AMI,
>>> print([ins.id for ins in instances_list])
['i-32ddf063']
save it to an AMI,
>>> image_id = conn.create_image("i-32ddf063", "ami_distributed_liblinear")
wait until the AMI has been created (its state changes from "pending" to "available"),
>>> print(conn.get_image(image_id).state)
available
and then get the ID of the image that you just created.
>>> print(image_id)
ami-ad5c4ec4
Then you can terminate the instance if you no longer need it.
>>> for instance in instances_list: instance.terminate()
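AMI names must be unique within a region for your account, so calling "create_image" again with "ami_distributed_liblinear" (for example, after changing the instance) is likely to fail. One option, sketched below for Python 3, is to append a UTC timestamp. The helper "unique_ami_name" and the character rule in its check are our own assumptions based on the AWS naming constraints, not a Boto API.

```python
import re
import time


def unique_ami_name(prefix="ami_distributed_liblinear", now=None):
    """Append a UTC timestamp so repeated create_image calls get new names."""
    # time.gmtime(None) uses the current time.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    name = "%s-%s" % (prefix, stamp)
    # Assumed AMI name rule: 3-128 letters, digits, spaces and ()[]./-'@_
    if not re.fullmatch(r"[A-Za-z0-9()\[\] ./\-'@_]{3,128}", name):
        raise ValueError("invalid AMI name: %r" % name)
    return name
```

For example, "image_id = conn.create_image("i-32ddf063", unique_ami_name())".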


3. Management

Assume that the configuration parameters of our setting are:
• image_id = "ami-ad5c4ec4"
• key_name = "dtron_key_pair"
• security_groups = ["distributed_tron"]
We use 4 instances as an example.
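Keeping these parameters in one place avoids retyping them in Sections 3.1.1 and 3.1.2. The following sketch collects them in a dictionary and derives keyword arguments for Boto 2's "run_instances" and "request_spot_instances"; the names "CONFIG", "on_demand_kwargs" and "spot_kwargs" are hypothetical, while the keyword names are the ones used in the calls of those two sections.

```python
# Example values from this guide; replace them with your own.
CONFIG = {
    "region": "us-east-1",
    "image_id": "ami-ad5c4ec4",
    "key_name": "dtron_key_pair",
    "security_groups": ["distributed_tron"],
    "instance_type": "m1.small",
    "num_instances": 4,
}


def on_demand_kwargs(config):
    """Keyword arguments for conn.run_instances(image_id, **kwargs)."""
    n = config["num_instances"]
    return {
        "min_count": n,
        "max_count": n,
        "instance_type": config["instance_type"],
        "key_name": config["key_name"],
        "security_groups": list(config["security_groups"]),
    }


def spot_kwargs(config):
    """Keyword arguments for conn.request_spot_instances(price, image_id, **kwargs)."""
    return {
        "count": config["num_instances"],
        "instance_type": config["instance_type"],
        "key_name": config["key_name"],
        "security_groups": list(config["security_groups"]),
    }
```

For example, "conn.run_instances(CONFIG["image_id"], **on_demand_kwargs(CONFIG))" corresponds to the call in Section 3.1.1, and "conn.request_spot_instances("0.02", CONFIG["image_id"], **spot_kwargs(CONFIG))" to the one in Section 3.1.2.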

3.1 Opening Instances and Getting the IPs

Amazon EC2 offers two purchasing options: On-Demand Instances and Spot Instances. Spot Instances are usually cheaper, but you must specify a maximum price, and AWS may interrupt them.
For On-Demand Instances, follow Section 3.1.1; for Spot Instances, follow Section 3.1.2.

3.1.1 Using On-Demand Instances

Open 4 On-Demand Instances,
>>> import boto.ec2
>>> conn = boto.ec2.connect_to_region("us-east-1")
>>> reservations = conn.run_instances("ami-ad5c4ec4", min_count=4, max_count=4, instance_type="m1.small",
... key_name="dtron_key_pair", security_groups=["distributed_tron"])
wait until all 4 instances are running,
>>> instances_id_set = set([instance.id for instance in reservations.instances])
>>> print([instance.state for instance in conn.get_only_instances() if instance.id in instances_id_set])
['running', 'running', 'running', 'running']
get these 4 instances,
>>> instances_list = [instance for instance in conn.get_only_instances() if instance.id in instances_id_set]
then get the public IPs,
>>> print([instance.ip_address for instance in instances_list])
['54.198.68.79', '107.21.167.54', '50.19.175.101', '54.82.82.143']
and the private IPs.
>>> print([instance.private_ip_address for instance in instances_list])
['10.64.9.158', '10.208.50.228', '10.110.179.40', '10.209.134.15']

3.1.2 Using Spot Instances

Request 4 Spot Instances with a maximum price of $0.02 per hour,
>>> import boto.ec2
>>> conn = boto.ec2.connect_to_region("us-east-1")
>>> spot_requests_list = conn.request_spot_instances("0.02", "ami-ad5c4ec4", count=4, instance_type="m1.small", key_name="dtron_key_pair",
... security_groups=["distributed_tron"])
wait until all 4 requests have been fulfilled and the instances are running (until the requests are fulfilled, the list below may be empty),
>>> request_id_set = set([request.id for request in spot_requests_list])
>>> print([instance.state for instance in conn.get_only_instances() if instance.spot_instance_request_id in request_id_set])
['running', 'running', 'running', 'running']
get these 4 instances,
>>> instances_list = [instance for instance in conn.get_only_instances() if instance.spot_instance_request_id in request_id_set]
then get the public IPs,
>>> print([instance.ip_address for instance in instances_list])
['54.198.68.79', '107.21.167.54', '50.19.175.101', '54.82.82.143']
and the private IPs.
>>> print([instance.private_ip_address for instance in instances_list])
['10.28.23.186', '10.29.192.218', '10.28.84.41', '10.28.23.160']
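A Spot request can stay "open" for a while before it is fulfilled, and it can also fail, in which case the instance list above stays empty. The sketch below checks the requests themselves; it assumes the results of Boto 2's "conn.get_all_spot_instance_requests(request_ids=...)", which carry "id", "state" and "instance_id" attributes. "fulfilled_instance_ids" is a hypothetical helper.

```python
def fulfilled_instance_ids(requests):
    """Return the instance IDs once every Spot request is active, else None.

    Each request has a `state` ("open", "active", "closed", "cancelled"
    or "failed") and an `instance_id` that is set after fulfillment.
    """
    failed = [r.id for r in requests if r.state in ("cancelled", "failed")]
    if failed:
        raise RuntimeError("Spot requests not fulfilled: %s" % failed)
    if requests and all(r.state == "active" and r.instance_id for r in requests):
        return [r.instance_id for r in requests]
    # Still waiting (for example, some requests are "open").
    return None
```

For example, call "fulfilled_instance_ids(conn.get_all_spot_instance_requests(request_ids=list(request_id_set)))" every few seconds until it returns a list of instance IDs instead of None.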

3.2 Distributed Training and Prediction

Suppose that we have obtained the following IP addresses from the previous step.

NOTICE: The IP addresses are assigned dynamically, so yours will differ from those in the following table.

Index  Public IP        Private IP
1      54.198.68.79     10.28.23.186
2      107.21.167.54    10.29.192.218
3      50.19.175.101    10.28.84.41
4      54.82.82.143     10.28.23.160
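A table like this one can be printed directly from "instances_list". A minimal sketch, assuming Boto 2 "Instance" objects with the "ip_address" and "private_ip_address" attributes used in Section 3.1; "ip_table" and "format_ip_table" are hypothetical helpers.

```python
def ip_table(instances):
    """Return (index, public IP, private IP) rows for Boto 2 Instance objects."""
    return [(i, ins.ip_address, ins.private_ip_address)
            for i, ins in enumerate(instances, start=1)]


def format_ip_table(rows):
    """Render the rows as a plain-text table with a header line."""
    lines = ["%-6s %-16s %s" % ("Index", "Public IP", "Private IP")]
    lines += ["%-6d %-16s %s" % row for row in rows]
    return "\n".join(lines)
```

For example, "print(format_ip_table(ip_table(instances_list)))".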

3.2.1 Choosing the Commander

Use "54.198.68.79" as the commanding machine and log in to it:
$ ssh ubuntu@54.198.68.79

3.2.2 Compiling the Code

Download the Distributed LIBLINEAR code to the commander and build it:
$ cd distributed_liblinear
$ make

3.2.3 Configuring the 'machine_file'

Create a file 'machine_file' that lists the private IPs, one per line.

NOTICE: The first line should be the private IP of your commander.

10.28.23.186
10.29.192.218
10.28.84.41
10.28.23.160
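If you have the private IPs in Python (for example from "instances_list" in Section 3.1), the file can be written with the commander first. A minimal sketch; "write_machine_file" is a hypothetical helper.

```python
def write_machine_file(path, private_ips, commander_ip):
    """Write one private IP per line, with the commander's IP first."""
    if commander_ip not in private_ips:
        raise ValueError("commander IP %s is not in the list" % commander_ip)
    ordered = [commander_ip] + [ip for ip in private_ips if ip != commander_ip]
    with open(path, "w") as f:
        f.write("\n".join(ordered) + "\n")
    return ordered
```

For example, "write_machine_file("machine_file", [i.private_ip_address for i in instances_list], "10.28.23.186")". If you run it on your own computer, copy the resulting file to the commander afterwards.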

3.2.4 Putting the Executables ('train' and 'predict') on Each Machine at the Same Path

Assume that you put both executables in the home directory, that is, "~/train" and "~/predict" on each machine.
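One way to do this from the commander is to copy the freshly built executables with scp, which works without a password thanks to the key set up in Section 2.6. A minimal sketch for Python 3, assuming the "ubuntu" user and that it runs inside the Distributed LIBLINEAR directory; "scp_commands" and "copy_executables" are hypothetical helpers.

```python
import os
import subprocess


def scp_commands(hosts, files=("train", "predict"), user="ubuntu"):
    """Build one scp argument list per host that copies `files` to its home.

    "user@host:" with nothing after the colon means the remote home
    directory, so the files arrive as ~/train and ~/predict.
    """
    return [["scp"] + list(files) + ["%s@%s:" % (user, host)] for host in hosts]


def copy_executables(hosts, files=("train", "predict"), user="ubuntu"):
    """Copy the executables to the commander's home and to every other host."""
    subprocess.check_call(["cp"] + list(files) + [os.path.expanduser("~")])
    for cmd in scp_commands(hosts, files, user):
        subprocess.check_call(cmd)
```

For example, "copy_executables(["10.29.192.218", "10.28.84.41", "10.28.23.160"])". The first connection to each machine may ask you to confirm its host key.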

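3.2.5 Distributing the Data to Each Machine at the Same Path

Use 'split.py' to split 'heart_scale' into one part per machine listed in 'machine_file' and store each part as "~/heart_scale.sub" on its machine:
$ ./split.py ./machine_file ./heart_scale ~/heart_scale.sub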
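The internals of 'split.py' are not described here. To illustrate the idea, the sketch below splits a LIBSVM-format file into one part per machine by dealing lines round-robin, which keeps the part sizes within one line of each other. This is a hypothetical stand-in, not the script's actual algorithm, and it does not copy the parts to the machines; all function names are our own.

```python
def count_machines(machine_file):
    """Number of non-empty lines in machine_file."""
    with open(machine_file) as f:
        return sum(1 for line in f if line.strip())


def split_lines(lines, num_parts):
    """Deal instances (one per line) round-robin into num_parts lists."""
    parts = [[] for _ in range(num_parts)]
    for i, line in enumerate(lines):
        parts[i % num_parts].append(line)
    return parts


def write_parts(data_file, num_parts, prefix):
    """Write part k of data_file to prefix + "." + str(k); return the paths."""
    with open(data_file) as f:
        parts = split_lines(f.readlines(), num_parts)
    paths = []
    for k, part in enumerate(parts):
        path = "%s.%d" % (prefix, k)
        with open(path, "w") as out:
            out.writelines(part)
        paths.append(path)
    return paths
```

With the 270 instances of 'heart_scale' and 4 machines, the parts hold 68, 68, 67 and 67 instances.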

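3.2.6 Training

Start one process per machine in 'machine_file'. "-s 0" selects L2-regularized logistic regression and "-c 1" sets the cost parameter C to 1; each process reads its own part of the data from "~/heart_scale.sub".
$ mpirun -n 4 --machinefile ./machine_file ~/train -s 0 -c 1 ~/heart_scale.sub ~/heart_scale.sub.model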
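When scripting the run, the process count can be derived from 'machine_file' so that it always matches the number of machines. A minimal sketch; "mpirun_train_argv" is a hypothetical helper that returns an argument list for "subprocess", and it expands "~" itself because no shell is involved.

```python
import os


def mpirun_train_argv(machine_file, data, model, solver=0, cost=1,
                      train="~/train"):
    """Build the training command above, with one process per machine.

    subprocess does not expand "~", so paths are expanded here; the
    expanded path is valid on every machine because they all use the
    same user and directory layout.
    """
    with open(machine_file) as f:
        num_procs = sum(1 for line in f if line.strip())
    return ["mpirun", "-n", str(num_procs), "--machinefile", machine_file,
            os.path.expanduser(train), "-s", str(solver), "-c", str(cost),
            os.path.expanduser(data), os.path.expanduser(model)]
```

For example, "subprocess.check_call(mpirun_train_argv("./machine_file", "~/heart_scale.sub", "~/heart_scale.sub.model"))" on the commander.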

3.2.7 Prediction

$ mpirun -n 4 --machinefile ./machine_file ~/predict ~/heart_scale.sub ~/heart_scale.sub.model ~/prediction
Accuracy = 83.7037% (226/270)
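The reported accuracy is the fraction of correctly predicted instances: 226 / 270 = 0.837037..., i.e. 83.7037%. If you compare runs in a script, the line can be parsed as follows; "parse_accuracy" is a hypothetical helper that assumes the output format shown above.

```python
import re


def parse_accuracy(output):
    """Extract (percent, correct, total) from an "Accuracy = ..." line."""
    m = re.search(r"Accuracy = ([\d.]+)% \((\d+)/(\d+)\)", output)
    if m is None:
        raise ValueError("no accuracy line found")
    return float(m.group(1)), int(m.group(2)), int(m.group(3))
```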

3.3 Terminating the Instances

When you have finished training and prediction, terminate your instances.
>>> for instance in instances_list: instance.terminate()
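For Spot Instances (Section 3.1.2), it can also help to cancel the Spot requests, since cancelling a request does not terminate its instances, and terminating an instance does not by itself cancel a persistent request. A minimal sketch using Boto 2's "cancel_spot_instance_requests" and "terminate_instances"; "cleanup" is a hypothetical helper.

```python
def cleanup(conn, instance_ids, spot_request_ids=()):
    """Cancel Spot requests (if any), then terminate the instances.

    `conn` is a Boto 2 EC2Connection. Cancelling first keeps an open or
    persistent request from launching new instances; it does not stop
    instances that are already running.
    """
    if spot_request_ids:
        conn.cancel_spot_instance_requests(list(spot_request_ids))
    return conn.terminate_instances(instance_ids=list(instance_ids))
```

For example, "cleanup(conn, [i.id for i in instances_list], request_id_set)" for Spot Instances, or "cleanup(conn, [i.id for i in instances_list])" for On-Demand Instances.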