Distributed LIBLINEAR: A Practical Guide to Running MPI LIBLINEAR Using VirtualBox

A virtual distributed environment is very useful for development. In fact, MPI LIBLINEAR was developed entirely in such an environment. This tutorial shows how to run MPI LIBLINEAR on VirtualBox.

To save you the time of creating a brand-new virtual machine, we provide an image with Ubuntu-13.10-server-i386 and the other necessary packages installed. You can download the image from here. (MD5: 901003fd873f17532cae818618bc7902)

Now verify the image by checking its MD5 checksum:

    $ md5sum pineapple.ova
    901003fd873f17532cae818618bc7902  pineapple.ova
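
If you prefer not to compare the two digests by eye, the check can be scripted. The following is a minimal sketch; the function name verify_md5 is our own and is not part of the image or of MPI LIBLINEAR.

```shell
# verify_md5 FILE EXPECTED_MD5 (hypothetical helper)
# Succeeds only if the MD5 digest of FILE equals EXPECTED_MD5.
verify_md5() {
    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}
```

For example, `verify_md5 pineapple.ova 901003fd873f17532cae818618bc7902 && echo "image OK"` prints the message only when the digests match.
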

Please visit this tutorial to see how to set up the virtual environment, in which we build two nodes: pineapple0 and pineapple1.

If you have finished the procedure in the above tutorial, you should be logged in to pineapple0, and pineapple1 should be powered on.

Now let's train a model with MPI LIBLINEAR. Download and unpack the source:

    spongebob@pineapple0:~$ wget www.csie.ntu.edu.tw/~cjlin/libsvmtools/distributed-liblinear/mpi-liblinear-1.94.tar.gz
    spongebob@pineapple0:~$ tar -xzf mpi-liblinear-1.94.tar.gz
    spongebob@pineapple0:~$ cd mpi-liblinear-1.94

Create a machine file:

    spongebob@pineapple0:~/mpi-liblinear-1.94$ echo "localhost" > machinefile
    spongebob@pineapple0:~/mpi-liblinear-1.94$ echo "pineapple1" >> machinefile
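
The machine file is simply a list of host names, one per line. If you later add more nodes, a small helper can write the file in one step. This is only a sketch; write_machinefile is a hypothetical name of ours.

```shell
# write_machinefile OUTPUT HOST... (hypothetical helper)
# Writes one host name per line to OUTPUT, replacing any existing file.
write_machinefile() {
    # Require an output path and at least one host.
    [ "$#" -ge 2 ] || return 1
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}
```

Running `write_machinefile machinefile localhost pineapple1` produces the same file as the two echo commands above.
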

Compile the source code:

    spongebob@pineapple0:~/mpi-liblinear-1.94$ make

Then copy the binaries to pineapple1:

    spongebob@pineapple0:~/mpi-liblinear-1.94$ ssh pineapple1 "mkdir mpi-liblinear-1.94"
    spongebob@pineapple0:~/mpi-liblinear-1.94$ scp train predict pineapple1:mpi-liblinear-1.94
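
With more than one remote node, the same two commands have to be repeated for every host in the machine file. The sketch below shows one way to loop over them. The function names are hypothetical, and we use `mkdir -p` (unlike the command above) so that a rerun does not fail when the directory already exists.

```shell
# remote_hosts MACHINEFILE (hypothetical helper)
# Prints every host in MACHINEFILE except localhost, skipping empty lines.
remote_hosts() {
    grep -v -x -e 'localhost' -e '' "$1"
}

# deploy_binaries MACHINEFILE DIR (hypothetical helper)
# Creates DIR in the home directory on each remote host and copies
# train and predict from the current directory into it.
deploy_binaries() {
    for host in $(remote_hosts "$1"); do
        if ! ssh "$host" "mkdir -p '$2'"; then
            return 1
        fi
        if ! scp train predict "$host:$2"; then
            return 1
        fi
    done
}
```

In this setup, `deploy_binaries machinefile mpi-liblinear-1.94` would contact only pineapple1.
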

Split heart_scale across the two machines:

    spongebob@pineapple0:~/mpi-liblinear-1.94$ ./split.py machinefile heart_scale heart_scale.sub
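
To give a feel for what splitting a data set means, the sketch below cuts a file into contiguous chunks of nearly equal size, one chunk per node. It is purely illustrative and is not the implementation of split.py: the function name and the PREFIX.0, PREFIX.1 file names are our own assumptions, and sending the chunks to the nodes is not shown.

```shell
# split_lines INPUT PARTS PREFIX (hypothetical helper, not split.py)
# Writes the lines of INPUT to PREFIX.0 ... PREFIX.(PARTS-1)
# as contiguous chunks of nearly equal size.
split_lines() {
    # Count the lines first so each line can be mapped to its chunk.
    total=$(awk 'END { print NR }' "$1")
    awk -v n="$2" -v total="$total" -v prefix="$3" '
        { part = int((NR - 1) * n / total); print > (prefix "." part) }
    ' "$1"
}
```

Applied to the 270 instances of heart_scale with two parts, this scheme would put 135 lines in each chunk.
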

Run distributed training:

    spongebob@pineapple0:~/mpi-liblinear-1.94$ mpirun -n 2 --machinefile machinefile --mca btl_tcp_if_include eth1 ./train heart_scale.sub
    #instance = 270, #feature = 13
    iter  1 act 7.935e+01 pre 7.092e+01 delta 1.483e+00 f 1.871e+02 |g| 1.263e+02 CG   3
    iter  2 act 8.579e+00 pre 7.397e+00 delta 1.483e+00 f 1.078e+02 |g| 2.673e+01 CG   4
    iter  3 act 9.693e-01 pre 9.058e-01 delta 1.483e+00 f 9.922e+01 |g| 6.363e+00 CG   5
    iter  4 act 2.016e-02 pre 1.996e-02 delta 1.483e+00 f 9.825e+01 |g| 7.840e-01 CG   5
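
When comparing runs, it can be handy to pull a single number out of this log. The sketch below extracts the value that follows `f` on the last `iter` line. We assume here that `f` is the objective function value, as in the output of LIBLINEAR's trust-region Newton solver; the function name final_objective is hypothetical.

```shell
# final_objective (hypothetical helper)
# Reads training output on stdin and prints the value after "f"
# from the last line whose first field is "iter".
final_objective() {
    awk '$1 == "iter" {
        for (i = 1; i < NF; i++) if ($i == "f") value = $(i + 1)
    } END { if (value != "") print value }'
}
```

If the training log is saved to a file, say train.log, then `final_objective < train.log` would print 9.825e+01 for the run above.
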

Run distributed prediction:

    spongebob@pineapple0:~/mpi-liblinear-1.94$ mpirun -n 2 --machinefile machinefile --mca btl_tcp_if_include eth1 ./predict heart_scale.sub heart_scale.sub.model heart_scale.sub.out
    Accuracy = 83.7037% (226/270)
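
The reported accuracy is the number of correctly predicted instances divided by the total number of instances. As a quick sanity check, the arithmetic can be reproduced as follows; the function name accuracy is our own.

```shell
# accuracy CORRECT TOTAL (hypothetical helper)
# Prints CORRECT/TOTAL as a percentage with six significant digits.
accuracy() {
    awk -v c="$1" -v t="$2" 'BEGIN { printf "%g%%\n", 100 * c / t }'
}

accuracy 226 270   # prints 83.7037%
```
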