Multi-core LIBLINEAR

This extension is an OpenMP implementation to significantly reduce the training time in a shared-memory system. Technical details are in the following papers.

M.-C. Lee, W.-L. Chiang, and C.-J. Lin. Fast Matrix-vector Multiplications for Large-scale Logistic Regression on Shared-memory Systems, ICDM 2015 (Supplementary materials, code for paper's experiments).

W.-L. Chiang, M.-C. Lee, and C.-J. Lin. Parallel Dual Coordinate Descent Method for Large-scale Linear Classification in Multi-core Environments, ACM KDD 2016 (Supplementary materials, code for paper's experiments). Last updated: July 1, 2016

If you successfully used this tool for your applications, please let us know. We are interested in how it's being used.


How to Download and Run this LIBLINEAR Extension

Please download the zip file. See README.multicore for details of running this extension.

The usage is the same as LIBLINEAR except for a new option "-n". Specify "-n nr_thread" to train with nr_thread threads.
We now support

The dual solver was released only recently; we may release newer versions in the coming months.

Note: If you are compiling mex files with Visual Studio, you should modify make.m as follows.
mex CFLAGS="\$CFLAGS -std=c99" COMPFLAGS="/openmp $COMPFLAGS" -I.. -largeArrayDims train.c linear_model_matlab.c ../linear.cpp ../tron.cpp ../blas/daxpy.c ../blas/ddot.c ../blas/dnrm2.c ../blas/dscal.c
mex CFLAGS="\$CFLAGS -std=c99" COMPFLAGS="/openmp $COMPFLAGS" -I.. -largeArrayDims predict.c linear_model_matlab.c ../linear.cpp ../tron.cpp ../blas/daxpy.c ../blas/ddot.c ../blas/dnrm2.c ../blas/dscal.c


How to Run this LIBLINEAR Extension for Primal Solvers

For example:

> ./train -s 0 -n 8 rcv1_test.binary
will run L2-regularized logistic regression with 8 threads.


The figure above shows the speedup of training on rcv1_test.binary with -s 0.

MATLAB/Octave/Python interfaces are now supported. Please check matlab/README.multicore.


How to Run this LIBLINEAR Extension for Dual Solvers

For example:

> ./train -s 3 -n 8 rcv1_test.binary
will run L2-regularized L1-loss SVM with 8 threads.

Training time on rcv1_test.binary is as follows:
Original LIBLINEAR-2.1: 1.935(sec)
This extension with 1 thread: 2.337(sec)
This extension with 2 threads: 1.566(sec)
This extension with 4 threads: 1.107(sec)
This extension with 8 threads: 0.946(sec)

Training time on epsilon_normalized is as follows:
Original LIBLINEAR-2.1: 12.759(sec)
This extension with 1 thread: 19.874(sec)
This extension with 2 threads: 12.241(sec)
This extension with 4 threads: 8.449(sec)
This extension with 8 threads: 7.493(sec)
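
The timings above can be turned into speedup factors with a few lines of Python. This is only a sketch: speedups is a hypothetical helper name, and the numbers are copied from the two lists above. It reports the speedup of each thread count relative to the 1-thread run of this extension and relative to the original LIBLINEAR-2.1.

```python
def speedups(baseline_sec, timings_sec):
    """Return {threads: baseline_sec / time_sec} for each thread count."""
    return {n: baseline_sec / t for n, t in timings_sec.items()}


# Training times in seconds, copied from the lists above.
rcv1 = {1: 2.337, 2: 1.566, 4: 1.107, 8: 0.946}
epsilon = {1: 19.874, 2: 12.241, 4: 8.449, 8: 7.493}

for name, timings, original_sec in (
    ("rcv1_test.binary", rcv1, 1.935),
    ("epsilon_normalized", epsilon, 12.759),
):
    vs_one_thread = speedups(timings[1], timings)
    vs_original = speedups(original_sec, timings)
    for n in sorted(timings):
        print(
            f"{name}, {n} thread(s): "
            f"{vs_one_thread[n]:.2f}x vs 1 thread, "
            f"{vs_original[n]:.2f}x vs LIBLINEAR-2.1"
        )
```

With these numbers, the 1-thread run of the extension is slower than the original LIBLINEAR-2.1 on both data sets, so the speedup over the original is smaller than the speedup over the 1-thread run.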

MATLAB/Octave/Python interfaces will be available soon.


Binding Threads

You may turn on OMP_PROC_BIND so that OpenMP threads are not moved between CPUs. In our experience, the running time may be slightly shorter.
Further details are available on the official GNU website. To use it, the OpenMP version should be at least 3.1 (gcc 4.7.1 or later).

Here we assume the Bash shell is used.

> export OMP_PROC_BIND=TRUE
> ./train -s 0 -n 8 epsilon_normalized
Training time: 56.76(sec)
Training time (without OMP_PROC_BIND): 62.27(sec)
Note: The effect depends on your architecture.
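
If training is launched from a script rather than from an interactive shell, the variable can be set for the child process only. A minimal Python sketch follows. The helper names and the default path ./train are assumptions for illustration; only the -s and -n options and OMP_PROC_BIND come from the text above.

```python
import os
import subprocess


def build_train_command(solver, nr_thread, data_path, train_bin="./train"):
    """Build the argument list for a call such as `./train -s 0 -n 8 data`."""
    return [train_bin, "-s", str(solver), "-n", str(nr_thread), data_path]


def env_with_proc_bind(enabled=True, base_env=None):
    """Copy the environment, then set or remove OMP_PROC_BIND in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    if enabled:
        env["OMP_PROC_BIND"] = "TRUE"
    else:
        env.pop("OMP_PROC_BIND", None)
    return env


if __name__ == "__main__":
    cmd = build_train_command(0, 8, "epsilon_normalized")
    if os.path.exists(cmd[0]) and os.path.exists(cmd[-1]):
        # Only the child process sees OMP_PROC_BIND=TRUE.
        subprocess.run(cmd, env=env_with_proc_bind(True), check=True)
    else:
        print("train binary or data file not found; would run:", " ".join(cmd))
```

Calling env_with_proc_bind(False) instead gives a run without binding, which makes it easy to compare the two settings on your own machine.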


Parallelization on Cross-validation and Parameter Search (for ALL solvers)

In the Makefile, add -DCV_OMP to CFLAGS:
CFLAGS += -DCV_OMP
The compiled code will parallelize both CV folds and the solver for each fold.
Here is an example that uses 5 threads for cross validation and 4 threads for the solver in each CV fold (i.e., 20 threads are used in total).
> ./train -s 0 -v 5 -n 4 rcv1_test.binary
cross-validation time (standard LIBLINEAR): 103.24(sec)
cross-validation time (5 CV threads, 4 threads on the solver for each CV fold): 13.15(sec)

Similarly, the parameter-search procedure can be parallelized.

> ./train -s 0 -n 4 -C rcv1_test.binary
parameter search time (standard LIBLINEAR): 583.37(sec)
parameter search time (5 CV threads, 4 threads on the solver for each CV fold): 73.29(sec)
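
As a rough check of the numbers above, the sketch below computes the total thread count under nested parallelism, the resulting speedup, and the speedup per thread. The helper names are hypothetical, and the code assumes the total is simply the number of CV threads times the number of solver threads, as in the 5 x 4 = 20 example.

```python
def total_threads(cv_threads, solver_threads):
    """Total threads when every CV thread runs its own multi-threaded solver."""
    return cv_threads * solver_threads


def speedup_and_efficiency(sequential_sec, parallel_sec, n_threads):
    """Return (speedup, speedup per thread) for one pair of timings."""
    speedup = sequential_sec / parallel_sec
    return speedup, speedup / n_threads


n = total_threads(5, 4)
# Timings in seconds, copied from the two examples above.
for label, seq_sec, par_sec in (
    ("cross-validation", 103.24, 13.15),
    ("parameter search", 583.37, 73.29),
):
    s, e = speedup_and_efficiency(seq_sec, par_sec, n)
    print(f"{label}: {s:.2f}x speedup on {n} threads (efficiency {e:.2f})")
```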

Note that regardless of whether the solver can be parallelized (i.e., whether the -n option is supported), you can always use parallel CV and parameter search. In the following example, the dual-based solver itself is sequential, but the CV procedure is parallelized.

> ./train -s 1 -v 5 rcv1_test.binary
cross-validation time (standard LIBLINEAR): 14.57(sec)
cross-validation time (5 CV threads): 3.54(sec)

We do not allow users to set the number of CV threads. Performance may become worse when nested parallelism is configured poorly. For example, on the data set covtype.binary we obtained the following results.

parameter search time (20 threads on the solver): 55.70(sec)
parameter search time (1 CV thread, 20 threads on the solver): 98.04(sec)

Note: We do NOT recommend using OMP_PROC_BIND for CV and parameter-search parallelization. Our experiments indicate that performance becomes worse, for reasons related to nested parallelism.

Some installation issues for interfaces or MS Windows:


Please contact Chih-Jen Lin with any questions.