This extension is an OpenMP implementation to significantly reduce the training time in a shared-memory system. Technical details are in the following papers.
M.-C. Lee, W.-L. Chiang, and C.-J. Lin. Fast Matrix-vector Multiplications for Large-scale Logistic Regression on Shared-memory Systems, ICDM 2015 (Supplementary materials, code for paper's experiments).
W.-L. Chiang, M.-C. Lee, and C.-J. Lin. Parallel Dual Coordinate Descent Method for Large-scale Linear Classification in Multi-core Environments, ACM KDD 2016 (Supplementary materials, code for paper's experiments).
Y. Zhuang, Y.-C. Juan, G.-X. Yuan, C.-J. Lin. Naive Parallelization of Coordinate Descent Methods and an Application on Multi-core L1-regularized Classification, CIKM 2018 (Supplementary materials, code for paper's experiments).
If you successfully used this tool for your applications, please let us know. We are interested in how it's being used.
Please download the zip file. See README.multicore for details of running this extension.
The installation process and the usage are exactly the same as for LIBLINEAR, except for a new option "-n". Specify "-n nr_thread" to train with nr_thread threads.
We now support
Because of multi-core operations, results obtained with different numbers of threads may differ slightly. However, the final objective values should be very similar.
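One plausible source of such small differences is that floating-point addition is not associative, so a sum split across a different number of threads can round differently. The sketch below is an illustration only and is not LIBLINEAR code; chunked_sum is a hypothetical helper that mimics one partial sum per thread followed by a final combine.

```python
import math
import random

def chunked_sum(values, nr_chunks):
    """Sum values as a parallel reduction would: one partial sum per chunk,
    then a final sum over the partials. (Hypothetical helper, run serially.)"""
    n = len(values)
    size = (n + nr_chunks - 1) // nr_chunks  # ceiling division
    partials = [sum(values[i:i + size]) for i in range(0, n, size)]
    return sum(partials)

random.seed(0)
values = [random.random() for _ in range(100000)]

# Same data, different chunk counts: totals may differ in the last digits.
results = {k: chunked_sum(values, k) for k in (1, 2, 4, 8)}
for k, total in results.items():
    print(k, repr(total))
```

The totals agree to many digits, which matches the statement above that final values should be very similar even when they are not bit-for-bit identical.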
MATLAB/Octave/Python interfaces are supported. For MATLAB/Octave, please check matlab/README.multicore.
mex CFLAGS="\$CFLAGS -std=c99" COMPFLAGS="/openmp $COMPFLAGS" -I.. -largeArrayDims train.c linear_model_matlab.c ../linear.cpp ../tron.cpp ../blas/daxpy.c ../blas/ddot.c ../blas/dnrm2.c ../blas/dscal.c
mex CFLAGS="\$CFLAGS -std=c99" COMPFLAGS="/openmp $COMPFLAGS" -I.. -largeArrayDims predict.c linear_model_matlab.c ../linear.cpp ../tron.cpp ../blas/daxpy.c ../blas/ddot.c ../blas/dnrm2.c ../blas/dscal.c
For example:
> ./train -s 0 -n 8 rcv1_test.binary
will run L2-regularized logistic regression with 8 threads.
For example:
> ./train -s 3 -n 8 rcv1_test.binary
will run L2-regularized L1-loss SVM with 8 threads. Here we exclude the loading time.
For example:
> ./train -s 5 -n 8 rcv1_test.binary
will run L1-regularized L2-loss SVM with 8 threads.
You may turn on OMP_PROC_BIND, so OpenMP threads are not moved between CPUs.
From our experience, the running time may be slightly shorter.
You can find other details on the official GNU website.
To use it, the OpenMP version should be at least 3.1 (gcc 4.7.1 or later).
Here we assume that the Bash shell is used.
> export OMP_PROC_BIND=TRUE
> ./train -s 0 -n 8 epsilon_normalized
Training time: 56.76(sec)
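To compare timings with thread binding without changing the shell's environment, the variable can be set for a single child process only. The following is a minimal sketch under stated assumptions: build_train_command, bound_env, and timed_run are hypothetical helper names, the binary is assumed to be ./train in the current directory, and only the -s and -n options shown above are used. The measured wall-clock time includes data loading, so it is not directly comparable to a training time that excludes loading.

```python
import os
import subprocess
import time

def build_train_command(solver, nr_thread, data_path, train_bin="./train"):
    """Build the argument list for: train -s <solver> -n <nr_thread> <data>."""
    return [train_bin, "-s", str(solver), "-n", str(nr_thread), data_path]

def bound_env(base_env=None):
    """Return a copy of the environment with OMP_PROC_BIND turned on."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_PROC_BIND"] = "TRUE"
    return env

def timed_run(cmd, env):
    """Run cmd and return elapsed wall-clock seconds (includes loading time)."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    cmd = build_train_command(0, 8, "epsilon_normalized")
    # Only run when both the binary and the data file are present.
    if os.path.exists(cmd[0]) and os.path.exists(cmd[-1]):
        print("unbound: %.2f sec" % timed_run(cmd, dict(os.environ)))
        print("bound:   %.2f sec" % timed_run(cmd, bound_env()))
```

Passing the environment per process keeps OMP_PROC_BIND out of later commands, which matters because binding is not recommended for every mode of use.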
CFLAGS += -DCV_OMP
The compiled code will parallelize both CV folds and the solver for each fold.
./train -s 0 -v 5 -n 4 rcv1_test.binary
cross-validation time (standard LIBLINEAR): 103.24(sec)
Similarly, the parameter-search procedure can be parallelized.
./train -s 0 -n 4 -C rcv1_test.binary
parameter search time (standard LIBLINEAR): 583.37(sec)
Note that regardless of whether the solver can be parallelized (i.e., whether the -n option is supported), you can always use parallel CV and parameter selection. In the following example, the dual-based solver itself is sequential, but the CV procedure is parallelized.
./train -s 1 -v 5 rcv1_test.binary
cross-validation time (standard LIBLINEAR): 14.57(sec)
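The idea of parallelizing over folds while each fold's solver stays sequential can be sketched as follows. This is a toy illustration and not the extension's OpenMP implementation: make_folds, run_fold, and parallel_cv_accuracy are hypothetical names, and the "solver" is a stand-in that predicts the majority training label. Python threads show the structure only and do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def make_folds(n, nr_fold):
    """Split indices 0..n-1 into nr_fold contiguous validation folds."""
    bounds = [i * n // nr_fold for i in range(nr_fold + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(nr_fold)]

def run_fold(labels, valid_idx):
    """Sequential stand-in solver: 'train' on the other folds by taking the
    majority label, then count correct predictions on the validation fold."""
    valid = set(valid_idx)
    train_labels = [y for i, y in enumerate(labels) if i not in valid]
    majority = 1 if sum(train_labels) >= 0 else -1
    return sum(1 for i in valid_idx if labels[i] == majority)

def parallel_cv_accuracy(labels, nr_fold):
    """Run one task per fold concurrently and combine the per-fold counts."""
    folds = make_folds(len(labels), nr_fold)
    with ThreadPoolExecutor(max_workers=nr_fold) as pool:
        correct = list(pool.map(lambda fold: run_fold(labels, fold), folds))
    return sum(correct) / len(labels)

labels = [1, 1, -1, 1, -1, 1, 1, -1, 1, 1]
print(parallel_cv_accuracy(labels, 5))
```

Because the folds are independent, the same structure applies whether or not the per-fold solver is itself parallel.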
We do not allow users to set the number of CV threads, because a wrong setting for nested parallelism may make the performance worse. For example, we run the data set covtype.binary and get the following results.
parameter search time (20 threads on the solver): 55.70(sec)
parameter search time (1 CV thread, 20 threads on the solver): 98.04(sec)
Note: We do NOT recommend using OMP_PROC_BIND for CV and parameter-search parallelization. Our experiments indicate that the performance becomes worse, for reasons related to nested parallelism.
Some installation issues for interfaces or MS Windows:
For MATLAB users, you should modify make.m as follows.
mex CFLAGS="\$CFLAGS -std=c99 -fopenmp -DCV_OMP" CXXFLAGS="\$CXXFLAGS -fopenmp -DCV_OMP" -I.. -largeArrayDims -lgomp train.c linear_model_matlab.c ../linear.cpp ../tron.cpp ../blas/daxpy.c ../blas/ddot.c ../blas/dnrm2.c ../blas/dscal.c
mex CFLAGS="\$CFLAGS -std=c99" CXXFLAGS="\$CXXFLAGS -fopenmp -DCV_OMP" -I.. -largeArrayDims -lgomp predict.c linear_model_matlab.c ../linear.cpp ../tron.cpp ../blas/daxpy.c ../blas/ddot.c ../blas/dnrm2.c ../blas/dscal.c
setenv('CFLAGS', strcat(getenv('CFLAGS'), ' -fopenmp -DCV_OMP'))
setenv('CXXFLAGS', strcat(getenv('CXXFLAGS'), ' -fopenmp -DCV_OMP'))
CFLAGS += -DCV_OMP
...
$(CXX) -fopenmp -DCV_OMP $${SHARED_LIB_FLAG} linear.o tron.o blas/blas.a -o liblinear.so.$(SHVER)
For Visual Studio users: because VC does not support "dynamic" threadprivate, we currently do not support parallel CV on dual solvers.
If you want to do parallel CV on primal solvers, you should remove the following line in linear.cpp
#pragma omp threadprivate(seed)
and add /D CV_OMP and /openmp into CFLAGS in Makefile.win
mex CFLAGS="\$CFLAGS -std=c99" COMPFLAGS="/D CV_OMP /openmp $COMPFLAGS" -I.. -largeArrayDims train.c linear_model_matlab.c ../linear.cpp ../tron.cpp ../blas/daxpy.c ../blas/ddot.c ../blas/dnrm2.c ../blas/dscal.c
mex CFLAGS="\$CFLAGS -std=c99" COMPFLAGS="/D CV_OMP /openmp $COMPFLAGS" -I.. -largeArrayDims predict.c linear_model_matlab.c ../linear.cpp ../tron.cpp ../blas/daxpy.c ../blas/ddot.c ../blas/dnrm2.c ../blas/dscal.c
For example,
>> [label, instance] = libsvmread('../heart_scale'); model = train(label, instance,'-c 1.0 -s 5 -n 1 -e 0.01');
Total threads used: 1
>> [label, instance] = libsvmread('../heart_scale'); model = train(label, instance,'-c 1.0 -s 5 -n 1 -e 0.01');
Total threads used: 1
However, if you type "clear all" to unload MEX functions, then you will get the same results:
>> clear all; [label, instance] = libsvmread('../heart_scale'); model = train(label, instance,'-c 1.0 -s 5 -n 1 -e 0.01');
Total threads used: 1