LIBLINEAR FAQ

Some questions are listed in the LIBSVM FAQ.

Introduction, Installation, and Documents
Data
Training and Prediction
Python Interface
Windows Binary Files
L1-regularized Classification
L2-regularized Support Vector Regression

Introduction, Installation, and Documents

Q: When should I use LIBLINEAR instead of LIBSVM?

Please check our explanation on the LIBLINEAR webpage. Also see appendix C of our SVM guide.

Q: Where can I find documentation for LIBLINEAR?

Please see the descriptions on the LIBLINEAR page.

Q: I would like to cite LIBLINEAR. Which paper should I cite?

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A Library for Large Linear Classification, Journal of Machine Learning Research 9(2008), 1871-1874. Software available at http://www.csie.ntu.edu.tw/~cjlin/liblinear

The bibtex format is

@Article{REF08a,
author = 	 {Rong-En Fan and Kai-Wei Chang and Cho-Jui Hsieh and Xiang-Rui Wang and Chih-Jen Lin},
title = 	 {{LIBLINEAR}: A Library for Large Linear Classification},
journal = 	 {Journal of Machine Learning Research},
year = 	 {2008},
volume =	 {9},
pages =	 {1871--1874}
}


Q: Where are the change log and earlier versions?

See the change log and directory for earlier/current versions.

Q: How do I choose the solver? Should I use logistic regression or linear SVM? How about L1/L2 regularization?

Generally we recommend linear SVM as its training is faster and the accuracy is competitive. However, if you would like to have probability outputs, you may consider logistic regression.

Moreover, try L2 regularization first unless you need a sparse model. For most cases, L1 regularization does not give higher accuracy but may be slightly slower in training.

Among L2-regularized SVM solvers, try the default one (L2-loss SVC dual) first. If it is too slow, use the option -s 2 to solve the primal problem.

Data

Q: Is it important to normalize each instance?

For document classification, our experience indicates that normalizing each document to unit length not only shortens the training time but also improves the performance.
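
As an illustration of unit-length normalization, here is a minimal sketch. It assumes each instance is an array of index/value pairs terminated by index -1, similar to LIBLINEAR's feature_node layout. The names SparseEntry and normalize_to_unit_length are hypothetical and not part of LIBLINEAR.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sparse entry (an assumption for this sketch); it mirrors an
// index/value pair, with index -1 marking the end of an instance.
struct SparseEntry {
    int index;     // feature index, -1 terminates the instance
    double value;  // feature value
};

// Scale one instance in place so that its Euclidean (L2) norm becomes 1.
// An all-zero instance is left unchanged to avoid dividing by zero.
void normalize_to_unit_length(SparseEntry *x)
{
    assert(x != nullptr);
    double sum_sq = 0.0;
    for (const SparseEntry *p = x; p->index != -1; ++p)
        sum_sq += p->value * p->value;
    if (sum_sq == 0.0)
        return;
    const double norm = std::sqrt(sum_sq);
    for (SparseEntry *p = x; p->index != -1; ++p)
        p->value /= norm;
}
```

For example, an instance with values 3 and 4 has norm 5, so the values become 0.6 and 0.8.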

Q: How can I use the MATLAB/OCTAVE interface to load data quickly?

If you need to read the same data set several times, saving data in MATLAB/OCTAVE binary formats can significantly reduce the loading time. The following MATLAB code generates a binary file rcv1_test.mat:

[rcv1_test_labels,rcv1_test_inst] = libsvmread('../rcv1_test.binary');
save rcv1_test.mat rcv1_test_labels rcv1_test_inst;

For OCTAVE users, use
save -mat7-binary rcv1_test.mat rcv1_test_labels rcv1_test_inst;

to save rcv1_test.mat in MATLAB 7 binary format (or use -binary to save in OCTAVE binary format). Then, type
load rcv1_test.mat

to read data. A simple experiment shows that read_sparse takes 88 seconds to read the rcv1 data set with half a million instances, but loading the MATLAB binary file takes only 7 seconds. Please type
help save

in MATLAB/OCTAVE for further information.

Training and Prediction

Q: LIBLINEAR is slow on my data (it reaches the maximal number of iterations). What should I do?

Very likely you use a large C or don't scale the data. If your number of features is small, you may use the option -s 2 to solve the primal problem. More examples are in appendix C of our SVM guide.

Q: Does LIBLINEAR give the same result as LIBSVM with a linear kernel?

They should be very similar. However, sometimes the difference may not be small. Note that LIBLINEAR does not use the bias term b by default. If you observe very different results, try to set -B 1 for LIBLINEAR. This will add the bias term to the loss function as well as the regularization term (w^Tw + b^2). Then, results should be closer.

To make results exactly the same as LIBSVM, you can

• modify the primal-based solver for L2-loss SVC; see the FAQ below
• modify LIBSVM to solve L2-loss SVC; see LIBSVM FAQ: "I would like to solve L2-loss SVM (i.e., error term is quadratic). How should I modify the code ?"

For some multi-class data, the difference between LIBSVM and LIBLINEAR may be significant. The reason is that LIBSVM uses the 1-vs-1 strategy, while LIBLINEAR uses 1-vs-the rest.

Q: To have a bias term b in the decision function, LIBLINEAR embeds it into the weight vector and adds a constant feature to each instance; see the option -B. As a result, a b^2/2 term is added to the objective function. How do I solve the optimization problem without the b^2/2 term?

Take L2-regularized L2-loss SVC as an example. If -B 1 is specified, LIBLINEAR solves

min_{w,b} w^Tw/2 + b^2/2 + C \sum_i max(0, 1 - y_i(w^Tx_i+b))^2.

Now we would like to solve

min_{w,b} w^Tw/2 + C \sum_i max(0, 1 - y_i(w^Tx_i+b))^2.

It's difficult to modify dual-based solvers for the above problem. However, primal-based solvers can be easily changed by modifying function evaluation, gradient evaluation, and Hessian-vector products. First, in l2r_l2_svc_fun::fun for function evaluation, modify

	for(i=0;i<w_size;i++)
		f += w[i]*w[i];

to

	for(i=0;i<w_size-1;i++)
		f += w[i]*w[i];

Second, in l2r_l2_svc_fun::grad for gradient evaluation, after

	for(i=0;i<w_size;i++)
		g[i] = w[i] + g[i];

add

	g[w_size-1] -= w[w_size-1];

Third, in l2r_l2_svc_fun::Hv for computing the Hessian-vector product, after

	for(i=0;i<w_size;i++)
		Hs[i] = s[i] + 2*Hs[i];

add

	Hs[w_size-1] -= s[w_size-1];

Note that you need to run with the "-B 1" option.

For L2-regularized logistic regression, the modification is exactly the same.

For L2-regularized L2-loss SVR, the modification for function and gradient evaluation is the same. However, its Hessian-vector product is computed by the SVC code through inheritance. Therefore, you need to modify l2r_l2_svc_fun::Hv.

This FAQ is prepared by Pin-Yen Lin.
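
To see what dropping the b^2/2 term changes numerically, the following is a minimal sketch that evaluates both objectives on dense data. It is not LIBLINEAR code: the function name l2r_l2_svc_objective and the dense std::vector layout are assumptions made for illustration. The last component of w plays the role of b, and each instance carries a trailing constant feature 1, which is what -B 1 adds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Objective of L2-regularized L2-loss SVC on dense data (illustrative only).
// w.back() is the bias b; every row of x ends with the constant feature 1.
// If regularize_bias is false, the b^2/2 term is left out of the objective.
double l2r_l2_svc_objective(const std::vector<std::vector<double> > &x,
                            const std::vector<int> &y,
                            const std::vector<double> &w,
                            double C, bool regularize_bias)
{
    assert(x.size() == y.size() && !w.empty());
    const std::size_t w_size = w.size();
    const std::size_t reg_end = regularize_bias ? w_size : w_size - 1;

    // Regularization term: w^Tw/2, with or without the bias component.
    double f = 0.0;
    for (std::size_t i = 0; i < reg_end; ++i)
        f += w[i] * w[i];
    f /= 2.0;

    // Loss term: C * sum_i max(0, 1 - y_i (w^T x_i + b))^2.
    for (std::size_t i = 0; i < x.size(); ++i) {
        assert(x[i].size() == w_size);
        double z = 0.0;
        for (std::size_t j = 0; j < w_size; ++j)
            z += w[j] * x[i][j];
        const double d = 1.0 - y[i] * z;
        if (d > 0.0)
            f += C * d * d;
    }
    return f;
}
```

For any fixed (w, b), the two values differ by exactly b^2/2. The minimizers of the two problems are in general different, which is why the solver itself has to be modified.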

Q: How to select the regularization parameter C?

After version 2.0, an option -C is provided to find C. For example, you can run

> train -C data_file

to find the C value with the best CV rate.

The -C option is currently available for classification only. For regression, you can use gridregression.py from LIBSVM tools. Several options must be specified:

1. -svmtrain train: use the train command of LIBLINEAR
2. -log2g null: do not search over g

For example, you can run
> python gridregression.py -log2c -3,0,1 -log2g null -log2p -1,0,1 -svmtrain ./train -s 11 heart_scale

to check RSE values at C=2^-3, 2^-2, 2^-1, and 2^0, and p=2^-1 and 2^0.
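
The way a begin,end,step triple such as -3,0,1 expands into parameter values can be sketched as follows. The helper log2_grid is hypothetical (it is not part of gridregression.py or LIBLINEAR) and assumes integer exponents with a positive step.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: expand integer log2 exponents begin, begin+step, ...
// (up to and including end) into the values 2^begin, ..., 2^end.
std::vector<double> log2_grid(int begin, int end, int step)
{
    assert(step > 0 && begin <= end);
    std::vector<double> values;
    for (int e = begin; e <= end; e += step)
        values.push_back(std::ldexp(1.0, e));  // 1.0 * 2^e
    return values;
}
```

With this sketch, log2_grid(-3, 0, 1) yields the four C values listed above, and log2_grid(-1, 0, 1) yields the two p values.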

Q: Why does the software sometimes seem slower than the version used in the JMLR paper (logistic regression)?

We guess that you are comparing

> time ./train -s 0 -v 5 -e 0.001 data

with the environment used in our paper, and find that LIBLINEAR is slower. Two reasons may cause the difference.

1. The above timing of LIBLINEAR includes the time for reading data, but in the paper we exclude that part.
2. In the paper, to conduct 5-fold (or 2-fold) CV we group folds used for training as a separate matrix, but LIBLINEAR simply uses pointers to the corresponding instances. Therefore, in doing matrix-vector multiplications, the former sequentially uses rows in a contiguous segment of memory, but the latter does not. Thus, LIBLINEAR may be slower, but it saves memory.

Q: Why don't you call log1p for log(1+...) in linear.cpp? Also, couldn't the gradient/Hessian calculation involve catastrophic cancellation?

We carefully studied such issues and decided to use the current setting. For data classification, one doesn't need a very accurate solution, so numerical issues are less important. Moreover, log1p is not available on all platforms. Please let us know if you observe any numerical problems.
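
To make the question concrete, the sketch below evaluates the logistic loss log(1+exp(-z)) once with log and once with log1p. It is an illustration written for this FAQ entry, not the code in linear.cpp; the function names are hypothetical, and std::log1p assumes a C++11 standard library.

```cpp
#include <cassert>
#include <cmath>

// log(1 + exp(-z)) using plain log. The branch on the sign of z keeps the
// argument of exp() non-positive, so exp() cannot overflow.
double logistic_loss(double z)
{
    if (z >= 0)
        return std::log(1 + std::exp(-z));
    return -z + std::log(1 + std::exp(z));
}

// The same quantity using log1p, which keeps precision when exp(.) is tiny.
double logistic_loss_log1p(double z)
{
    if (z >= 0)
        return std::log1p(std::exp(-z));
    return -z + std::log1p(std::exp(z));
}
```

For a large z such as 50, the plain form rounds 1+exp(-z) to 1 and returns 0, while the log1p form returns a tiny positive number. The absolute difference is far below what matters for classification accuracy, which is consistent with the answer above.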

Q: Can you explain more about the model file?

Assume k is the total number of classes and n is the number of features. In the model file, after the parameters, there is an n*k matrix W, whose columns are obtained from solving two-class problems: 1 vs rest, 2 vs rest, 3 vs rest, ..., k vs rest. For example, if there are 4 classes, the file looks like:

+-------+-------+-------+-------+
| w_1vR | w_2vR | w_3vR | w_4vR |
+-------+-------+-------+-------+
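
As a sketch of how such a matrix can be used, the function below computes one decision value per column and returns the column with the largest value. It assumes W is held in a flat array where the entry for feature j and column i sits at w[j*k+i], and that the instance x is dense. The name argmax_column is hypothetical. Mapping the returned column back to a class label is a separate step that goes through the model's label array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// W is n*k and stored row by row: entry (feature j, column i) is w[j*k + i].
// Returns the 0-based column whose decision value w_i^T x is the largest.
std::size_t argmax_column(const std::vector<double> &w,
                          std::size_t n, std::size_t k,
                          const std::vector<double> &x)
{
    assert(k > 0 && w.size() == n * k && x.size() == n);
    std::size_t best = 0;
    double best_value = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
        double dec = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            dec += w[j * k + i] * x[j];
        if (i == 0 || dec > best_value) {
            best = i;
            best_value = dec;
        }
    }
    return best;
}
```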


Q: Why are the signs of predicted labels and decision values sometimes reversed?

To correctly obtain decision values, you need to check the array label in the model.
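
A minimal sketch of why the label array matters in the two-class case. It assumes that a positive decision value corresponds to label[0] and a non-positive one to label[1]; the function name is hypothetical.

```cpp
#include <cassert>

// Two-class case: the sign of the decision value is relative to label[0],
// not to whichever class happens to be called "positive" in the data.
int label_from_decision_value(double dec_value, const int *label)
{
    assert(label != nullptr);
    return dec_value > 0 ? label[0] : label[1];
}
```

If label[0] is the class you think of as negative, a positive decision value therefore means the negative class, which is the apparent sign reversal.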

Q: Why do you support probability outputs for logistic regression only?

LIBSVM uses more advanced techniques for SVM probability outputs. The code is a bit complicated so we haven't decided if including it is suitable or not.

If you really would like to have probability outputs for SVM in LIBLINEAR, you can consider using the simple probability model of logistic regression. Simply modify the following subroutine in linear.cpp.

int check_probability_model(const struct model *model_)
{
	return (model_->param.solver_type==L2R_LR ||

to

int check_probability_model(const struct model *model_)
{
	return 1;
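
For reference, one plausible reading of the "simple probability model of logistic regression" is sketched below: each decision value is passed through the sigmoid 1/(1+exp(-dec)), and the results are rescaled to sum to one. This is an illustrative assumption, not a copy of LIBLINEAR's code, and the function name sigmoid_probabilities is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Map one-vs-rest decision values to probabilities: apply the sigmoid to each
// value, then normalize so that the probabilities sum to 1.
std::vector<double> sigmoid_probabilities(const std::vector<double> &dec_values)
{
    assert(!dec_values.empty());
    std::vector<double> prob(dec_values.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < dec_values.size(); ++i) {
        prob[i] = 1.0 / (1.0 + std::exp(-dec_values[i]));
        sum += prob[i];
    }
    for (std::size_t i = 0; i < prob.size(); ++i)
        prob[i] /= sum;
    return prob;
}
```

Note that SVM decision values are not trained to fit this model, so the resulting numbers should be treated as rough scores rather than calibrated probabilities.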


Q: How could I know which training instances are support vectors?

Some LIBLINEAR solvers consider the primal problem, so support vectors are not obtained during the training procedure. For dual solvers, we output only the primal weight vector w, so support vectors are not stored in the model. This is different from LIBSVM.

To know support vectors, you can modify the following loop in solve_l2r_l1l2_svc() of linear.cpp to print out indices:

	for(i=0; i<l; i++)
	{
		v += alpha[i]*(alpha[i]*diag[GETI(i)] - 2);
		if(alpha[i] > 0)
			++nSV;
	}

Note that we group data in the same class together before calling this subroutine. Thus the order of your training instances has been changed. You can sort your data (e.g., positive instances before negative ones) before using LIBLINEAR. Then the indices will be the same.
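
The selection rule in that loop, alpha[i] > 0, can be sketched as a stand-alone function. The name support_vector_indices is hypothetical; inside linear.cpp you would print i in the loop itself instead of collecting the indices.

```cpp
#include <cassert>
#include <vector>

// Collect the indices of instances whose dual variable alpha[i] is positive.
// Indices refer to the order of instances as seen by the solver.
std::vector<int> support_vector_indices(const double *alpha, int l)
{
    assert(alpha != nullptr && l >= 0);
    std::vector<int> indices;
    for (int i = 0; i < l; i++)
        if (alpha[i] > 0)
            indices.push_back(i);
    return indices;
}
```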

Q: How to speed up LIBLINEAR using OpenMP for primal solvers?

Please see the multi-core LIBLINEAR page for details. This extension can dramatically reduce the running time on a shared-memory system.

This FAQ is for solvers. For multiclass classification, please check "How to speed up multiclass classification using OpenMP?" instead.

Q: How to speed up multiclass classification using OpenMP?

Please take the following steps. Note that it works only for -s 0, 1, 2, 3, 5, 6, 7.

In Makefile, add -fopenmp to CFLAGS.

In linear.cpp, replace the following segment of code

	model_->w=Malloc(double, w_size*nr_class);
	double *w=Malloc(double, w_size);
	for(i=0;i<nr_class;i++)
	{
		int si = start[i];
		int ei = si+count[i];

		k=0;
		for(; k<si; k++)
			sub_prob.y[k] = -1;
		for(; k<ei; k++)
			sub_prob.y[k] = +1;
		for(; k<sub_prob.l; k++)
			sub_prob.y[k] = -1;

		if(param->init_sol != NULL)
			for(j=0;j<w_size;j++)
				w[j] = param->init_sol[j*nr_class+i];
		else
			for(j=0;j<w_size;j++)
				w[j] = 0;

		train_one(&sub_prob, param, w, weighted_C[i], param->C);

		for(j=0;j<w_size;j++)
			model_->w[j*nr_class+i] = w[j];
	}
	free(w);

with
	model_->w=Malloc(double, w_size*nr_class);
#pragma omp parallel for private(i, j, k)
	for(i=0;i<nr_class;i++)
	{
		problem sub_prob_omp;
		sub_prob_omp.l = l;
		sub_prob_omp.n = n;
		sub_prob_omp.x = x;
		sub_prob_omp.y = Malloc(double,l);

		int si = start[i];
		int ei = si+count[i];

		double *w=Malloc(double, w_size);

		k=0;
		for(; k<si; k++)
			sub_prob_omp.y[k] = -1;
		for(; k<ei; k++)
			sub_prob_omp.y[k] = +1;
		for(; k<sub_prob_omp.l; k++)
			sub_prob_omp.y[k] = -1;

		if(param->init_sol != NULL)
			for(j=0;j<w_size;j++)
				w[j] = param->init_sol[j*nr_class+i];
		else
			for(j=0;j<w_size;j++)
				w[j] = 0;

		train_one(&sub_prob_omp, param, w, weighted_C[i], param->C);

		for(j=0;j<w_size;j++)
			model_->w[j*nr_class+i] = w[j];
		free(sub_prob_omp.y);
		free(w);
	}

Using 8 cores on the set rcv1_test.multiclass.bz2.
%export OMP_NUM_THREADS=8
%time ./train -s 2 rcv1_test.multiclass
2m4.019s
%time ./train -s 1 rcv1_test.multiclass
0m45.349s

Using standard LIBLINEAR
%time ./train -s 2 rcv1_test.multiclass
6m52.237s
%time ./train -s 1 rcv1_test.multiclass
1m51.739s


Q: How to speed up cross-validation (-v option) and parameter search (-C option) using OpenMP?

If you use solvers -s 0, -s 2, or -s 11, please directly use multi-core LIBLINEAR.

For parameter search (i.e., the -C option), which is available for -s 0, 2, 11 only, please also check multi-core LIBLINEAR for details.

For cross validation using other solvers, please modify LIBLINEAR by the following steps.

In Makefile, add -fopenmp to CFLAGS.

In linear.cpp, add a line of code in the cross_validation function:

#pragma omp parallel for private(i) schedule(dynamic)
	for(i=0;i<nr_fold;i++)
	{
		int begin = fold_start[i];
		int end = fold_start[i+1];


We take an example of using 5 threads on the data set rcv1_test.binary. Here we assume Bash is used.

> export OMP_NUM_THREADS=5
> ./train -s 0 -v 5 rcv1_test.binary


cross-validation time (standard LIBLINEAR): 103.24(sec)

Note: Using more threads than the number of CV folds brings no benefit, because each fold is handled by one thread.

Python Interface

Q: While using the Python interface, I run into memory efficiency issues when storing instances in the required data structure. What should I do?

Windows Binary Files

Q: When using the default solver on large data, why is the number of iterations on Windows much larger than that on Linux?

In linear.cpp, for the implementation of coordinate descent methods we use rand() to permute data instances. Unfortunately, on MS Windows rand() returns a value in [0, 32767]. This range is too small to ensure the randomness of the data permutation, so the convergence becomes slow. In contrast, on Linux rand() returns a value in a much larger range, so this problem does not occur.

A quick solution is to replace

    rand()

with
    (rand()*32768+rand())

and rebuild the code.
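
The idea behind this replacement, combining two 15-bit draws into one 30-bit value, can be sketched as follows. The helpers wide_rand and random_permutation are hypothetical and are not the code in linear.cpp. As an extra precaution that the quick fix above does not include, the sketch masks each draw to 15 bits so that it also stays in range on platforms where rand() returns larger values.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical helper: combine two 15-bit draws into a value in [0, 2^30).
long wide_rand()
{
    const long high = std::rand() & 0x7fff;  // upper 15 bits
    const long low = std::rand() & 0x7fff;   // lower 15 bits
    return high * 32768L + low;
}

// Random permutation of 0..l-1: position i is swapped with a position drawn
// from [i, l), using the wider generator.
std::vector<int> random_permutation(int l)
{
    assert(l >= 0);
    std::vector<int> index(l);
    for (int i = 0; i < l; i++)
        index[i] = i;
    for (int i = 0; i < l; i++) {
        const int j = i + static_cast<int>(wide_rand() % (l - i));
        std::swap(index[i], index[j]);
    }
    return index;
}
```

With only 32768 distinct values, a draw taken modulo a data size much larger than 32768 can never reach most positions. The combined generator covers data sets of up to about a billion instances.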

L1-regularized Classification

Q: When should I use L1-regularized classifiers?

Use them if you would like to identify important features. For most cases, L1 regularization does not give higher accuracy but may be slower in training.

Q: Why don't you save a sparse weight vector in the model file?

We don't have any application which really needs this setting. However, please email us if your application must use a sparse weight vector.

L2-regularized Support Vector Regression

Q: Does LIBLINEAR support least-square regression?

Yes. L2-loss SVR with epsilon = 0 (i.e., -p 0) reduces to regularized least-square regression (ridge regression).
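
The reduction can be checked on the loss term alone. The sketch below assumes the L2-loss SVR loss has the form max(0, |w^Tx - y| - p)^2; the function name l2_svr_loss is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Squared epsilon-insensitive loss: max(0, |prediction - target| - p)^2.
// With p = 0 this is the squared error (prediction - target)^2.
double l2_svr_loss(double prediction, double target, double p)
{
    assert(p >= 0.0);
    const double d = std::fabs(prediction - target) - p;
    return d > 0.0 ? d * d : 0.0;
}
```

With p = 0 the loss of every instance equals its squared residual, so the objective w^Tw/2 + C * (sum of losses) is a regularized least-squares (ridge) objective.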