LIBLINEAR FAQ

Last modified: Tue Oct 20 20:41:14 CST 2009

Some questions are listed in the LIBSVM FAQ.


Table of Contents

Introduction, Installation, and Documents
Data
Training and Prediction
L1-regularized Classification


Introduction, Installation, and Documents

Q: When should I use LIBLINEAR instead of LIBSVM?

Please check our explanation on the LIBLINEAR webpage. Also see Appendix B of our SVM guide.


Q: Where can I find documentation for LIBLINEAR?

Please see the descriptions on the LIBLINEAR page.


Q: I would like to cite LIBLINEAR. Which paper should I cite?

Please cite the following paper:

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A Library for Large Linear Classification, Journal of Machine Learning Research 9(2008), 1871-1874. Software available at http://www.csie.ntu.edu.tw/~cjlin/liblinear

The BibTeX entry is

@Article{REF08a,
  author = 	 {Rong-En Fan and Kai-Wei Chang and Cho-Jui Hsieh and Xiang-Rui Wang and Chih-Jen Lin},
  title = 	 {{LIBLINEAR}: A Library for Large Linear Classification},
  journal = 	 {Journal of Machine Learning Research},
  year = 	 {2008},
  volume =	 {9},
  pages =	 {1871--1874}
}

Q: Where are change log and earlier versions?

See the change log and directory for earlier versions.


Q: How do I choose the solver? Should I use logistic regression or linear SVM? How about L1/L2 regularization?

Generally we recommend linear SVM as its training is faster and the accuracy is competitive. However, if you would like to have probability outputs, you may consider logistic regression.

Moreover, try L2 regularization first unless you need a sparse model. For most cases, L1 regularization does not give higher accuracy but may be slightly slower in training.

Among L2-regularized SVM solvers, try the default one (L2-loss SVC dual) first. If it is too slow, use the option -s 2 to solve the primal problem.


Data

Q: Is it important to normalize each instance?

For document classification, our experience indicates that if you normalize each document to unit length, not only is the training time shorter, but the performance is also better.
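
As a minimal sketch of the normalization described above, the following Python function rescales one instance in LIBSVM sparse format so that its feature vector has unit Euclidean length. The function name normalize_line is hypothetical and not part of LIBLINEAR, and taking "unit length" to mean unit 2-norm is an assumption.

```python
import math

def normalize_line(line):
    """Scale the features of one 'label index:value ...' line to unit 2-norm."""
    fields = line.split()
    label, pairs = fields[0], fields[1:]
    feats = []
    for pair in pairs:
        index, value = pair.split(":")
        feats.append((index, float(value)))
    norm = math.sqrt(sum(v * v for _, v in feats))
    if norm == 0.0:
        # An all-zero instance cannot be scaled; return it unchanged.
        return line.strip()
    return " ".join([label] + ["%s:%.6g" % (i, v / norm) for i, v in feats])

print(normalize_line("+1 1:3 4:4"))
```

Applying the function to every line of a training file, and to the test file in the same way, gives a normalized copy that can be passed to train and predict as usual.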


Q: How can I use the MATLAB/OCTAVE interface to load data quickly?

If you need to read the same data set several times, saving data in MATLAB/OCTAVE binary formats can significantly reduce the loading time. The following MATLAB code generates a binary file rcv1_test.mat:

[rcv1_test_labels,rcv1_test_inst] = libsvmread('../rcv1_test.binary');
save rcv1_test.mat rcv1_test_labels rcv1_test_inst;

OCTAVE users should use

save -mat7-binary rcv1_test.mat rcv1_test_labels rcv1_test_inst;

to save rcv1_test.mat in MATLAB 7 binary format (or use -binary to save in OCTAVE binary format). Then, type

load rcv1_test.mat

to read the data. A simple experiment shows that read_sparse takes 88 seconds to read the rcv1 data set with half a million instances, but loading the MATLAB binary file takes only 7 seconds. Please type

help save

in MATLAB/OCTAVE for further information.

Training and Prediction

Q: Why is LIBLINEAR slow on my data (reaching the maximal number of iterations)?

Very likely you are using a large C or have not scaled the data. If your number of features is small, you may use the option

-s 2

to solve the primal problem instead. More examples are in Appendix B of our SVM guide.


Q: How do I select the regularization parameter C?

You can use grid.py of LIBSVM to check the cross-validation accuracy for different values of C.

First, you need to modify three places from

        cmdline = '%s -c %s -g %s -v %s %s %s' % \
          (svmtrain_exe,c,g,fold,pass_through_string,dataset_pathname)

to

        cmdline = '%s -c %s -v %s %s %s' % \
          (svmtrain_exe,c,fold,pass_through_string,dataset_pathname)

Note that these three places are similar but slightly different.

Second, run

> grid.py -log2c -3,0,1 -log2g 1,1,1 -svmtrain ./train

to check CV values at C = 2^-3, 2^-2, 2^-1, and 2^0.
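
To make the range notation concrete, here is a small Python sketch that expands a begin,end,step triple such as -3,0,1 into the values of C that are tried, and builds one command line per value with the modified format string. The helper c_values and the sample arguments (./train, 5 folds, a file named data) are illustrative assumptions, not part of grid.py.

```python
def c_values(begin, end, step):
    """Expand a log2c range (begin, end, step) into the list of C = 2^k."""
    values = []
    k = begin
    # Walk from begin to end inclusive; step may be negative.
    while (step > 0 and k <= end) or (step < 0 and k >= end):
        values.append(2.0 ** k)
        k += step
    return values

# Sample settings, assumed for illustration only.
svmtrain_exe, fold, pass_through_string, dataset_pathname = "./train", 5, "", "data"

for c in c_values(-3, 0, 1):
    cmdline = '%s -c %s -v %s %s %s' % \
        (svmtrain_exe, c, fold, pass_through_string, dataset_pathname)
    print(cmdline)
```

Because the -g option is no longer passed, the -log2g 1,1,1 range only serves to keep grid.py's loop over g at a single value.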


Q: Why does the software sometimes seem slower than the one used in the JMLR paper (logistic regression)?

We guess that you are comparing

> time ./train -s 0 -v 5 -e 0.001 data

with the environment used in our paper, and find that LIBLINEAR is slower. Two reasons may cause the difference.
  1. The above timing of LIBLINEAR includes the time for reading data, but in the paper we exclude that part.
  2. In the paper, to conduct 5-fold (or 2-fold) CV we group the folds used for training into a separate matrix, but LIBLINEAR simply uses pointers to the corresponding instances. Therefore, in doing matrix-vector multiplications, the former sequentially accesses rows in a contiguous segment of memory, but the latter does not. Thus, LIBLINEAR may be slower, but it saves memory.

Q: Why don't you call log1p for log(1+...) in linear.cpp? Also, might the gradient/Hessian calculation involve catastrophic cancellation?

We carefully studied such issues and decided to use the current setting. For data classification, one doesn't need a very accurate solution, so numerical issues are less important. Moreover, log1p is not available on all platforms. Please let us know if you observe any numerical problems.
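
For readers curious about the numerical issue raised in the question, the Python sketch below contrasts log(1+x) with log1p(x) for a tiny x, and shows one common way to evaluate the logistic loss log(1+exp(-z)) without overflow. It only illustrates the concern; it is not how linear.cpp computes the loss, and the function name logistic_loss is hypothetical.

```python
import math

x = 1e-17
naive = math.log(1.0 + x)   # 1.0 + x rounds to 1.0 in double precision
stable = math.log1p(x)      # keeps the contribution of the small term

def logistic_loss(z):
    """Evaluate log(1 + exp(-z)) without overflowing for large |z|."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    # For negative z use log(1 + exp(-z)) = -z + log(1 + exp(z)).
    return -z + math.log1p(math.exp(z))

print(naive, stable, logistic_loss(-1000.0))
```

The difference only matters when the argument of the logarithm is extremely close to 1, which is consistent with the remark above that classification rarely needs that much accuracy.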


Q: Can you explain more about the model file?

Assume k is the total number of classes and n is the number of features. In the model file, after the parameters, there is an n*k matrix W, whose columns are obtained from solving two-class problems: 1 vs rest, 2 vs rest, 3 vs rest, ..., k vs rest. For example, if there are 4 classes, the file looks like:

+-------+-------+-------+-------+
| w_1vR | w_2vR | w_3vR | w_4vR |
+-------+-------+-------+-------+
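
As a rough sketch of how such a matrix can be used, the Python code below computes the k decision values (one per column of W) for a sparse instance and picks the class with the largest one. The in-memory layout (a list of n rows with k entries each), the 1-based feature indices, and the function names are assumptions for illustration; this is not LIBLINEAR's own prediction code.

```python
def decision_values(W, x):
    """W: n rows of k weights (one column per 'class vs rest' problem).
    x: dict mapping a 1-based feature index to its value."""
    k = len(W[0])
    dec = [0.0] * k
    for index, value in x.items():
        if 1 <= index <= len(W):      # skip indices outside the model
            row = W[index - 1]
            for c in range(k):
                dec[c] += row[c] * value
    return dec

def predict(W, labels, x):
    """Return the label whose column gives the largest decision value."""
    dec = decision_values(W, x)
    best = max(range(len(dec)), key=lambda c: dec[c])
    return labels[best]

# Toy model with n = 3 features and k = 4 classes (made-up numbers).
W = [[ 1.0, -1.0,  0.0,  0.0],
     [ 0.0,  2.0, -1.0,  0.0],
     [-1.0,  0.0,  0.5,  1.0]]
labels = [1, 2, 3, 4]
print(predict(W, labels, {2: 1.0, 3: 0.5}))
```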

Q: Why are the signs of predicted labels and decision values sometimes reversed?

Please see the answer in the LIBSVM FAQ.

To correctly obtain decision values, you need to check the array label in the model.
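
A minimal sketch of such a check, assuming a two-class model in which a positive decision value corresponds to the first entry of the label array (please verify this convention against your LIBLINEAR version). The function name is hypothetical.

```python
def label_from_decision_value(dec_value, label):
    """Map a two-class decision value to a label using the model's label array.

    Assumes label[0] is the class associated with positive decision values.
    """
    return label[0] if dec_value > 0 else label[1]

# With label = [-1, 1], a positive decision value means class -1 under
# the assumption above; with label = [1, -1] it means class 1.
print(label_from_decision_value(0.7, [-1, 1]))
print(label_from_decision_value(0.7, [1, -1]))
```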


Q: Why do you support probability outputs for logistic regression only?

LIBSVM uses more advanced techniques for SVM probability outputs. We don't know yet if they should be included in LIBLINEAR.

If you really would like to have probability outputs for SVM in LIBLINEAR, you can consider using the simple probability model of logistic regression. Simply remove the following if statement in the subroutine predict_probability in linear.cpp.

int predict_probability(const struct model *model_, const struct feature_node *x, double* prob_estimates)
{
	if(model_->param.solver_type==L2R_LR)
	{
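
For reference, the "simple probability model of logistic regression" can be sketched in Python as follows: each decision value d is passed through the logistic function 1/(1+exp(-d)), and for more than two classes the results are normalized to sum to one. This is an illustrative sketch under those assumptions, with a hypothetical function name; check predict_probability in your copy of linear.cpp for the exact computation.

```python
import math

def probabilities_from_decision_values(dec_values):
    """dec_values: one value for a two-class model, k values otherwise."""
    probs = [1.0 / (1.0 + math.exp(-d)) for d in dec_values]
    if len(probs) == 1:
        # Two classes: the second probability is the complement of the first.
        return [probs[0], 1.0 - probs[0]]
    total = sum(probs)
    return [p / total for p in probs]

print(probabilities_from_decision_values([0.0]))
print(probabilities_from_decision_values([2.0, -1.0, 0.5]))
```

Note that SVM decision values are not trained to fit this model, so the resulting numbers should be treated as rough scores rather than calibrated probabilities.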

L1-regularized Classification

Q: When should I use L1-regularized classifiers?

Use them if you would like to identify important features. For most cases, L1 regularization does not give higher accuracy but may be slower in training.
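
As an illustration of using a sparse model to identify features, the Python sketch below lists the features whose weights are nonzero, sorted by absolute weight. The weight vector is made-up example data and the function name is hypothetical.

```python
def selected_features(w, tol=0.0):
    """Return (1-based index, weight) pairs with |weight| > tol, largest first."""
    nonzero = [(i + 1, wi) for i, wi in enumerate(w) if abs(wi) > tol]
    return sorted(nonzero, key=lambda pair: abs(pair[1]), reverse=True)

w = [0.0, -1.2, 0.0, 0.3, 0.0]   # made-up weights from a sparse model
print(selected_features(w))
```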

We would like to learn about situations where L1 regularization is useful. Please contact us if you have any success stories.


Q: Why don't you save a sparse weight vector in the model file?

We don't have any application which really needs this setting. However, please email us if your application must use a sparse weight vector.


Please contact Chih-Jen Lin with any questions.