LIBLINEAR FAQ

Last modified: Wed May 14 09:26:57 CST 2008

Q: When should I use LIBLINEAR rather than LIBSVM?

Please check our explanation on the LIBLINEAR webpage.


Q: Where can I find documentation for LIBLINEAR?

In the package there is a README file which details all options, data format, and library calls. Please also check the appendix of our SVM guide about when and how to solve linear SVMs.

You can find implementation details in the following two papers:

C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. ICML 2008.
C.-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region Newton method for large-scale logistic regression. Journal of Machine Learning Research 9(2008), 627--650.


Q: Why is LIBLINEAR slow on my data?

Very likely you are using a large C or have not scaled your data. If your number of features is small, you may use the option -s 0, which solves the primal problem. More examples are in the appendix of our SVM guide.


Q: Where are the change log and earlier versions?

See the change log. You can download earlier versions here.


Q: How do I select the regularization parameter C?

You can use grid.py from LIBSVM to check the cross-validation accuracy for different values of C.

First, modify three places in grid.py from

        cmdline = '%s -c %s -g %s -v %s %s %s' % \
          (svmtrain_exe,c,g,fold,pass_through_string,dataset_pathname)
to
        cmdline = '%s -c %s -v %s %s %s' % \
          (svmtrain_exe,c,fold,pass_through_string,dataset_pathname)
Note that these three places are similar but slightly different.

Second, run

> grid.py -log2c -3,0,1 -log2g 1,1,1 -svmtrain ./train
to check CV values at C = 2^-3, 2^-2, 2^-1, and 2^0.
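
As a rough illustration of how the range -3,0,1 maps to the C values listed above, the following sketch enumerates them. The function name c_values is hypothetical and is not part of grid.py; it only assumes that the three numbers mean begin, end, and step of the exponent.

```python
def c_values(begin, end, step):
    """Return C = 2**e for e = begin, begin+step, ..., up to and including end.

    Hypothetical helper: it mirrors the meaning of '-log2c begin,end,step'
    as described in the text, not the actual code of grid.py.
    """
    values = []
    e = begin
    # Walk the exponent range in either direction, depending on the sign of step.
    while (step > 0 and e <= end) or (step < 0 and e >= end):
        values.append(2.0 ** e)
        e += step
    return values


grid = c_values(-3, 0, 1)
print(grid)  # [0.125, 0.25, 0.5, 1.0]
```

Each of these values would then be passed to ./train through the -c option, together with -v for cross validation.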


Q: Why does the software sometimes seem slower than the one used for the paper?

We guess that you are comparing

> time ./train -v 5 -e 0.001 data
with the environment used in our paper, and find that liblinear is slower. Two reasons may cause the difference.
  1. The above timing of liblinear includes the time for reading data, but in the paper we exclude that part.
  2. In the paper, to conduct 5-fold (or 2-fold) CV we group the folds used for training into a separate matrix, but liblinear simply uses pointers to the corresponding instances. Therefore, in doing matrix-vector multiplications, the former sequentially accesses rows in a contiguous segment of memory, but the latter does not. Thus, liblinear may be slower, but it saves memory.

Q: Why don't you call log1p for log(1+...) in linear.cpp? Also, might the gradient/Hessian calculation involve catastrophic cancellation?

We carefully studied these issues and decided to keep the current setting. For data classification, one does not need a very accurate solution, so numerical issues are less important. Moreover, log1p is not available on all platforms. Please let us know if you observe any numerical problems.
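
To make the concern in the question concrete, here is a small sketch in Python (not the code in linear.cpp) that evaluates the logistic loss log(1+exp(-z)) both ways. The function names are hypothetical. When exp(-z) falls below double-precision resolution, the plain form returns exactly zero while log1p keeps the tiny value.

```python
import math


def logistic_loss_plain(z):
    # Direct evaluation: 1.0 + exp(-z) rounds to 1.0 once exp(-z) is tiny.
    return math.log(1.0 + math.exp(-z))


def logistic_loss_log1p(z):
    # log1p(t) computes log(1 + t) without forming 1 + t explicitly.
    return math.log1p(math.exp(-z))


z = 40.0
print(logistic_loss_plain(z))   # 0.0
print(logistic_loss_log1p(z))   # a tiny positive number, roughly exp(-40)
```

The two results differ only by a quantity far below what affects a classification decision, which is consistent with the reasoning given in the answer.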


Q: Is it important to normalize each instance?

For document classification, our experience indicates that if you normalize each document to unit length, not only is the training time shorter, but the performance is also better.
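
A minimal sketch of such a normalization follows. It assumes that "unit length" means unit Euclidean (L2) norm and that an instance is stored as a dictionary from feature index to value; the function name normalize_instance is hypothetical and is not part of the package.

```python
import math


def normalize_instance(features):
    """Scale a sparse instance {feature_index: value} to unit Euclidean length.

    Assumption: unit length is taken to mean L2 norm equal to 1.
    An all-zero instance is returned unchanged to avoid dividing by zero.
    """
    norm = math.sqrt(sum(v * v for v in features.values()))
    if norm == 0.0:
        return dict(features)
    return {i: v / norm for i, v in features.items()}


doc = {1: 3.0, 4: 4.0}
unit = normalize_instance(doc)
print(unit)  # {1: 0.6, 4: 0.8}
```

The normalized values can then be written back in the usual index:value data format before training.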


Q: Can you explain more about the model file?

Assume k is the total number of classes and n is the number of features. In the model file, after the parameters, there is an n*k matrix W, whose columns are obtained from solving the two-class problems 1 vs rest, 2 vs rest, 3 vs rest, ..., k vs rest. For example, if there are 4 classes, the file looks like:

+-------+-------+-------+-------+
| w_1vR | w_2vR | w_3vR | w_4vR |
+-------+-------+-------+-------+
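
The following sketch shows one way such a matrix could be used once it has been read into memory. It assumes W is held as a list of n rows with k entries each, that feature indices start at 1, that there is no bias term, and that the class whose column gives the largest value of w^T x is chosen. The function names and the numbers in W are made up for illustration.

```python
def decision_values(W, x):
    """Compute w_c^T x for every column c of the n*k matrix W.

    W: list of n rows, each a list of k weights (one column per class).
    x: sparse instance {1-based feature index: value}.
    """
    k = len(W[0])
    scores = [0.0] * k
    for idx, value in x.items():
        row = W[idx - 1]  # row of W for this feature
        for c in range(k):
            scores[c] += row[c] * value
    return scores


def best_column(W, x):
    # Index (0-based) of the column with the largest decision value.
    scores = decision_values(W, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up weights for n = 3 features and k = 4 classes.
W = [[1.0, -1.0, 0.0, 0.5],
     [0.0, 2.0, -1.0, 0.5],
     [-1.0, 0.0, 3.0, 0.5]]
x = {1: 1.0, 3: 2.0}
print(decision_values(W, x))  # [-1.0, -1.0, 6.0, 1.5]
print(best_column(W, x))      # 2
```

The column index still has to be translated to an actual class label through the label information stored in the model.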

Q: Why are the signs of predicted labels and decision values sometimes reversed?

Please see the answer in the LIBSVM FAQ.

To correctly obtain decision values, you need to check the array label in the model.
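
As an illustration for the two-class case, the sketch below maps a decision value to a label through the label array. It assumes that a positive decision value corresponds to the first entry of label and a non-positive one to the second; the function name predicted_label is hypothetical.

```python
def predicted_label(decision_value, label):
    """Map a two-class decision value to a label.

    Assumption: a positive decision value means the first entry of the
    model's label array, otherwise the second entry.
    """
    return label[0] if decision_value > 0 else label[1]


# If -1 comes first in the label array, a positive decision value
# is predicted as -1, so the signs look reversed.
print(predicted_label(1.7, [-1, 1]))   # -1
print(predicted_label(1.7, [1, -1]))   # 1
```

Under this assumption, a sign that looks reversed simply reflects the order of the labels stored in the model.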


Please contact Chih-Jen Lin with any questions.