Last modified: Tue Nov 8 15:24:36 CST 2011
This page provides some miscellaneous tools based on LIBSVM.
Roughly, they include
- Things not general enough to be included in LIBSVM
- Research code used in some of our past papers
- Some data sets in LIBSVM format
They are less actively maintained than the main LIBSVM package. However, comments are still welcome. Please cite our work properly if you find these tools useful; this supports our future development. -- Chih-Jen Lin
Disclaimer: We do not take any responsibility for damage or other problems caused by using this software and these data sets.
Table of Contents
Large linear classification when data cannot fit in memory
Weights for data instances
Fast training/testing for degree-2 polynomial mappings of data
Cross Validation with Different Criteria (AUC, F-score, etc.)
LIBSVM for dense data
LIBSVM for string data
Multi-label classification
LIBSVM Extensions at Caltech
Feature selection tool
LIBSVM data sets
SVM-toy in 3D
Multi-class classification (and probability output) via error-correcting codes
SVM Multi-class Probability Outputs
An integrated development environment to libsvm
ROC Curve for Binary SVM
Grid Parameter Search for Regression
Radius Margin Bounds for SVM Model Selection
Primal variable w of linear SVM and feature selection
Reduced Support Vector Machines Implementation
LIBSVM for SVDD and finding the smallest sphere containing all data
DAG approach for multiclass classification
Large linear classification when data cannot fit in memory
This is an extension of LIBLINEAR for data that cannot fit in memory. Currently it supports L2-regularized L1- and L2-loss linear SVM, L2-regularized logistic regression, and the Crammer and Singer formulation for multi-class classification problems.
This code implements methods proposed in the following papers:
- Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang, and Chih-Jen Lin. Large linear classification when data cannot fit in memory, ACM KDD 2010 (Best research paper award).
- Kai-Wei Chang and Dan Roth. Selective Block Minimization for Faster Convergence of Limited Memory Large-scale Linear Models, ACM KDD 2011.
Please download the zip file. Details of using this code are in the README.cdblock file. Apart from the new parameters for this extension, the usage is the same as for LIBLINEAR.
Authors: Hsiang-Fu Yu and Kai-Wei Chang
Weights for data instances
Users can give a weight to each data instance.
For LIBSVM users, please download the zip file (MATLAB and Python interfaces are included).
For LIBLINEAR users, please download the zip file (MATLAB and Python interfaces are included).
- You must store weights in a separate file and specify -W your_weight_file. This setting differs from earlier versions, where weights were in the first column of the training data.
- Training/testing sets are the same as those for standard LIBSVM/LIBLINEAR.
- We do not support weights for test data.
- All solvers are supported.
- MATLAB/Python interfaces for both LIBSVM and LIBLINEAR are supported.
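As a minimal sketch of preparing such a weight file, the Python snippet below gives each instance a weight inversely proportional to its class frequency and writes the weights to a file. The one-weight-per-line layout, the helper names, and the weighting rule are assumptions made for illustration; the README in the zip file is the authority on the actual format.

```python
from collections import Counter


def balanced_weights(labels):
    """Return one weight per instance, inversely proportional to class frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # Every class ends up with the same total weight n / k.
    return [n / (k * counts[y]) for y in labels]


def write_weight_file(path, weights):
    """Write weights one per line (assumed layout; check the README)."""
    with open(path, "w") as f:
        for w in weights:
            f.write(f"{w:.6g}\n")


# Toy labels, listed in the same order as the instances in the training file.
labels = [1, 1, 1, -1]
weights = balanced_weights(labels)
print(weights)
```

A file written by write_weight_file would then be passed to training with -W your_weight_file, as described in the list above.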
We are interested in success stories of using instance weights. Please keep us informed.
Authors: Ming-Wei Chang, Hsuan-Tien Lin, Ming-Hen Tsai, Chia-Hua Ho and Hsiang-Fu Yu.
Fast training/testing for degree-2 polynomial mappings of data
This is an extension of LIBLINEAR for fast training/testing of the degree-2 polynomial mappings of data. Currently it supports L1- and L2-loss linear SVM.
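To illustrate what a degree-2 polynomial mapping is, the sketch below builds an explicit feature map whose inner product equals the degree-2 polynomial kernel (1 + x.z)^2. It shows the general idea only: the fixed kernel parameters (gamma = 1, coefficient 1), the dense lists, and the function names are assumptions, and this is not the code or the exact mapping used by the extension.

```python
import math
from itertools import combinations


def poly2_map(x):
    """Explicit feature map phi with phi(x).phi(z) == (1 + x.z)**2."""
    s = math.sqrt(2.0)
    phi = [1.0]                                         # constant term
    phi += [s * v for v in x]                           # linear terms
    phi += [v * v for v in x]                           # squared terms
    phi += [s * a * b for a, b in combinations(x, 2)]   # cross terms, i < j
    return phi


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = [1.0, 2.0, 0.5], [0.0, -1.0, 3.0]
lhs = dot(poly2_map(x), poly2_map(z))   # linear model on mapped data
rhs = (1.0 + dot(x, z)) ** 2            # kernel value on original data
print(lhs, rhs)
```

Because the mapped vectors can be handled by a linear solver, training and testing can avoid kernel evaluations, at the cost of a feature dimension that grows quadratically with the original one.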
This code implements one method proposed in the paper
Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard, and Chih-Jen Lin. Low-degree Polynomial Mappings of Data for SVM, 2009.
Please download the zip file here. Details of using this code are in the README.poly2 file. Apart from the new parameters for the degree-2 mapping, the usage is the same as for LIBLINEAR.
Authors: Yin-Wen Chang, Cho-Jui Hsieh and Kai-Wei Chang
Cross Validation with Different Criteria (AUC, F-score, etc.)
For some unbalanced data sets, accuracy may not be a good criterion for evaluating a model. This tool enables LIBSVM and LIBLINEAR to conduct cross-validation and prediction with respect to different criteria (F-score, AUC, etc.).
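The following small sketch shows why accuracy can mislead on unbalanced data: a classifier that always predicts the majority class scores high accuracy but an F-score of zero. The function names and the toy data are assumptions for illustration and are not part of the tool.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-score for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# 95 negatives and 5 positives; the classifier predicts negative for everything.
y_true = [-1] * 95 + [1] * 5
y_pred = [-1] * 100
print(accuracy(y_true, y_pred), precision_recall_f1(y_true, y_pred))
```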
Details
Authors: Hsiang-Fu Yu and Chia-Hua Ho
LIBSVM for dense data
LIBSVM stores instances as sparse vectors. For some applications, most feature values are non-zero, so a dense representation can save significant computation time. The zip file here is an implementation for dense data. See README for some comparisons with the standard LIBSVM.
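The difference between the two representations can be sketched as follows. The snippet parses one line of the LIBSVM sparse text format (label followed by index:value pairs with 1-based indices) and expands it to a dense list; the helper names are assumptions, and the dense implementation in the zip file is written in C/C++, not Python.

```python
def parse_sparse_line(line):
    """Parse 'label index:value ...' into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    feats = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        feats[int(idx)] = float(val)
    return label, feats


def to_dense(feats, n_features):
    """Expand a sparse {index: value} map (1-based indices) into a dense list."""
    dense = [0.0] * n_features
    for idx, val in feats.items():
        dense[idx - 1] = val
    return dense


def sparse_dot(a, b):
    """Inner product of two sparse maps; needs an index lookup per stored value."""
    return sum(v * b.get(i, 0.0) for i, v in a.items())


def dense_dot(a, b):
    """Inner product of two dense lists; a plain loop with no index handling."""
    return sum(u * v for u, v in zip(a, b))


label, feats = parse_sparse_line("+1 1:0.5 3:2")
print(label, feats, to_dense(feats, 3))
```

When nearly every feature is non-zero, the index bookkeeping of the sparse form is pure overhead, which is the situation the dense implementation targets.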
Author: Ming-Fang Weng
LIBSVM for string data
For some applications, data instances are strings, and an SVM model is trained using a string kernel. This experimental code (download zip file here) allows string inputs and implements one string kernel. Details are in README.
Author: Guo-Xun Yuan
Multi-label classification
This web page contains various tools for multi-label classification.
LIBSVM Extensions at Caltech
You can link to this webpage, which is individually maintained by Hsuan-Tien Lin, a PhD student at Caltech. The page contains some programs that he has developed for related research. Most of these programs are extended from or written for LIBSVM. Some of the most useful programs include confidence margin/decision value output, infinite ensemble learning with SVM, dense format, and a MATLAB implementation for estimating posterior probability.
Feature selection tool
This is a simple Python script (download here) that uses F-scores to select features. To run it, please put it in the "tools" sub-directory of LIBSVM.
Usage: ./fselect.py training_file [testing_file]
Output files: .fscore shows importance of features, .select gives the running log, and .pred gives testing results.
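As a rough illustration of the F-score criterion, the sketch below computes a two-class F-score for each feature: the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. It follows the definition in the chapter cited in this section; it is not the code of fselect.py, and the dense input, the +1/-1 labels, and the zero-variance handling are assumptions.

```python
def f_scores(X, y):
    """Two-class F-score per feature; X is a list of dense rows, y holds +1/-1.

    Each class needs at least two instances (sample variance uses n - 1).
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    scores = []
    for i in range(len(X[0])):
        mean_all = sum(row[i] for row in X) / len(X)
        mean_pos = sum(row[i] for row in pos) / len(pos)
        mean_neg = sum(row[i] for row in neg) / len(neg)
        var_pos = sum((row[i] - mean_pos) ** 2 for row in pos) / (len(pos) - 1)
        var_neg = sum((row[i] - mean_neg) ** 2 for row in neg) / (len(neg) - 1)
        between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
        within = var_pos + var_neg
        # Assumption: a feature with zero within-class variance gets score 0.
        scores.append(between / within if within > 0 else 0.0)
    return scores


# Feature 0 separates the classes well; feature 1 does not.
X = [[1.0, 5.0], [1.2, 1.0], [3.0, 4.0], [3.2, 2.0]]
y = [1, 1, -1, -1]
scores = f_scores(X, y)
print(scores)
```

Features with larger scores are considered more discriminative, so a selection step would keep the top-ranked ones.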
More information about this implementation can be found in Y.-W. Chen and C.-J. Lin, Combining SVMs with various feature selection strategies. To appear in the book "Feature extraction, foundations and applications." 2005.
This implementation is still preliminary. More comments are very welcome.
Author: Yi-Wei Chen
LIBSVM data sets
We now have a nice web page showing available data sets.
SVM-toy in 3D
A simple applet demonstrating SVM classification and regression in 3D. It extends the Java svm-toy in the LIBSVM package.
Go to 3D SVM-toy page
Multi-class classification (and probability output) via error-correcting codes
Note: libsvm does support multi-class classification. The code here implements some extensions for experimental purposes.
This code implements multi-class classification and probability estimates using 4 types of error correcting codes. Details of the 4 types of ECCs and the algorithms can be found in the following paper:
T.-K. Huang, R. C. Weng, and C.-J. Lin. Generalized Bradley-Terry Models and Multi-class Probability Estimates. Journal of Machine Learning Research, 7(2006), 85-115. A (very) short version of this paper appears in NIPS 2004.
The code can be downloaded here.
The installation is the same as for the standard LIBSVM package, and different types of ECCs are specified with the "-i" option. Type "svm-train" without any arguments to see the usage. Note that both the "one-against-one" and "one-against-the-rest" multi-class strategies are part of the implementation.
If you specify -b in training and testing, you get probability estimates, and the predicted label is the one with the largest value. If you do not specify -b, classification is based on decision values. We currently use the "exponential-loss" method in the paper
Allwein et al.: Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113--141, 2001,
to predict the class label. For one-against-the-rest (also called 1vsall), this is the same as the commonly used rule
argmax_{i} (decision value of ith class vs the rest).
For one-against-one, it is different from the max-win strategy used in libsvm.
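The exponential-loss rule can be sketched as loss-based decoding over a code matrix: each class has a row of +1/-1/0 entries (one per binary problem), and the predicted class is the row with the smallest total loss exp(-M[r][s] * f_s). The function names and the toy numbers below are assumptions for illustration; this is not the downloadable code.

```python
import math


def loss_based_decode(code_matrix, dec_values):
    """Return the row index r minimizing sum_s exp(-M[r][s] * f_s)."""
    best_row, best_loss = None, float("inf")
    for r, row in enumerate(code_matrix):
        loss = sum(math.exp(-m * f) for m, f in zip(row, dec_values))
        if loss < best_loss:
            best_row, best_loss = r, loss
    return best_row


def one_vs_rest_matrix(k):
    """Code matrix with +1 on the diagonal and -1 elsewhere."""
    return [[1 if r == s else -1 for s in range(k)] for r in range(k)]


dec_values = [0.2, 1.5, -0.3]   # toy decision values of "class i vs the rest"
predicted = loss_based_decode(one_vs_rest_matrix(3), dec_values)
argmax = max(range(3), key=lambda i: dec_values[i])
print(predicted, argmax)
```

For the one-against-the-rest matrix, the loss of row r differs between rows only through exp(-f_r) - exp(f_r), which decreases as f_r grows, so the decoded class coincides with the argmax rule stated above.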
MATLAB code for experiments in our paper is available here.
Author: Tzu-Kuo Huang
SVM Multi-class Probability Outputs
This code implements different strategies for multi-class probability estimates from the following paper:
T.-F. Wu, C.-J. Lin, and R. C. Weng. Probability Estimates for Multi-class Classification by Pairwise Coupling. Journal of Machine Learning Research, 2004. A short version appears in NIPS 2003.
Versions of libsvm after 2.6 already include one of the methods here. You may directly use the standard libsvm unless you are interested in doing comparisons.
Please download the tgz file here. The data used in the paper is available here. Please then check README for installation.
MATLAB programs for the synthetic data experiment in the paper can be found in this directory. The main program is fig1a.m.
Author: Tingfan Wu (svm [at] future.csie.org)
An integrated development environment to libsvm
This is a graphical environment for doing experiments with libsvm. You can create and connect components (such as scaler, trainer, and predictor) in this environment. The program can be extended easily by writing more "plugins". It is written in Python and uses the wxPython library.
Please download the zip file here. After unzipping the package, run the file wxApp1.py. You then have to give the path of the libsvm binary files in plugin/svm/svm_interface.py.
Author: Chih-Chung Chang
ROC Curve for Binary SVM
This tool gives the ROC (Receiver Operating Characteristic) curve and AUC (Area Under Curve) by ranking the decision values. Note that we assume labels are +1 and -1. Multi-class is not supported yet.
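A minimal sketch of the idea: sort instances by decreasing decision value, sweep a threshold down the ranking to trace (false positive rate, true positive rate) points, and integrate with the trapezoid rule. The function names, the tie handling, and the toy data are assumptions; this is not the code of plotroc.m or plotroc.py.

```python
def roc_points(labels, dec_values):
    """ROC points (FPR, TPR) from ranked decision values; labels are +1/-1.

    Both classes must be present, otherwise a rate is undefined.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(dec_values, labels), key=lambda t: -t[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    i = 0
    while i < len(ranked):
        j = i
        # Instances with tied decision values cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, -1, 1, -1]
dec_values = [0.9, 0.8, 0.7, 0.1]
print(auc(roc_points(labels, dec_values)))
```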
You can use either MATLAB or Python.
If using MATLAB, you need to
- Download the LIBSVM MATLAB interface from the LIBSVM page and build it.
- Download plotroc.m to the main directory of the LIBSVM MATLAB interface.
- Type
> help plotroc
to get usage and examples.
If using Python, you need to
- Download LIBSVM (version 2.91 or later) and build the LIBSVM Python interface.
- Download plotroc.py to the python directory.
- Edit the path of gnuplot in plotroc.py if necessary.
- The usage is
plotroc.py [-v cv_fold | -T testing_file] [libsvm_options] training_file
- Example:
> plotroc.py -v 5 -c 10 ../heart_scale
If there is no test data, "validated decision values" from cross-validation on the training data are used. Otherwise, we consider decision values of testing data using the model from the training data (without cross-validation).
To use LIBLINEAR, you need the following modification:
- MATLAB: Copy plotroc.m to the matlab directory (note that the MATLAB interface is included in LIBLINEAR). Replace svmtrain and svmpredict with train and predict, respectively.
Authors: Tingfan Wu (svm [at] future.csie.org), Chien-Chih Wang (d98922007 [at] ntu.edu.tw), and Hsiang-Fu Yu
Grid Parameter Search for Regression
This file is a slight modification of grid.py in the "tools" directory of libsvm. In addition to the parameters C and gamma used in classification, it also searches for epsilon.
Usage: grid.py [-log2c begin,end,step] [-log2g begin,end,step] [-log2p begin,end,step] [-v fold]
[-svmtrain pathname] [-gnuplot pathname] [-out pathname] [-png pathname]
[additional parameters for svm-train] dataset
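The search space implied by the three -log2 options can be sketched as the Cartesian product of three exponent ranges. The snippet assumes that each begin,end,step triple is inclusive of its end point and may use a negative step; the ranges shown are arbitrary examples, not the script's defaults, and the function names are hypothetical.

```python
from itertools import product


def range_f(begin, end, step):
    """Values begin, begin+step, ... up to and including end (step may be negative)."""
    values = []
    v = begin
    while (step > 0 and v <= end) or (step < 0 and v >= end):
        values.append(v)
        v += step
    return values


def regression_grid(log2c, log2g, log2p):
    """Yield (C, gamma, epsilon) for every combination of the three log2 ranges."""
    for c, g, p in product(range_f(*log2c), range_f(*log2g), range_f(*log2p)):
        yield 2.0 ** c, 2.0 ** g, 2.0 ** p


# Example ranges only: 3 values of C, 3 of gamma, 2 of epsilon.
grid = list(regression_grid((-1, 3, 2), (0, -4, -2), (-8, -4, 4)))
print(len(grid), grid[0])
```

Each triple would then be scored by cross-validation, which is why adding epsilon as a third dimension multiplies the number of training runs.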
Authors: Hsuan-Tien Lin (initial modification); Tzu-Kuo Huang (the parameter epsilon).
Radius Margin Bounds for SVM Model Selection
This is the code used in the paper:
K.-M. Chung, W.-C. Kao, T. Sun, L.-L. Wang, and C.-J. Lin. Radius Margin Bounds for Support Vector Machines with the RBF Kernel.
Please download the tar.bz2 file here. Details of using this code are in the readme.txt file. Some of the optimization subroutines written in Python are based on the module by Travis E. Oliphant.
Author: Wei-Chun Kao, with help from Leland Wang, Kai-Min Chung, and Tony Sun
Primal variable w of linear SVM and feature selection
In the following directory there are two files. svm-weight.cpp calculates the primal variable w using a model trained by libsvm (multi-class supported). Note that this program is for LINEAR SVM only! The output is a file containing the decision functions. If the data has k classes, the decision functions of all 1vs1 sub-problems are placed in the order 1 vs 2, ..., 1 vs k, 2 vs 3, ..., k-1 vs k.
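For a linear kernel, the primal variable can be recovered from the dual solution as w = sum_i (alpha_i y_i) x_i, and the decision value is w.x - rho. The sketch below illustrates this for one binary sub-problem and lists the 1vs1 ordering described above. The names sv_coef and rho follow LIBSVM's model terminology; the dense vectors, the function names, and the toy numbers are assumptions, and this is not svm-weight.cpp.

```python
from itertools import combinations


def primal_w(sv_coef, support_vectors):
    """w = sum_i coef_i * x_i, with coef_i = alpha_i * y_i (linear kernel only)."""
    w = [0.0] * len(support_vectors[0])
    for coef, x in zip(sv_coef, support_vectors):
        for j, v in enumerate(x):
            w[j] += coef * v
    return w


def decision_value(w, rho, x):
    """Decision value w.x - rho of one binary sub-problem."""
    return sum(a * b for a, b in zip(w, x)) - rho


def pair_order(k):
    """1vs1 sub-problem order: 1 vs 2, ..., 1 vs k, 2 vs 3, ..., k-1 vs k."""
    return list(combinations(range(1, k + 1), 2))


w = primal_w([0.5, -0.5], [[1.0, 2.0], [3.0, 0.0]])
print(w, decision_value(w, 0.25, [1.0, 1.0]), pair_order(4))
```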
The file linear-feasel.cpp conducts feature selection by considering indices with larger components of w. Note that linear-feasel.cpp works for two-class problems only.
Please use the makefile in the same directory to build both programs.
Author: Tzu-Kuo Huang
Reduced Support Vector Machines Implementation
This is the code used in the paper:
K.-M. Lin and C.-J. Lin.
A study on reduced support vector machines.
IEEE Transactions on Neural Networks, 2003.
Please download the .tgz file here.
After making the binary files, type svm-train to see the usage. It includes different methods to implement RSVM.
To speed up the code, you may want to link the code to optimized BLAS/LAPACK or ATLAS.
Author: Kuan-Min Lin
LIBSVM for SVDD and finding the smallest sphere containing all data
SVDD is another type of one-class SVM. We implement the formulation in Tax and Duin, Support Vector Data Description, Machine Learning, vol. 54, 2004, 45-66. Please download this zip file, put the sources into libsvm-3.1 (available here), and make the code. The options are
- -s 5: SVDD
- -s 6: gives the square of the radius for L1-SVM
- -s 7: gives the square of the radius for L2-SVM
The MATLAB interface is supported; see the matlab sub-directory.
Authors: Leland Wang, Holger Froehlich (University of Tuebingen), Konrad Rieck (Fraunhofer institute), Chen-Tse Tsai, Tse-Ju Lin
DAG approach for multiclass classification
In svm.cpp, please replace the following lines in the subroutine svm_predict()
double pred_result = svm_predict_values(model, x, dec_values);
free(dec_values);
return pred_result;
with this segment of code.
This follows from the code used in the paper:
C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector machines, IEEE Transactions on Neural Networks, 13(2002), 415-425.
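The general idea of DAG-based prediction can be sketched as an elimination over a list of candidate classes: each step evaluates the binary classifier for the first and last candidates and discards the loser, so k classes need k-1 evaluations. The snippet illustrates that idea only and is not the replacement segment referred to above; pairwise_winner is a hypothetical stand-in for the decision of a trained 1vs1 classifier.

```python
def dag_predict(classes, pairwise_winner):
    """Eliminate one class per pairwise test until a single candidate remains."""
    candidates = list(classes)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if pairwise_winner(first, last) == first:
            candidates.pop()      # 'last' lost and is eliminated
        else:
            candidates.pop(0)     # 'first' lost and is eliminated
    return candidates[0]


# Toy stand-in: the class with the larger score wins each pairwise test.
scores = {1: 0.1, 2: 0.9, 3: 0.4, 4: 0.3}
calls = []


def toy_winner(a, b):
    calls.append((a, b))
    return a if scores[a] >= scores[b] else b


print(dag_predict([1, 2, 3, 4], toy_winner), len(calls))
```

In contrast, a max-win vote evaluates all k(k-1)/2 pairwise classifiers before deciding.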
Author: Chih-Wei Hsu
Please contact Chih-Jen Lin with any questions.