LIBSVM Tools: Multi-label classification

Last modified: Wed May 28 07:12:33 CST 2008

This page provides different tools for multi-label classification that are based on LIBSVM or LIBLINEAR. Comments are welcome. Please properly cite our work if you find them useful. This supports our future development. -- Chih-Jen Lin

Disclaimer: We do not take any responsibility for damage or other problems caused by using this software and these data sets.


Label Combination

A simple way to handle multi-label classification is to treat each "label set" as a single class and then train/test a multi-class problem. The script trans_class.py transforms data to multi-class sets:
Usage: ./trans_class.py training_file [testing_file]
"training_file" and "testing_file" are the original multi-label sets. The script generates three temporary files: "tmp_train" and "tmp_test" are multi-class sets, and "tmp_class" contains the mapping information.

After training/testing multi-class sets, the script measure.py (you also need subr.py) gives three measures: exact match ratio, microaverage F-measure and macroaverage F-measure.

Usage: ./measure.py testing_file testing_output_file training_class
In our calculation, when TP=FP=FN=0, F-measure is defined as 0.
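The three measures follow the standard definitions; a minimal sketch of the micro- and macro-average F-measure from per-label confusion counts (not the actual measure.py):

```python
# F-measure from per-label counts (tp, fp, fn), with F defined as 0
# when tp = fp = fn = 0, as in measure.py.

def f_measure(tp, fp, fn):
    return 0.0 if tp + fp + fn == 0 else 2.0 * tp / (2 * tp + fp + fn)

def micro_macro(counts):
    """counts: list of (tp, fp, fn), one tuple per label."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    micro = f_measure(tp, fp, fn)                          # pool counts first
    macro = sum(f_measure(*c) for c in counts) / len(counts)  # average per-label F
    return micro, macro
```

Micro-averaging pools the counts over all labels before computing F, so frequent labels dominate; macro-averaging gives every label equal weight.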

Example: (data from LIBSVM data sets)

% trans_class.py rcv1subset_topics_train_2.svm rcv1subset_topics_test_2.svm 
% svm-train -t 0 tmp_train
% svm-predict tmp_test tmp_train.model o
% measure.py rcv1subset_topics_test_2.svm o tmp_class 
You may try other multi-class methods available in BSVM.

Author: Wen-Hsien Su


Binary Approach

This approach extends the one-against-all multi-class method to multi-label classification. For each label, it builds a binary classification problem: instances associated with that label are in one class, and all others are in the other. The script binary.py (you also need subr.py) implements this approach. To use,

Usage: ./binary.py [parameters for svm-train] training_file testing_file
"training_file" and "testing_file" are multi-label sets. You need to install LIBSVM and set suitable paths (see variables svmtrain_exe and svmpredict_ete in the script).

After training/testing, binary.py gives three measures: exact match ratio, microaverage F-measure and macroaverage F-measure. In our calculation, when TP=FP=FN=0, F-measure is defined as 0. Note that in the predicted results, a test instance may not be associated with any label.
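The one-against-all (binary relevance) loop can be sketched as follows. This is only an illustration of the idea, assuming generic train/predict callables; binary.py itself drives the LIBSVM executables:

```python
# Sketch of the binary (one-against-all) approach: one binary problem
# per label; a test instance receives every label whose binary
# classifier predicts +1 (possibly none).

def binary_relevance(train_sets, x_train, x_test, train, predict):
    """train_sets[i]: set of labels of the i-th training instance.

    train(y, x) -> model and predict(model, xs) -> list of +/-1 are
    placeholders for any binary classifier (e.g. an SVM).
    """
    all_labels = sorted(set().union(*train_sets))
    predictions = [set() for _ in x_test]
    for label in all_labels:
        # +1 for instances carrying this label, -1 for the rest
        y = [1 if label in s else -1 for s in train_sets]
        model = train(y, x_train)
        for i, p in enumerate(predict(model, x_test)):
            if p == 1:
                predictions[i].add(label)
    return predictions
```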

Example: (data from LIBSVM data sets)

% binary.py -t 0 rcv1subset_train_2.svm rcv1subset_test_2.svm 

Author: Rong-En Fan


Read multi-label datasets in LIBSVM format to MATLAB

You need to download read_sparse_ml.c and compile this MATLAB-C interface using the Makefile. Type 'make' under Unix systems:

$ make
or run the following command within MATLAB:
matlab> mex -largeArrayDims read_sparse_ml.c

To load a set such as 'rcv1train.svm' into MATLAB, first launch MATLAB. Then type:

>> [y, x, map] = read_sparse_ml('rcv1train.svm');
The "y" matrix represents the labels of each instance. y(i,j) is 1 if the i-th instance has the label j, otherwise it is 0. The "x" matrix is the data. The "map" matrix stores the mapping between the internal label j and the label found in the dataset.

Author: Rong-En Fan. Minor improvement by Chun-Heng Huang, April 2013.


Generate libsvm format of RCV1

In this directory, you can find some scripts for generating the data. Please check the file INSTRUCTION for details.

RCV1

The script rcv1_lineart_col.m optimizes the macro-average F-measure by implementing the method "SVM.1" described in:

David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361-397, 2004.

For a more detailed study of this approach, please see

R.-E. Fan and C.-J. Lin. A Study on Threshold Selection for Multi-label Classification, 2007.
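The core of such threshold-selection methods is, for each category, to pick a cutoff on the decision values that maximizes F-measure. A simplified sketch (for illustration only; the actual SVM.1 procedure selects thresholds using cross-validation scores rather than a single held-out set):

```python
# Simplified per-category threshold selection: scan candidate cutoffs
# on the decision values and keep the one with the highest F-measure.

def best_threshold(scores, y):
    """scores: decision values; y: 1/0 true labels for one category."""
    best_f, best_t = -1.0, 0.0
    for t in sorted(set(scores)):           # each distinct score is a candidate
        tp = sum(1 for s, l in zip(scores, y) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, y) if s >= t and l == 0)
        fn = sum(1 for s, l in zip(scores, y) if s < t and l == 1)
        f = 0.0 if tp + fp + fn == 0 else 2.0 * tp / (2 * tp + fp + fn)
        if f > best_f:
            best_f, best_t = f, t
    return best_t, best_f
```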

We show results of using L1-loss SVM, L2-loss SVM and logistic regression. This code gives results for three category sets: "Topics", "Industries", and "Regions" (see Table 5 in Lewis et al.). You need MATLAB and the software LIBLINEAR. You must put rcv1_lineart_col.m in the MATLAB interface directory matlab/ of LIBLINEAR. In addition, you need to download the RCV1 data sets rcv1_train.mat and rcv1_test.mat to the same directory.

Type

rcv1_lineart_col('topics', 'l2svm_dual')
rcv1_lineart_col('regions', 'l2svm_dual')
rcv1_lineart_col('industries', 'l2svm_dual')
You will get results similar to the following (Total time is based on liblinear 1.21 and an Intel C2Q Q6600 2.40G computer):
(for topics:)
INFO: microaverage: 0.812878
INFO: macroaverage: 0.617562
INFO: Total Time: 298.085084 seconds

(for industries:)
INFO: microaverage: 0.532126
INFO: macroaverage: 0.304381
INFO: Total Time: 845.156376 seconds

(for regions:)
INFO: microaverage: 0.870443
INFO: macroaverage: 0.601623
INFO: Total Time: 589.156516 seconds
We calculate the microaverage and the macroaverage F-measure for categories with one or more positive training examples, i.e., the "1+train(101)" row in Table 5 of the paper.

We use 3-fold cross validation instead of 5-fold CV in the paper.

To use logistic regression, simply replace 'l2svm_dual' with 'lr':

rcv1_lineart_col('topics', 'lr')
rcv1_lineart_col('industries', 'lr')
rcv1_lineart_col('regions', 'lr')
The results are listed below:
(for topics:)
INFO: microaverage: 0.807876
INFO: macroaverage: 0.596911
INFO: Total Time: 872.522069 seconds

(for industries:)
INFO: microaverage: 0.487655
INFO: macroaverage: 0.264600
INFO: Total Time: 2803.757595 seconds

(for regions:)
INFO: microaverage: 0.859337
INFO: macroaverage: 0.546168
INFO: Total Time: 1894.421539 seconds

To use L1-loss SVM, replace 'l2svm_dual' with 'l1svm_dual'.

You may swap the training and testing sets by

rcv1_lineart_col('topics', 'l2svm_dual', 'swap')
Results are
(for topics:)
INFO: microaverage: 0.854851
INFO: macroaverage: 0.686879
INFO: Total Time: 13244.108812 seconds

(for industries:)
INFO: microaverage: 0.748794
INFO: macroaverage: 0.602279
INFO: Total Time: 38438.749033 seconds

(for regions:)
INFO: microaverage: 0.911866
INFO: macroaverage: 0.584129
INFO: Total Time: 30556.409249 seconds

Author: Rong-En Fan, Cheng-Yu Lee, and Xiang-Rui Wang


Please contact Chih-Jen Lin with any questions.