This page provides different tools for multi-label classification that are based on LIBSVM or LIBLINEAR. Comments are welcome. Please properly cite our work if you find them useful. This supports our future development. -- Chih-Jen Lin
Disclaimer: We do not take any responsibility for damage or other problems caused by using this software and these data sets.
Usage: ./trans_class.py training_file [test_file]
"training_file" and "test_file" are the original multi-label sets. The script generates three temporary files: "tmp_train" and "tmp_test" are multi-class sets, and "tmp_class" contains the mapping information.
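To illustrate what a multi-label to multi-class conversion can look like, the sketch below treats every distinct label combination as one class and records the mapping. It is only a minimal sketch: the function name to_multiclass and the in-memory representation are hypothetical, and whether trans_class.py uses exactly this scheme is an assumption.

```python
def to_multiclass(label_sets, mapping=None):
    """Map each distinct label combination to one multi-class id.

    Hypothetical helper for illustration; not taken from trans_class.py.
    """
    mapping = {} if mapping is None else mapping
    classes = []
    for labels in label_sets:
        key = tuple(sorted(labels))      # order-independent key for the label set
        if key not in mapping:
            mapping[key] = len(mapping)  # next unused class id
        classes.append(mapping[key])
    return classes, mapping


# Four instances; the first and third carry the same label combination.
train_labels = [{1, 3}, {2}, {3, 1}, set()]
classes, mapping = to_multiclass(train_labels)
print(classes)  # [0, 1, 0, 2]
```

The returned mapping plays the role that the text assigns to "tmp_class": it lets a predicted class id be translated back to a set of labels.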
After training/testing multi-class sets, the script measure.py (you also need subr.py) gives three measures: exact match ratio, microaverage F-measure and macroaverage F-measure.
Usage: ./measure.py test_file test_output_file training_class
In our calculation, when TP=FP=FN=0, F-measure is defined as 0.
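The three measures can be sketched as follows, using the standard definitions and the convention stated above that F-measure is 0 when TP=FP=FN=0. This is a minimal sketch rather than the code of measure.py; the function names and the explicit labels argument (the label set over which the macroaverage is taken) are assumptions.

```python
def f_measure(tp, fp, fn):
    """F-measure from counts; defined as 0 when TP=FP=FN=0."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2.0 * tp / denom


def evaluate(true_sets, pred_sets, labels):
    """Return (exact match ratio, microaverage F, macroaverage F)."""
    pairs = list(zip(true_sets, pred_sets))
    exact = sum(t == p for t, p in pairs) / len(pairs)
    counts = []
    for label in labels:
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        counts.append((tp, fp, fn))
    # Microaverage: pool the counts over all labels, then compute one F-measure.
    micro = f_measure(*(sum(c[k] for c in counts) for k in range(3)))
    # Macroaverage: compute F-measure per label, then take the mean.
    macro = sum(f_measure(*c) for c in counts) / len(counts)
    return exact, micro, macro


true_sets = [{1, 2}, {2}, {3}]
pred_sets = [{1, 2}, {1}, {3}]
exact, micro, macro = evaluate(true_sets, pred_sets, labels=[1, 2, 3, 4])
print("%.4f %.4f %.4f" % (exact, micro, macro))  # 0.6667 0.7500 0.5833
```

Label 4 never occurs in the true or predicted sets, so its F-measure is 0 and it lowers the macroaverage while leaving the microaverage unchanged.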
Example: (data from LIBSVM data sets)
% trans_class.py rcv1subset_topics_train_2.svm rcv1subset_topics_test_2.svm
% svm-train -t 0 tmp_train
% svm-predict tmp_test tmp_train.model o
% measure.py rcv1subset_test_2.svm o tmp_class
You may try other multi-class methods available in BSVM.
Author: Wen-Hsien Su
This approach extends the one-against-all multi-class method for multi-label classification. For each label, it builds a binary-class problem so instances associated with that label are in one class and the rest are in another class. The script binary.py (you also need subr.py) implements this approach, which is suitable for problems with up to a few thousand labels.
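The decomposition described above can be sketched in a few lines: one +1/-1 target vector per label, and a recombination step that collects the labels whose binary classifier predicted +1. The function names are hypothetical and the sketch leaves out the actual training; it is not the code of binary.py.

```python
def binary_problems(label_sets):
    """Build one +1/-1 target vector per label (hypothetical helper)."""
    all_labels = sorted(set().union(*label_sets))
    return {
        label: [1 if label in labels else -1 for labels in label_sets]
        for label in all_labels
    }


def combine(per_label_predictions):
    """Merge per-label +1/-1 predictions back into one label set per instance."""
    n = len(next(iter(per_label_predictions.values())))
    return [
        {label for label, preds in per_label_predictions.items() if preds[i] == 1}
        for i in range(n)
    ]


label_sets = [{1, 3}, {2}, set()]
problems = binary_problems(label_sets)
print(problems)           # {1: [1, -1, -1], 2: [-1, 1, -1], 3: [1, -1, -1]}
print(combine(problems))  # [{1, 3}, {2}, set()]
```

Because every label is decided independently, an instance for which all binary predictions are -1 ends up with an empty label set, as the last instance shows.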
Usage: ./binary.py [parameters for svm-train] training_file test_file
"training_file" and "test_file" are multi-label sets. You need to install LIBSVM and set suitable paths (see variables svmtrain_exe and svmpredict_exe in the script). Modify the variables if you would like to use LIBLINEAR.
After training/testing, binary.py gives three measures: exact match ratio, microaverage F-measure and macroaverage F-measure. In our calculation, when TP=FP=FN=0, F-measure is defined as 0. In the prediction outcome, a test instance may not be associated with any label.
Example: (data from LIBSVM data sets)
% binary.py -t 0 rcv1subset_train_2.svm rcv1subset_test_2.svm
For MATLAB/Octave, save runbinary.m in the matlab directory of LIBLINEAR. Currently only LIBLINEAR is supported, not LIBSVM. You also need libsvmread_ml.c (see Installation for MATLAB scripts) for reading data in LIBSVM format. The usage is
>> runbinary(training_file, test_file, options);
Note that options indicate LIBSVM/LIBLINEAR options. Example:
>> runbinary('yeast_train.svm', 'yeast_test.svm', '-s 0 -e 0.001 -B 1');
Alternatively, you may pass preloaded data in the following format:
>> train_data = struct;
>> train_data.x = % matrix with dimension number_of_instances x number_of_features
>> train_data.y = % matrix of 0/1 with dimension number_of_instances x number_of_classes
>> train_data.map = % the label of column i, i.e. y(:, i) corresponds to label map(i)
>> test_data = % same as train_data
>> runbinary(train_data, test_data, options);
Authors: Rong-En Fan and Chih-Jen Lin.
You need to download files in this directory, and compile this MATLAB-C interface by using the Multilabel_makefile. Type the following command under MATLAB
>> multilabel_make
or use make under unix systems:
$ make -f Multilabel_makefile
To load a set such as 'rcv1train.svm' into MATLAB, first launch MATLAB. Then type:
>> [y, x, map] = libsvmread_ml('rcv1train.svm');
The "y" matrix represents the labels of each instance. y(i,j) is 1 if the i-th instance has the label j, otherwise it is 0. The "x" matrix is the data. The "map" matrix stores the mapping between the internal label j and the label found in the dataset.
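The relation between "y" and "map" can be illustrated outside MATLAB with a short Python sketch that builds a 0/1 matrix and a column-to-label map from per-instance label sets. The function name is hypothetical, indices are 0-based here rather than 1-based as in MATLAB, and the sorted column order is an assumption; the order used by libsvmread_ml may differ.

```python
def labels_to_matrix(label_sets):
    """Return (y, label_map): y[i][j] is 1 if instance i has label label_map[j]."""
    label_map = sorted(set().union(*label_sets))  # assumed column order
    column = {label: j for j, label in enumerate(label_map)}
    y = [[0] * len(label_map) for _ in label_sets]
    for i, labels in enumerate(label_sets):
        for label in labels:
            y[i][column[label]] = 1
    return y, label_map


# Labels in the data set need not be consecutive integers.
y, label_map = labels_to_matrix([{10, 30}, {20}])
print(label_map)  # [10, 20, 30]
print(y)          # [[1, 0, 1], [0, 1, 0]]
```

Column j of y therefore refers to the internal label j, and label_map[j] recovers the label as it appears in the file.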
To save a set, use, for example,
>> libsvmwrite_ml('rcv1train.svm', y, x, map);
Author: Rong-En Fan. Minor improvement by Chun-Heng Huang, April 2013.
A method to optimize macro-average F-measure was proposed in
David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361-397, 2004.
For a more detailed study of this approach, please see
R.-E. Fan and C.-J. Lin. A Study on Threshold Selection for Multi-label Classification, 2007.
We implement the approach "SVM.1" in the code newbinary.m. The usage is the same as runbinary:
>> newbinary(training_file, test_file, options);
or
>> newbinary(train_data, test_data, options);
You also need libsvmread_ml.c (see Installation for MATLAB scripts) for reading data in LIBSVM format. Note that options indicate LIBSVM/LIBLINEAR options. Example: with the standard binary approach, the following result is obtained:
>> runbinary('train-exp1.svm', 'test-exp1.svm', '-s 1 -B 1 -q');
INFO: microaverage: 0.535837
INFO: macroaverage: 0.045244
By adjusting the threshold, we obtain a much better macroaverage:
>> newbinary('train-exp1.svm', 'test-exp1.svm', '-s 1 -B 1 -q');
INFO: microaverage: 0.539980
INFO: macroaverage: 0.207518
For reproducing rcv1 results in past published works, please see this page.
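The details of "SVM.1" are not given here, so the following is only a generic illustration of why adjusting the decision threshold of one binary classifier can raise its F-measure. It is not the procedure in newbinary.m. The function names, the candidate thresholds (the observed decision values), and the toy data are all assumptions.

```python
def f_measure(tp, fp, fn):
    """F-measure from counts; defined as 0 when TP=FP=FN=0."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2.0 * tp / denom


def f_at_threshold(decision_values, is_positive, threshold):
    """F-measure when an instance is predicted positive if value >= threshold."""
    tp = fp = fn = 0
    for value, positive in zip(decision_values, is_positive):
        predicted = value >= threshold
        tp += predicted and positive
        fp += predicted and not positive
        fn += (not predicted) and positive
    return f_measure(tp, fp, fn)


def best_threshold(decision_values, is_positive):
    """Pick the candidate threshold with the highest F-measure (0 is the default)."""
    best_t = 0.0
    best_f = f_at_threshold(decision_values, is_positive, best_t)
    for t in sorted(set(decision_values)):
        f = f_at_threshold(decision_values, is_positive, t)
        if f > best_f:  # keep the earlier candidate on ties
            best_t, best_f = t, f
    return best_t, best_f


# Toy validation data: two positives fall below the default threshold 0.
values = [-0.9, -0.4, -0.2, 0.3]
truth = [False, True, True, True]
print(f_at_threshold(values, truth, 0.0))  # 0.5
print(best_threshold(values, truth))       # (-0.4, 1.0)
```

In practice such a threshold would be chosen on held-out data rather than on the data used for the final evaluation.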
The MATLAB/Octave interface of LIBLINEAR must be built. Additionally, you need to download files in this directory to the matlab-interface directory matlab/ of LIBLINEAR. To build the MATLAB-C interface, use the following commands in MATLAB
>> make
>> multilabel_make
or use make under unix systems:
$ make
$ make -f Multilabel_makefile