LIBSVM Data: Multi-label Classification
Multi-label classification has recently become an important topic, but very few data sets are publicly available. We made an effort to collect the following sets. In each file, the labels appear at the beginning of each line, separated by commas.
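To make the format concrete, here is a minimal Python sketch that reads one line. It assumes integer labels, whitespace-separated `index:value` feature pairs as in other LIBSVM data, and that an instance with no labels simply starts with a feature pair. The function name `parse_multilabel_line` is hypothetical. For real use, scikit-learn's `load_svmlight_file` also accepts a `multilabel=True` option.

```python
from typing import Dict, List, Tuple


def parse_multilabel_line(line: str) -> Tuple[List[int], Dict[int, float]]:
    """Parse one line of multi-label LIBSVM data, e.g. "2,5 1:0.5 7:1.25"."""
    tokens = line.split()
    labels: List[int] = []
    # The first token holds the comma-separated labels, unless it is already
    # an index:value pair (assumed to mean the instance has no labels).
    if tokens and ":" not in tokens[0]:
        labels = [int(lab) for lab in tokens[0].split(",") if lab]
        tokens = tokens[1:]
    features: Dict[int, float] = {}
    for tok in tokens:
        idx, val = tok.split(":", 1)
        features[int(idx)] = float(val)
    return labels, features
```

For example, `parse_multilabel_line("2,5 1:0.5 7:1.25")` returns `([2, 5], {1: 0.5, 7: 1.25})`.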
mediamill (exp1)
- Source: Mediamill / The Mediamill Challenge Problem
- Preprocessing: We combine all of the binary classification problems into a single multi-label problem.
- # of classes: 101
- # of data: 30,993 / 12,914 (testing)
- # of features: 120
- Files:
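The Mediamill preprocessing step can be pictured with a small sketch. Each class is assumed to have its own binary problem with +1/-1 labels over the same list of instances, and an instance's label set is the list of classes for which it is positive. The input layout (a dict from class index to a list of binary labels) and the name `combine_binary_problems` are assumptions for illustration, not the script actually used.

```python
from typing import Dict, List


def combine_binary_problems(binary: Dict[int, List[int]], n_instances: int) -> List[List[int]]:
    """Merge per-class binary labels (+1/-1) into one label list per instance."""
    labels: List[List[int]] = [[] for _ in range(n_instances)]
    # Visit classes in sorted order so each label list comes out sorted.
    for cls in sorted(binary):
        for i, y in enumerate(binary[cls]):
            if y > 0:  # instance i is positive in this class's binary problem
                labels[i].append(cls)
    return labels
```

Instances that are negative in every binary problem end up with an empty label list.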
rcv1v2 (topics; subsets)
- Source: [DL04b]
- # of classes: 101
- # of data: 3,000 / 3,000 (testing)
- # of features: 47,236
- Files:
rcv1v2 (topics; full sets)
- Source: [DL04b]
- Preprocessing: The four testing sets correspond to the four testing files from the RCV1 site. In the testing sets, the number of classes is 103.
- # of classes: 101
- # of data: 23,149 / 781,265 (testing)
- # of features: 47,236
- Files:
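Because the training and testing sets contain different numbers of classes, it can help to compare their label sets directly. The sketch below collects every label from one or more files, so the four testing files can be treated as one testing set. It assumes the comma-separated label format described at the top of this page and that downloaded files may still be bz2-compressed; the function names are hypothetical.

```python
import bz2
from typing import Iterable, Set


def label_set(lines: Iterable[str]) -> Set[int]:
    """Collect every label that occurs in multi-label LIBSVM lines."""
    seen: Set[int] = set()
    for line in lines:
        tokens = line.split()
        # Lines whose first token is an index:value pair carry no labels.
        if tokens and ":" not in tokens[0]:
            seen.update(int(lab) for lab in tokens[0].split(",") if lab)
    return seen


def labels_in_files(paths: Iterable[str]) -> Set[int]:
    """Union of the label sets of several files (e.g. the four testing files)."""
    seen: Set[int] = set()
    for path in paths:
        # Assumption: files ending in .bz2 are still compressed.
        opener = bz2.open if path.endswith(".bz2") else open
        with opener(path, "rt") as f:
            seen |= label_set(f)
    return seen
```

With placeholder paths, `labels_in_files(test_paths) - labels_in_files([train_path])` gives the classes that occur only in testing.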
rcv1v2 (industries; full sets)
- Source: [DL04b]
- Preprocessing: The four testing sets correspond to the four testing files from the RCV1 site. In the testing sets, the number of classes is 350.
- # of classes: 313
- # of data: 23,149 / 781,265 (testing)
- # of features: 47,236
- Files:
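Since the testing sets here contain more classes than the training set (350 versus 313), a 0/1 label-indicator representation needs one column for every class in the union of both sets; otherwise testing-only classes have nowhere to go. A minimal stdlib sketch follows, with the hypothetical name `label_indicator`. With about 781,000 testing instances, a sparse representation would be preferable in practice.

```python
from typing import List, Sequence


def label_indicator(label_lists: Sequence[Sequence[int]], classes: Sequence[int]) -> List[List[int]]:
    """Build a dense 0/1 matrix with one row per instance and one column per class."""
    col = {c: j for j, c in enumerate(classes)}  # class label -> column index
    rows: List[List[int]] = []
    for labels in label_lists:
        row = [0] * len(classes)
        for c in labels:
            row[col[c]] = 1  # raises KeyError if a label is missing from `classes`
        rows.append(row)
    return rows
```

Passing `classes=sorted(train_labels | test_labels)` keeps the columns identical for training and testing.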
rcv1v2 (regions; full sets)
- Source: [DL04b]
- Preprocessing: The four testing sets correspond to the four testing files from the RCV1 site. In the testing sets, the number of classes is 296.
- # of classes: 228
- # of data: 23,149 / 781,265 (testing)
- # of features: 47,236
- Files:
scene-classification
- Source: [MB04a]
- # of classes: 6
- # of data: 1,211 / 1,196 (testing)
- # of features: 294
- Files:
siam-competition2007
- Source: SIAM Text Mining Competition 2007
- Preprocessing: We remove "." before transforming the data into vectors. We use binary term frequencies and normalize each instance to unit length.
- # of classes: 22
- # of data: 21,519 / 7,077 (testing)
- # of features: 30,438
- Files:
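The preprocessing steps for this set can be sketched as follows. This assumes that "." refers to the period character, that tokens are whitespace-separated, that "unit length" means unit L2 (Euclidean) norm, and that the vocabulary is built from the training documents with 1-based feature indices. The helper names are hypothetical; this is not the script used to produce the released files.

```python
import math
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    # Assumption: remove every "." character, then split on whitespace.
    return text.replace(".", "").split()


def build_vocab(docs: Iterable[str]) -> Dict[str, int]:
    """Map each term to a 1-based feature index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in tokenize(doc):
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def binary_tf_unit(text: str, vocab: Dict[str, int]) -> Dict[int, float]:
    """Binary term frequencies, scaled so the vector has unit L2 length."""
    present = {vocab[t] for t in tokenize(text) if t in vocab}
    if not present:
        return {}  # an empty vector cannot be normalized
    # Each present term has weight 1 before scaling, so the norm is sqrt(k).
    weight = 1.0 / math.sqrt(len(present))
    return {idx: weight for idx in sorted(present)}
```

Because the term frequencies are binary, every nonzero feature of an instance gets the same value, 1/sqrt(k) for k distinct terms.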
yeast
- Source: [AE02a]
- # of classes: 14
- # of data: 1,500 / 917 (testing)
- # of features: 103
- Files: