Installation and Dataset Formats

To use the command line interface, first:

Install LibMultiLabel from Source

The supported dataset formats are:

LibMultiLabel Format

LibSVM Format

The following modules are then available:

Training and Prediction for Linear Classifiers

Training, Prediction, and Hyperparameter Search for Neural Networks

Install LibMultiLabel from Source

Environment
- Python: 3.10+
- CUDA: 11.8, 12.1 (if training neural networks on a GPU)
- PyTorch: 2.0.1+

Creating a virtual environment is optional but highly recommended. For example, you can follow the Miniconda installation guide and then create a virtual environment as follows.

conda create -n LibMultiLabel python=3.10
conda activate LibMultiLabel

Clone LibMultiLabel.

git clone https://github.com/ntumlgroup/LibMultiLabel.git
cd LibMultiLabel

Install the default dependencies with:

pip3 install -r requirements.txt

If you are using neural networks, install additional dependencies with:

pip3 install -r requirements_nn.txt

If you have a different version of CUDA, follow the installation instructions for PyTorch LTS on the PyTorch website.
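After installation, the requirements listed under Environment can be sanity-checked from Python. The following is a minimal sketch, not part of LibMultiLabel: the function name `check_environment` and its `need_nn` parameter are hypothetical, and it only checks the Python version and, when asked, whether PyTorch is importable and can see a CUDA device.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum Python version listed under Environment


def check_environment(need_nn: bool = False) -> list[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper for illustration only, not a LibMultiLabel API.
    """
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = sys.version.split()[0]
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}"
        )
    if need_nn:
        # Only look for PyTorch when neural networks are going to be used.
        if importlib.util.find_spec("torch") is None:
            problems.append("PyTorch is not installed")
        else:
            import torch

            if not torch.cuda.is_available():
                problems.append("PyTorch is installed but no CUDA device is visible")
    return problems


if __name__ == "__main__":
    for problem in check_environment(need_nn=False):
        print("WARNING:", problem)
```

Pass `need_nn=True` only if you installed the neural network dependencies; a missing CUDA device is reported as a warning because training on a CPU may still be acceptable.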

Dataset Formats

The input data for building training, test, and validation datasets must follow specific formats. Neural networks accept only the LibMultiLabel format. Linear methods accept both the LibMultiLabel format and the LibSVM format. More sample sets in these formats can be downloaded from the LIBSVM data page.

LibMultiLabel Format

The LibMultiLabel format stores optional IDs, labels, and raw text in a single file, using tabs and line endings as delimiters. A file must satisfy the following requirements:

- one sample per line
- ID, labels, and text are separated by <TAB> (the ID column is optional)
- labels are separated by spaces
- no field may contain a <TAB>

An example with the ID column:

2286<TAB>E11 ECAT M11 M12 MCAT<TAB>recov recov recov recov excit ...
2287<TAB>C24 CCAT<TAB>uruguay uruguay compan compan compan ...

An example without the ID column:

E11 ECAT M11 M12 MCAT<TAB>recov recov recov recov excit ...
C24 CCAT<TAB>uruguay uruguay compan compan compan ...
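The rules above can be illustrated with a small parser. This is a sketch under stated assumptions rather than LibMultiLabel's own loader: the names `Sample` and `parse_line` are hypothetical, `<TAB>` in the examples is taken to mean a literal tab character, and a line is treated as having an ID exactly when it contains three tab-separated fields.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    """One parsed line (hypothetical container, not a LibMultiLabel type)."""

    sample_id: Optional[str]  # None when the ID column is absent
    labels: list[str]
    text: str


def parse_line(line: str) -> Sample:
    """Parse one line of the form [ID<TAB>]labels<TAB>text."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        sample_id, labels, text = fields
    elif len(fields) == 2:
        sample_id = None
        labels, text = fields
    else:
        raise ValueError(
            f"expected 2 or 3 tab-separated fields, got {len(fields)}"
        )
    # Labels are separated by spaces within their field.
    return Sample(sample_id, labels.split(), text)


sample = parse_line("2286\tE11 ECAT M11 M12 MCAT\trecov recov recov recov excit")
print(sample.labels)  # ['E11', 'ECAT', 'M11', 'M12', 'MCAT']
```

Raising an error on any other field count catches a stray tab inside the text early, which is the most likely way to break the "no field may contain a tab" rule.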

LibSVM Format

The LibSVM format stores labels and sparse numerical features in a single file, using commas, spaces, colons, and line endings as delimiters. A file must meet the following criteria:

- one sample per line
- labels and features are separated by a space
- labels are separated by commas
- features are separated by spaces
- each feature is specified as index:value, with indices starting from 1

Some sample lines are as follows:

1,3,5 1:0.1 9:0.2 13:0.3
2,4,6 2:0.4 10:0.5 14:0.4
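A line in this format can be read with a few string splits. The sketch below is illustrative only: `parse_libsvm_line` is a hypothetical name, labels are kept as strings, and every line is assumed to carry at least one label before the first feature.

```python
def parse_libsvm_line(line: str) -> tuple[list[str], dict[int, float]]:
    """Parse 'l1,l2,... idx:val idx:val ...' into (labels, features).

    Hypothetical helper for illustration, not a LibMultiLabel API.
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty line")
    # The first space-separated token holds the comma-separated labels.
    labels = parts[0].split(",")
    features: dict[int, float] = {}
    for token in parts[1:]:
        index_str, value_str = token.split(":", 1)
        index = int(index_str)
        if index < 1:
            raise ValueError(f"feature index must start from 1, got {index}")
        features[index] = float(value_str)
    return labels, features


labels, features = parse_libsvm_line("1,3,5 1:0.1 9:0.2 13:0.3")
print(labels)    # ['1', '3', '5']
print(features)  # {1: 0.1, 9: 0.2, 13: 0.3}
```

Storing features in a dictionary keeps only the non-zero entries, which matches the sparse nature of the format. For real datasets, scikit-learn's `load_svmlight_file` with `multilabel=True` reads the same comma-separated label layout into sparse matrices.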