Installation and Dataset Formats
To work with the command line interface, first install LibMultiLabel from source.
The supported dataset formats are the LibMultiLabel format and the LibSVM format.
Install LibMultiLabel from Source
Environment
Python: 3.8+
CUDA: 11.8, 12.1 (if training neural networks on a GPU)
PyTorch: 2.0.1+
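The requirements above can be checked before installing anything else. The following is a minimal sketch, not part of LibMultiLabel: the helper name check_environment is hypothetical, and it only tests the Python version and whether a torch package can be found, not the PyTorch or CUDA versions.

```python
import sys
from importlib import util


def check_environment(min_python=(3, 8)):
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical helper for illustration only; it is not a LibMultiLabel API.
    """
    problems = []
    # Compare the running interpreter against the minimum supported version.
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required}+ is required")
    # PyTorch is only needed when training neural networks.
    if util.find_spec("torch") is None:
        problems.append("PyTorch not found (needed only for neural networks)")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

An empty result does not guarantee a working setup; it only rules out the two most basic mismatches.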
Creating a virtual environment is optional but highly recommended. For example, you can install Miniconda by following its installation guide and then create a virtual environment as follows.
conda create -n LibMultiLabel python=3.8
conda activate LibMultiLabel
Clone LibMultiLabel.
git clone https://github.com/ntumlgroup/LibMultiLabel.git
cd LibMultiLabel
Install the default dependencies with:
pip3 install -r requirements.txt
If you are using neural networks, install additional dependencies with:
pip3 install -r requirements_nn.txt
If you have a different version of CUDA, follow the installation instructions for PyTorch LTS on the PyTorch website.
Dataset Formats
The input data for building training, test, and validation datasets must follow specific formats. For neural networks, the only accepted format is the LibMultiLabel format. For linear methods, both the LibMultiLabel format and the LibSVM format are accepted. More sample sets in these formats can be downloaded from the LIBSVM data page.
LibMultiLabel Format
The LibMultiLabel format stores IDs (optional), labels, and raw texts. They are combined in a single file, using tabs and line endings as control characters. The file must satisfy the following requirements:
one sample per line
ID, labels, and texts are separated by <TAB> (the ID column is optional)
labels are split by spaces
each field should not contain any <TAB>
An example with the ID column:
2286<TAB>E11 ECAT M11 M12 MCAT<TAB>recov recov recov recov excit ...
2287<TAB>C24 CCAT<TAB>uruguay uruguay compan compan compan ...
An example without the ID column:
E11 ECAT M11 M12 MCAT<TAB>recov recov recov recov excit ...
C24 CCAT<TAB>uruguay uruguay compan compan compan ...
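To make the rules above concrete, here is a minimal sketch of reading one line in this format. The function name parse_libmultilabel_line and the returned dictionary layout are assumptions for illustration; this is not LibMultiLabel's own loader.

```python
def parse_libmultilabel_line(line):
    """Split one LibMultiLabel-format line into ID, labels, and text.

    Hypothetical helper for illustration only; it is not a LibMultiLabel API.
    """
    # Fields are separated by tabs; strip only the line ending.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        sample_id, labels, text = fields
    elif len(fields) == 2:
        # The ID column is optional.
        sample_id = None
        labels, text = fields
    else:
        raise ValueError(f"expected 2 or 3 tab-separated fields, got {len(fields)}")
    # Labels within the label field are split by spaces.
    return {"id": sample_id, "labels": labels.split(), "text": text}


if __name__ == "__main__":
    print(parse_libmultilabel_line("2287\tC24 CCAT\turuguay uruguay compan"))
```

Because no field may contain a tab, counting the tab-separated fields is enough to tell whether the ID column is present.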
LibSVM Format
The LibSVM format stores labels and sparse numerical features. They are combined in a single file, using commas, spaces, colons, and line endings as control characters. The file must meet the following criteria:
one sample per line
labels and features are separated by a space
labels are split by commas
features are split by spaces
each feature is specified as index:value, with index starting from 1
Some sample lines are as follows:
1,3,5 1:0.1 9:0.2 13:0.3
2,4,6 2:0.4 10:0.5 14:0.4
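The criteria above can be illustrated with a small parser for a single line. This is a sketch under stated assumptions: parse_libsvm_line is a hypothetical name, not LibMultiLabel's loader, and it assumes every line begins with at least one label.

```python
def parse_libsvm_line(line):
    """Parse one LibSVM-format line into (labels, features).

    Hypothetical helper for illustration only; it is not a LibMultiLabel API.
    Assumes the first space-separated token is the comma-separated label list.
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty line")
    # Labels are split by commas.
    labels = parts[0].split(",")
    features = {}
    # Each remaining token is one sparse feature written as index:value.
    for item in parts[1:]:
        index_text, value_text = item.split(":", 1)
        index = int(index_text)
        if index < 1:
            raise ValueError(f"feature index must start from 1, got {index}")
        features[index] = float(value_text)
    return labels, features


if __name__ == "__main__":
    print(parse_libsvm_line("1,3,5 1:0.1 9:0.2 13:0.3"))
```

Storing features in a dictionary keyed by index keeps the representation sparse: indices that do not appear on the line are simply absent.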