Training and Prediction for Linear Classifiers
==============================================

For a step-by-step tutorial, see

- :ref:`cli-quickstart`

For the documentation on some commonly used command line flags, see

- :ref:`linear_train`
- :ref:`linear_predict`

For the complete set of command line flags, see

- `Command Line Options `_

-------------------------------------------------------------------

.. _cli-quickstart:

Using CLI via an Example
^^^^^^^^^^^^^^^^^^^^^^^^

Step 1. Data Preparation
------------------------

Create a data sub-directory within LibMultiLabel and go to this sub-directory.

.. code-block:: bash

    mkdir -p data/rcv1
    cd data/rcv1

Linear methods take either textual or bag-of-words numeric data as inputs.
For this example, the data will be in :ref:`libmultilabel-format`, a textual data format.
Download and uncompress the RCV1 dataset with

.. code-block:: bash

    wget -O train.txt.bz2 https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel/rcv1_topics_train.txt.bz2
    wget -O test.txt.bz2 https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel/rcv1_topics_test.txt.bz2
    bzip2 -d *.bz2

Browse an instance of the data with

.. code-block:: bash

    head -n 1 train.txt

    # Output: 2286 E11 ECAT M11 M12 MCAT recov recov recov recov excit excit bring mexic mexic [...]

If you want to use numeric data in :ref:`libsvm-format` instead, you may do so with

.. code-block:: bash

    wget -O train.svm.bz2 https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel/rcv1_topics_train.svm.bz2
    wget -O test.svm.bz2 https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel/rcv1_topics_combined_test.svm.bz2
    bzip2 -d *.bz2

    head -n 1 train.svm

    # Output: 34,59,93,94,102 864:0.0497399253756197 1523:0.044664135988103 1681:0.0673871572152868 [...]

See `Dataset Formats `_ for more details on the data formats.

Step 2. Training and Prediction via an Example
----------------------------------------------

Next, move back to the root directory and run the main script

.. code-block:: bash

    cd ../..
    python3 main.py --config example_config/rcv1/l2svm.yml

This trains an L2-regularized L2-loss SVM and evaluates the model on the test set.

----------------------------------------------

.. _linear_train:

Training and (Optional) Prediction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To train and evaluate a model, use

.. code-block:: bash

    python3 main.py --config CONFIG_PATH \
                    --training_file TRAINING_DATA_PATH \
                    --test_file TEST_DATA_PATH \
                    --linear \
                    --liblinear_options=LIBLINEAR_OPTIONS \
                    --linear_technique MULTILABEL_OR_MULTICLASS_TECHNIQUE \
                    --data_format DATA_FORMAT

- **config**: Path to a configuration file. Any command line option may be set in this file instead of on the command line. See `Command Line Options `_ for more details.

The linear classifiers are based on `LIBLINEAR `_, and its options may be specified through ``liblinear_options``.

- **training_file**: The path to training data.

- **test_file**: The path to test data. If it is given, the trained model is also evaluated on the test data.

- **linear**: This option specifies that linear models should be run, as opposed to neural network models.

- **liblinear_options**: An `option string for LIBLINEAR `_. For example

  .. code-block:: bash

      --liblinear_options='-s 2 -B 1 -e 0.0001 -q'

- **linear_technique**: An option for multi-label or multi-class techniques. It should be one of
  ``1vsrest`` (one-vs-rest),
  ``thresholding`` (thresholding),
  ``cost_sensitive`` (cost-sensitive), and
  ``binary_and_multiclass`` (binary_and_multiclass).

- **data_format**: The data format. It should be either ``txt`` (LibMultiLabel format) or ``svm`` (LibSVM format). See `Dataset Formats `_ for more details on accepted data formats.

.. _linear_predict:

Prediction
^^^^^^^^^^

To predict a test set by applying a previously trained model, use

.. code-block:: bash

    python3 main.py --config CONFIG_PATH \
                    --test_file TEST_DATA_PATH \
                    --eval \
                    --linear \
                    --data_format DATA_FORMAT \
                    --checkpoint_path CHECKPOINT_PATH

where ``CHECKPOINT_PATH`` is the path to a ``linear_pipeline.pickle`` file.
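
When prediction is run repeatedly, for example over several test files, it can help to assemble the command in a script. The following is only a sketch: ``build_predict_command`` is a hypothetical helper that is not part of LibMultiLabel, and the checkpoint location is a placeholder. It uses only the flags listed in the prediction command above.

.. code-block:: python

    import shlex


    def build_predict_command(config, test_file, checkpoint, data_format="txt"):
        """Return the argument list for the prediction command (hypothetical helper)."""
        # The documentation lists txt and svm as the accepted data formats.
        if data_format not in ("txt", "svm"):
            raise ValueError("data_format must be 'txt' or 'svm'")
        return [
            "python3", "main.py",
            "--config", config,
            "--test_file", test_file,
            "--eval",
            "--linear",
            "--data_format", data_format,
            "--checkpoint_path", checkpoint,
        ]


    cmd = build_predict_command(
        "example_config/rcv1/l2svm.yml",
        "data/rcv1/test.txt",
        "path/to/linear_pipeline.pickle",  # placeholder, replace with your checkpoint
    )
    # Print the command as a single shell-quoted line for inspection.
    print(shlex.join(cmd))

To execute the command rather than print it, the list can be passed to ``subprocess.run(cmd, check=True)`` from the LibMultiLabel root directory.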
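
To make the ``svm`` data format more concrete, the LibSVM-format instance shown in Step 1 can be split into its labels and features with a few lines of Python. This is a minimal sketch for inspecting data, not LibMultiLabel's own loader. The function name ``parse_libsvm_line`` is hypothetical, and the sketch assumes the layout visible in the sample output: comma-separated labels first, followed by ``index:value`` pairs.

.. code-block:: python

    def parse_libsvm_line(line):
        """Split one multi-label LibSVM line into (labels, features)."""
        tokens = line.split()
        labels = []
        # The first token holds the comma-separated labels, unless the
        # instance is unlabeled and the line starts directly with a feature.
        if tokens and ":" not in tokens[0]:
            labels = tokens[0].split(",")
            tokens = tokens[1:]
        features = {}
        for token in tokens:
            index, value = token.split(":", 1)
            features[int(index)] = float(value)
        return labels, features


    labels, features = parse_libsvm_line(
        "34,59,93,94,102 864:0.0497399253756197 1523:0.044664135988103"
    )
    print(labels)            # ['34', '59', '93', '94', '102']
    print(sorted(features))  # [864, 1523]

Labels are kept as strings here. Whether they should be mapped to integers or topic names depends on how the dataset is used downstream.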