Distributed LIBLINEAR: Libraries for Large-scale Linear Classification on Distributed Environments

Machine Learning Group at National Taiwan University
Contributors

We now support MPI and Spark.

The development of distributed LIBLINEAR is still at an early stage. Your comments are very welcome.


Introduction

MPI LIBLINEAR is an extension of LIBLINEAR for distributed environments. Its usage and data format are the same as LIBLINEAR's. Currently only four solvers are supported.

NOTICE: This extension can only run on Unix-like systems. Python and Matlab interfaces are not supported.
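Because the data format is the same as LIBLINEAR's, each line of a data file is a label followed by sparse index:value pairs. The following Python sketch illustrates how such a line can be read. The function name parse_liblinear_line and the sample line are hypothetical and used only for illustration; they are not part of MPI LIBLINEAR.

```python
def parse_liblinear_line(line):
    """Parse one line of the sparse format '<label> <index>:<value> ...'.

    Hypothetical helper for illustration only; not part of the package.
    Returns the label and a dict mapping 1-based feature index to value.
    """
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        features[int(index_text)] = float(value_text)
    return label, features


# Made-up sample instance: label +1 with three nonzero features.
label, features = parse_liblinear_line("+1 3:0.5 7:1 12:-2.25")
print(label, features)
```

Features that do not appear on a line are treated as zero, which keeps files compact for high-dimensional sparse data.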

Spark LIBLINEAR is a Spark implementation based on LIBLINEAR and integrated with the Hadoop Distributed File System (HDFS). The package is written in Scala. Currently it supports only two solvers:

  • L2-regularized logistic regression (primal)
  • L2-regularized L2-loss linear SVM (primal)
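For readers unfamiliar with the two solvers listed above, the following Python sketch evaluates the primal objectives they minimize, following the standard LIBLINEAR formulation: 0.5 * w'w + C * sum of losses, with the logistic loss log(1 + exp(-y w'x)) and the L2 loss max(0, 1 - y w'x)^2. It runs on a single machine and is not the Spark LIBLINEAR API; the function names and the toy data are assumptions made for illustration.

```python
import math


def dot(w, x):
    # w is a dense list; x is a sparse dict with 1-based feature indices.
    return sum(w[i - 1] * v for i, v in x.items())


def log1p_exp(z):
    # Numerically stable log(1 + exp(z)).
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def lr_objective(w, data, C):
    # L2-regularized logistic regression (primal objective).
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = sum(log1p_exp(-y * dot(w, x)) for y, x in data)
    return reg + C * loss


def l2svm_objective(w, data, C):
    # L2-regularized L2-loss linear SVM (primal objective).
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = sum(max(0.0, 1.0 - y * dot(w, x)) ** 2 for y, x in data)
    return reg + C * loss


# Toy data (made up): (label, sparse features) pairs with labels in {+1, -1}.
data = [(1.0, {1: 1.0}), (-1.0, {2: 2.0})]
w = [1.0, 0.0]
print(lr_objective(w, data, C=1.0), l2svm_objective(w, data, C=1.0))
```

Both objectives are sums over training instances plus a regularization term, so the loss part can be computed on separate data partitions and then added together.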

Download

    MPI LIBLINEAR can be obtained by downloading the zip file.

    Spark LIBLINEAR can be obtained by downloading the zip file or tar.gz file.

    Please read the COPYRIGHT notice before using MPI LIBLINEAR and Spark LIBLINEAR.


    MPI LIBLINEAR Documentation

    Technical details are in the following papers.

    1. Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin. Distributed Newton Method for Regularized Logistic Regression, PAKDD 2015.
    2. C.-P. Lee and D. Roth. Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM, ICML 2015.
    3. C.-P. Lee, P.-W. Wang, W. Chen, and C.-J. Lin. Limited-memory common-directions method for distributed optimization and its application on empirical risk minimization. Technical report, 2016. Supplementary materials.

    For MPI LIBLINEAR users, we provide two guides for establishing distributed environments on VirtualBox and Amazon EC2.

    Spark LIBLINEAR Documentation

    Technical details are in the following paper.

    C.-Y. Lin, C.-H. Tsai, C.-P. Lee, and C.-J. Lin. Large-scale Logistic Regression and Linear Support Vector Machines Using Spark, IEEE International Conference on Big Data 2014 (supplementary materials).

    For Spark LIBLINEAR users, we provide a guide for building distributed environments on VirtualBox.

    To run Spark on Amazon EC2, please see the guide Running Spark on EC2 to build the environment. It automatically sets up Spark, Shark, and HDFS on the cluster for you.

    If you already have a Spark cluster, please check the running guide.

    For the implementation API, please check the API documentation.

    Please send comments and suggestions to Chih-Jen Lin.