Distributed LIBLINEAR: Libraries for Large-scale Linear Classification on Distributed Environments

Machine Learning Group at National Taiwan University
Contributors

We now support

The development of distributed LIBLINEAR is still at an early stage. Your comments are very welcome.


Introduction

MPI LIBLINEAR is an extension of LIBLINEAR for distributed environments. Its usage and data format are the same as those of LIBLINEAR. Currently seven solvers are supported:

NOTICE: This extension can only run on Unix-like systems. Python and Matlab interfaces are not supported.
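Because the data format is the same as LIBLINEAR's, each training instance is one line of the form `label index:value index:value ...`, with feature indices starting at 1 and only non-zero features listed. The following is a minimal sketch of reading one such line. It is written in Python purely for illustration and is not part of MPI LIBLINEAR (which, as noted above, has no Python interface); the function name `parse_libsvm_line` and the sample line are hypothetical.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one 'label index:value ...' line into (label, sparse features).

    Hypothetical helper for illustration; not part of any LIBLINEAR package.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")

    # The first token is the class label (for example +1 or -1).
    label = float(tokens[0])

    # Remaining tokens are sparse features written as index:value.
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index_str, value_str = token.split(":", 1)
        index = int(index_str)
        if index < 1:
            raise ValueError("feature indices start at 1")
        features[index] = float(value_str)
    return label, features


# Sample line (made up for this sketch): label +1 with three non-zero features.
sample = "+1 1:0.5 3:1.2 7:-0.3"
label, features = parse_libsvm_line(sample)
print(label, features)
```

Storing features in a dictionary keyed by index mirrors the sparse layout of the file: features that are absent from a line are implicitly zero.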

Spark LIBLINEAR is a Spark implementation based on LIBLINEAR and integrated with the Hadoop Distributed File System (HDFS). The package is written in Scala. Currently it supports only two solvers:


Download

MPI LIBLINEAR can be obtained by downloading the zip file.

Spark LIBLINEAR can be obtained by downloading the zip file or tar.gz file.

Please read the COPYRIGHT notice before using MPI LIBLINEAR and Spark LIBLINEAR.


MPI LIBLINEAR Documentation

For users who are interested in running MPI LIBLINEAR, we provide a practical guide to setting up its distributed environment. You may also check our FAQ for MPI LIBLINEAR if you encounter any problems.

Technical details are in the following papers.

  1. Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin. Distributed Newton Method for Regularized Logistic Regression, PAKDD 2015.
  2. C.-p. Lee and K.-W. Chang. Distributed Block-diagonal Approximation Methods for Regularized Empirical Risk Minimization, MLJ 2020. (Supersedes the ICML 2015 version.)
  3. C.-p. Lee, P.-W. Wang, W. Chen, and C.-J. Lin. Limited-memory common-directions method for large-scale optimization: convergence, parallelization, and distributed optimization, technical report, 2020. (Supersedes the SDM 2017 version.)
  4. W.-L. Chiang, Y.-S. Li, C.-p. Lee, and C.-J. Lin. Limited-memory Common-directions Method for Distributed L1-regularized Linear Classification, SIAM International Conference on Data Mining, 2018. Supplementary materials.

Spark LIBLINEAR Documentation

Technical details are in the following paper.

C.-Y. Lin, C.-H. Tsai, C.-P. Lee, and C.-J. Lin. Large-scale Logistic Regression and Linear Support Vector Machines Using Spark, IEEE International Conference on Big Data 2014 (supplementary materials).

For Spark LIBLINEAR users, we provide a guide for building distributed environments on VirtualBox.

For users who want to run Spark on Amazon EC2, please check a useful guide on Running Spark on EC2 to build the environment. It automatically sets up Spark, Shark and HDFS on the cluster for you.

If you already have a Spark cluster, please check the running guide.

For the implementation API, you can check the following document

Please send comments and suggestions to Chih-Jen Lin.