Large Linear Classification When Data Cannot Fit In Memory

Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang, and Chih-Jen Lin, in TKDD, 2012.

Abstract

Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework for primal and dual SVMs, respectively. As data cannot fit in memory, many design considerations are very different from those for traditional algorithms. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.

Source Code

Bib Entry

@inproceedings{yu2010large,
  author = {Yu, Hsiang-Fu and Hsieh, Cho-Jui and Chang, Kai-Wei and Lin, Chih-Jen},
  title = {Large Linear Classification When Data Cannot Fit In Memory},
  booktitle = {TKDD},
  year = {2012}
}

Related Publications

Selective Block Minimization for Faster Convergence of Limited Memory Large-scale Linear Models, KDD, 2011
Iterative Scaling and Coordinate Descent Methods for Maximum Entropy Models, JMLR, 2010
A Comparison of Optimization Methods and software for Large-scale L1-regularized Linear Classification, JMLR, 2010
Training and Testing Low-degree Polynomial Data Mappings via Linear SVM, JMLR, 2010
A Sequential Dual Method for Large Scale Multi-Class Linear SVMs, KDD, 2008
A Dual Coordinate Descent Method for Large-Scale Linear SVM, ICML, 2008
Coordinate Descent Method for Large-scale L2-loss Linear SVM, JMLR, 2008
LIBLINEAR: A Library for Large Linear Classification, JMLR, 2008

Large Linear Classification When Data Cannot Fit In Memory