IllinoisSL: A JAVA Library for Structured Prediction

Kai-Wei Chang, Shyam Upadhyay, Ming-Wei Chang, Vivek Srikumar, and Dan Roth, in Arxiv, 2015.

Download the full text

Abstract

Training a structured prediction model involves performing several loss-augmented inference steps. Over the lifetime of the training, many of these inference problems, although different, share the same solution. We propose AI-DCD, an Amortized Inference framework for Dual Coordinate Descent method, an approximate learning algorithm, that accelerates the training process by exploiting this redundancy of solutions, without compromising the performance of the model. We show the efficacy of our method by training a structured SVM using dual coordinate descent for an entity-relation extraction task. Our method learns the same model as an exact training algorithm would, but call the inference engine only in 10% . 24% of the inference problems encountered during training. We observe similar gains on a multi-label classification task and with a Structured Perceptron model for the entity-relation task.

Bib Entry

@inproceedings{chang2015illinoissl,
  author = {Chang, Kai-Wei and Upadhyay, Shyam and Chang, Ming-Wei and Srikumar, Vivek and Roth, Dan},
  title = {IllinoisSL: A JAVA Library for Structured Prediction},
  booktitle = {Arxiv},
  year = {2015}
}

Related Publications

IllinoisSL: A JAVA Library for Structured Prediction

Kai-Wei Chang, Shyam Upadhyay, Ming-Wei Chang, Vivek Srikumar, and Dan Roth, in Arxiv, 2015.
Full Text Abstract BibTeX Details

Training a structured prediction model involves performing several loss-augmented inference steps. Over the lifetime of the training, many of these inference problems, although different, share the same solution. We propose AI-DCD, an Amortized Inference framework for Dual Coordinate Descent method, an approximate learning algorithm, that accelerates the training process by exploiting this redundancy of solutions, without compromising the performance of the model. We show the efficacy of our method by training a structured SVM using dual coordinate descent for an entity-relation extraction task. Our method learns the same model as an exact training algorithm would, but call the inference engine only in 10% . 24% of the inference problems encountered during training. We observe similar gains on a multi-label classification task and with a Structured Perceptron model for the entity-relation task.

@inproceedings{chang2015illinoissl,
  author = {Chang, Kai-Wei and Upadhyay, Shyam and Chang, Ming-Wei and Srikumar, Vivek and Roth, Dan},
  title = {IllinoisSL: A JAVA Library for Structured Prediction},
  booktitle = {Arxiv},
  year = {2015}
}

Details

Multi-core Structural SVM Training

Kai-Wei Chang, Vivek Srikumar, and Dan Roth, in ECML, 2013.
Full Text Poster Abstract BibTeX Details

Many problems in natural language processing and computer vision can be framed as structured prediction problems. Structural support vector machines (SVM) is a popular approach for training structured predictors, where learning is framed as an optimization problem. Most structural SVM solvers alternate between a model update phase and an inference phase (which predicts structures for all training examples). As structures become more complex, inference becomes a bottleneck and thus slows down learning considerably. In this paper, we propose a new learning algorithm for structural SVMs called DEMI-DCD that extends the dual coordinate descent approach by decoupling the model update and inference phases into different threads. We take advantage of multi-core hardware to parallelize learning with minimal synchronization between the model update and the inference phases. We prove that our algorithm not only converges but also fully utilizes all available processors to speed up learning, and validate our approach on two real-world NLP problems: part-of-speech tagging and relation extraction. In both cases, we show that our algorithm utilizes all available processors to speed up learning and achieves competitive performance. For example, it achieves a relative duality gap of 1% on a POS tagging problem in 192 seconds using 16 threads, while a standard implementation of a multi-threaded dual coordinate descent algorithm with the same number of threads requires more than 600 seconds to reach a solution of the same quality.

@inproceedings{chang2013multicore,
  author = {Chang, Kai-Wei and Srikumar, Vivek and Roth, Dan},
  title = {Multi-core Structural SVM Training},
  booktitle = {ECML},
  year = {2013}
}

Details

Tractable Semi-Supervised Learning of Complex Structured Prediction Models

Kai-wei Chang, S. Sundararajan, and S. Sathiya Keerthi, in ECML, 2013.
Full Text Slides Poster Abstract BibTeX Details

Semi-supervised learning has been widely studied in the literature. However, most previous works assume that the output structure is simple enough to allow the direct use of tractable inference/learning algorithms (e.g., binary label or linear chain). Therefore, these methods cannot be applied to problems with complex structure. In this paper, we propose an approximate semi-supervised learning method that uses piecewise training for estimating the model weights and a dual decomposition approach for solving the inference problem of finding the labels of unlabeled data subject to domain specific constraints. This allows us to extend semi-supervised learning to general structured prediction problems. As an example, we apply this approach to the problem of multi-label classification (a fully connected pairwise Markov random field). Experimental results on benchmark data show that, in spite of using approximations, the approach is effective and yields good improvements in generalization performance over the plain supervised method. In addition, we demonstrate that our inference engine can be applied to other semi-supervised learning frameworks, and extends them to solve problems with complex structure.

@inproceedings{ChangSuKe13,
  author = {Chang, Kai-wei and Sundararajan, S. and Keerthi, S. Sathiya},
  title = {Tractable Semi-Supervised Learning of Complex Structured Prediction Models},
  booktitle = {ECML},
  year = {2013}
}

Details