Multicore Structural SVM Training
Kai-Wei Chang, Vivek Srikumar, and Dan Roth, in ECML, 2013.
Abstract
Many problems in natural language processing and computer vision can be framed as structured prediction problems. Structural support vector machines (SVMs) are a popular approach for training structured predictors, where learning is framed as an optimization problem. Most structural SVM solvers alternate between a model-update phase and an inference phase (which predicts structures for all training examples). As structures become more complex, inference becomes a bottleneck and thus slows down learning considerably. In this paper, we propose a new learning algorithm for structural SVMs called DEMI-DCD that extends the dual coordinate descent approach by decoupling the model-update and inference phases into different threads. We take advantage of multicore hardware to parallelize learning with minimal synchronization between the model-update and inference phases. We prove that our algorithm not only converges but also fully utilizes all available processors to speed up learning, and we validate our approach on two real-world NLP problems: part-of-speech tagging and relation extraction. In both cases, we show that our algorithm utilizes all available processors to speed up learning and achieves competitive performance. For example, it achieves a relative duality gap of 1% on a POS tagging problem in 192 seconds using 16 threads, while a standard implementation of a multi-threaded dual coordinate descent algorithm with the same number of threads requires more than 600 seconds to reach a solution of the same quality.
Bib Entry
@inproceedings{chang2013multicore, author = {Chang, Kai-Wei and Srikumar, Vivek and Roth, Dan}, title = {Multicore Structural SVM Training}, booktitle = {ECML}, year = {2013} }
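The decoupling idea in the abstract can be sketched in a few lines: one thread repeatedly runs loss-augmented inference with whatever model is current, while a separate thread consumes the resulting candidate structures and performs dual coordinate descent updates on the weights. The toy sketch below is not the paper's implementation; the data, the feature map `phi`, and the two-label "structures" are invented for illustration, and a single shared queue stands in for the paper's synchronization scheme.

```python
import queue
import threading

# Toy joint feature map: phi(x, y) copies x's features into the block for label y.
def phi(x, y, n_labels=2):
    v = [0.0] * (len(x) * n_labels)
    for j, xj in enumerate(x):
        v[y * len(x) + j] = xj
    return v

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Trivially separable toy data: label 0 iff the first feature dominates.
DATA = [([3.0, 1.0], 0), ([1.0, 3.0], 1), ([4.0, 0.5], 0), ([0.5, 4.0], 1)]
C = 1.0
w = [0.0] * 4
alphas = {}           # dual variables, keyed by (example index, candidate y)
work = queue.Queue()  # inference thread -> learning thread

def inference_thread(rounds):
    # Solve loss-augmented inference with whatever w is current and hand
    # the candidate structures to the learner; no lock on w is taken.
    for _ in range(rounds):
        for i, (x, y) in enumerate(DATA):
            y_hat = max(range(2), key=lambda c: dot(w, phi(x, c)) + (c != y))
            work.put((i, y_hat))
    work.put(None)  # sentinel: inference is finished

def learning_thread():
    # One clipped dual coordinate descent step per received structure.
    while (item := work.get()) is not None:
        i, y_hat = item
        x, y = DATA[i]
        if y_hat == y:
            continue  # candidate adds no violated constraint
        dphi = [a - b for a, b in zip(phi(x, y), phi(x, y_hat))]
        a_old = alphas.get((i, y_hat), 0.0)
        step = (1.0 - dot(w, dphi)) / dot(dphi, dphi)
        a_new = min(max(a_old + step, 0.0), C)
        alphas[(i, y_hat)] = a_new
        for j in range(len(w)):
            w[j] += (a_new - a_old) * dphi[j]

t_inf = threading.Thread(target=inference_thread, args=(20,))
t_learn = threading.Thread(target=learning_thread)
t_inf.start(); t_learn.start()
t_inf.join(); t_learn.join()

def predict(x):
    return max(range(2), key=lambda c: dot(w, phi(x, c)))
```

Because the learner reads candidates from the queue rather than waiting for a full inference pass, the two phases overlap exactly as the abstract describes, at the cost of sometimes updating against structures computed from a slightly stale model.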
Related Publications

IllinoisSL: A JAVA Library for Structured Prediction
Kai-Wei Chang, Shyam Upadhyay, Ming-Wei Chang, Vivek Srikumar, and Dan Roth, in arXiv, 2015.
Training a structured prediction model involves performing several loss-augmented inference steps. Over the lifetime of the training, many of these inference problems, although different, share the same solution. We propose AI-DCD, an Amortized Inference framework for the Dual Coordinate Descent method, an approximate learning algorithm that accelerates the training process by exploiting this redundancy of solutions, without compromising the performance of the model. We show the efficacy of our method by training a structured SVM using dual coordinate descent for an entity-relation extraction task. Our method learns the same model as an exact training algorithm would, but calls the inference engine in only 10%-24% of the inference problems encountered during training. We observe similar gains on a multi-label classification task and with a Structured Perceptron model for the entity-relation task.
@inproceedings{chang2015illinoissl, author = {Chang, Kai-Wei and Upadhyay, Shyam and Chang, Ming-Wei and Srikumar, Vivek and Roth, Dan}, title = {IllinoisSL: A JAVA Library for Structured Prediction}, booktitle = {arXiv}, year = {2015} }
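The amortization idea — reuse an earlier inference solution when it still wins under the current objective — can be sketched with a small cache. This is a hypothetical stand-in, not the paper's actual amortization condition: the class name, the pool-based reuse check, and the brute-force solver are all invented for illustration.

```python
import itertools

class AmortizedInference:
    """Hypothetical cache around an expensive argmax solver."""

    def __init__(self, solver):
        self.solver = solver  # exact (expensive) inference oracle
        self.cache = {}       # example id -> last returned structure
        self.pool = set()     # every structure any solver call returned
        self.calls = 0        # how often the expensive solver ran

    def argmax(self, i, score):
        cached = self.cache.get(i)
        if cached is not None:
            # Cheap stand-in for an amortization check: reuse the cached
            # structure if it still beats every structure seen so far.
            if score(cached) >= max(score(y) for y in self.pool):
                return cached
        self.calls += 1
        y = self.solver(score)
        self.cache[i] = y
        self.pool.add(y)
        return y

# Brute-force "expensive" solver over all binary vectors of length 3.
ALL = list(itertools.product((0, 1), repeat=3))
def brute_force(score):
    return max(ALL, key=score)

amo = AmortizedInference(brute_force)
w1 = (1.0, -1.0, 2.0)
s1 = lambda y: sum(wi * yi for wi, yi in zip(w1, y))
first = amo.argmax(0, s1)   # runs the solver
second = amo.argmax(0, s1)  # reused from the cache; no solver call
```

During training, the objective changes only gradually between passes over the data, so a check like this fires often and the counter `calls` stays well below the number of inference requests — the redundancy the abstract exploits.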

Tractable Semi-Supervised Learning of Complex Structured Prediction Models
Kai-Wei Chang, S. Sundararajan, and S. Sathiya Keerthi, in ECML, 2013.
Semi-supervised learning has been widely studied in the literature. However, most previous works assume that the output structure is simple enough to allow the direct use of tractable inference/learning algorithms (e.g., binary labels or linear chains). Therefore, these methods cannot be applied to problems with complex structure. In this paper, we propose an approximate semi-supervised learning method that uses piecewise training for estimating the model weights and a dual decomposition approach for solving the inference problem of finding the labels of unlabeled data subject to domain-specific constraints. This allows us to extend semi-supervised learning to general structured prediction problems. As an example, we apply this approach to the problem of multi-label classification (a fully connected pairwise Markov random field). Experimental results on benchmark data show that, in spite of using approximations, the approach is effective and yields good improvements in generalization performance over the plain supervised method. In addition, we demonstrate that our inference engine can be applied to other semi-supervised learning frameworks and extends them to solve problems with complex structure.
@inproceedings{ChangSuKe13, author = {Chang, Kai-Wei and Sundararajan, S. and Keerthi, S. Sathiya}, title = {Tractable Semi-Supervised Learning of Complex Structured Prediction Models}, booktitle = {ECML}, year = {2013} }
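The dual decomposition step mentioned in the abstract can be illustrated with the standard subgradient scheme: split a joint maximization into subproblems that each get their own copy of the labels, then adjust Lagrange multipliers until the copies agree. In the paper the subproblems would be structured (pieces of the pairwise Markov random field plus domain constraints); in this sketch both oracles are deliberately trivial coordinatewise argmaxes and all scores are made up, so only the multiplier mechanics are representative.

```python
def dual_decomposition(f_scores, g_scores, n, iters=100, step=0.5):
    """Maximize sum_j (f_scores[j] + g_scores[j]) * y_j over binary y by
    giving the f- and g-subproblems separate label copies and penalizing
    disagreement via Lagrange multipliers."""
    lam = [0.0] * n  # multipliers on the agreement constraint y_f == y_g
    for t in range(iters):
        # Each subproblem maximizes its own multiplier-adjusted score
        # (a real application would call a structured oracle here).
        y_f = [int(f_scores[j] + lam[j] > 0) for j in range(n)]
        y_g = [int(g_scores[j] - lam[j] > 0) for j in range(n)]
        if y_f == y_g:
            return y_f  # copies agree: a primal solution is recovered
        # Subgradient step on the multipliers where the copies disagree.
        rate = step / (t + 1)
        for j in range(n):
            lam[j] -= rate * (y_f[j] - y_g[j])
    return y_f  # no agreement within the budget: return f's labels

labels = dual_decomposition([2.0, -1.0, 0.5], [1.0, -2.0, -1.0], 3)
```

With the scores above, the two subproblems disagree on the third label at the first iteration; one multiplier update is enough to bring them into agreement, and the returned labeling matches the joint argmax.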