An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes
Hung-Yi Lo, Kai-Wei Chang, Shang-Tse Chen, Tsung-Hsien Chiang, ChunSung Ferng, Cho-Jui Hsieh, Yi-Kuang Ko, Tsung-Ting Kuo, Hung-Che Lai, Ken-Yi Lin, Chia-Hsuan Wang, Hsiang-Fu Yu, Chih-Jen Lin, Hsuan-Tien Lin, and Shou-de Lin, in KDD Cup, 2009.
Abstract
This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naive Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using a linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranked third in the slow track of KDD Cup 2009.
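As a rough illustration of three ideas mentioned in the abstract (the joint multi-class label, score averaging, and AUC), here is a minimal Python sketch. It is not the authors' code: the function names `to_joint_label`, `average_scores`, and `auc` are hypothetical, the +1/-1 label encoding is an assumption, and the uniform average stands in for the validated averaging and linear-SVM post-processing described in the paper, whose details the abstract does not give.

```python
from typing import List, Sequence


def to_joint_label(labels: Sequence[int]) -> int:
    """Map three binary labels (+1 / -1, an assumed encoding) to one class.

    Class 0 means no task is positive; class k (1..3) means the k-th task
    is positive. This relies on the positive labels being mutually exclusive.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    if len(positives) > 1:
        raise ValueError("positive labels are assumed to be mutually exclusive")
    return positives[0] + 1 if positives else 0


def average_scores(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Uniformly average per-example scores from several classifiers.

    A simplification: the abstract only says scores are averaged
    'with careful validation'.
    """
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("all classifiers must score the same examples")
    return [sum(col) / len(score_lists) for col in zip(*score_lists)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic time, for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for this example.
    print(to_joint_label([-1, 1, -1]))
    blended = average_scores([[0.9, 0.7, 0.2, 0.1], [0.9, 0.9, 0.4, 0.1]])
    print(auc(blended, [1, -1, 1, -1]))
```

The competition metric reported in the abstract is the AUC averaged over the three tasks, so in this sketch `auc` would be called once per task and the three values averaged.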
Bib Entry
@inproceedings{LCCCFHKKLLWYLLL09,
  author = {Lo, Hung-Yi and Chang, Kai-Wei and Chen, Shang-Tse and Chiang, Tsung-Hsien and Ferng, ChunSung and Hsieh, Cho-Jui and Ko, Yi-Kuang and Kuo, Tsung-Ting and Lai, Hung-Che and Lin, Ken-Yi and Wang, Chia-Hsuan and Yu, Hsiang-Fu and Lin, Chih-Jen and Lin, Hsuan-Tien and Lin, Shou-de},
  title = {An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes},
  booktitle = {KDD Cup},
  year = {2009}
}