## Preprints

• #### GENEVA: Pushing the Limit of Generalizability for Event Argument Extraction with 100+ Event Types

Tanmay Parekh, I-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, and Nanyun Peng, arXiv preprint arXiv:2205.12505, 2022.
Full Text BibTeX Details
@article{parekh2022geneva,
title = {GENEVA: Pushing the Limit of Generalizability for Event Argument Extraction with 100+ Event Types},
author = {Parekh, Tanmay and Hsu, I-Hung and Huang, Kuan-Hao and Chang, Kai-Wei and Peng, Nanyun},
journal = {arXiv preprint arXiv:2205.12505},
year = {2022}
}

• #### AVATAR: A Parallel Corpus for Java-Python Program Translation

Wasi Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang, arXiv preprint, 2021.
Full Text Code Abstract BibTeX Details
Program translation refers to migrating source code from one programming language to another. It has tremendous practical value in software development, as porting software across different languages is time-consuming and costly. Automating program translation is of paramount importance in software migration, and researchers have recently explored unsupervised approaches due to the unavailability of parallel corpora. However, the availability of pre-trained language models for programming languages enables supervised fine-tuning with a small number of labeled examples. In this work, we present a corpus of 8,475 programming problems and their solutions written in two popular languages, Java and Python. We collect the dataset from competitive programming sites, online platforms, and open-source repositories. We present several baselines, including models trained from scratch or pre-trained on large-scale source code collections and fine-tuned on our proposed dataset. Experiment results show that while the models perform relatively well in terms of lexical match, they fall short of generating code that is accurate in terms of syntax and data-flow match.
@article{ahmad2021avatar,
title = {AVATAR: A Parallel Corpus for Java-Python Program Translation},
author = {Ahmad, Wasi and Tushar, Md Golam Rahman and Chakraborty, Saikat and Chang, Kai-Wei},
journal = {arXiv preprint},
year = {2021}
}

• #### VisualBERT: A Simple and Performant Baseline for Vision and Language

Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, arXiv preprint arXiv:1908.03557, 2019.
Full Text Code Abstract BibTeX Details
We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.
@article{li2019visualbert,
author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
title = {VisualBERT: A Simple and Performant Baseline for Vision and Language},
journal = {arXiv preprint arXiv:1908.03557},
year = {2019}
}

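As a gloss on the architecture described in the VisualBERT abstract above, the toy model below shows the core idea: a single Transformer attends over concatenated text token embeddings and projected visual region features. This is an illustrative sketch, not the authors' implementation; the dimensions are arbitrary, positional embeddings are omitted, and `TinyVisualBERT` is a made-up name.

```python
# Illustrative sketch of the VisualBERT idea: text tokens and detected image
# regions are embedded into a shared space and processed jointly by one
# Transformer, so self-attention can align the two modalities.
import torch
import torch.nn as nn

class TinyVisualBERT(nn.Module):
    def __init__(self, vocab_size=30522, hidden=256, region_dim=2048, layers=4, heads=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.region_proj = nn.Linear(region_dim, hidden)  # project detector features
        self.type_emb = nn.Embedding(2, hidden)           # 0 = text, 1 = vision
        enc_layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)

    def forward(self, token_ids, region_feats):
        t = self.token_emb(token_ids) + self.type_emb(torch.zeros_like(token_ids))
        v = self.region_proj(region_feats)
        v = v + self.type_emb(torch.ones(v.shape[:2], dtype=torch.long, device=v.device))
        joint = torch.cat([t, v], dim=1)   # one sequence: text tokens then regions
        return self.encoder(joint)         # self-attention spans both modalities

x = torch.randint(0, 30522, (2, 12))       # a batch of token ids
r = torch.randn(2, 36, 2048)               # 36 region features per image
out = TinyVisualBERT()(x, r)               # (2, 48, 256) joint representations
```
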

## 2022

• #### ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation

Fan Yin, Yao Li, Cho-Jui Hsieh, and Kai-Wei Chang, in EMNLP, 2022.
Full Text Abstract BibTeX Details
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks and has drawn increasing attention from the Natural Language Processing (NLP) community. Despite the surge of new AED methods, our studies show that existing methods heavily rely on a shortcut to achieve good performance: current search-based adversarial attacks in NLP stop once model predictions change, so most adversarial examples generated by these attacks lie near model decision boundaries. To bypass this shortcut and evaluate AED methods fairly, we propose to test AED methods with Far-Boundary (FB) adversarial examples. Existing methods perform worse than random guessing under this scenario. To overcome this limitation, we propose a new technique, ADDMU (adversary detection with data and model uncertainty), which combines two types of uncertainty estimation for detecting both regular and FB adversarial examples. Our new method outperforms previous methods by 3.6 and 6.0 AUC points under the two scenarios. Finally, our analysis shows that the two types of uncertainty provided by ADDMU can be leveraged to characterize adversarial examples and identify the ones that contribute most to the model’s robustness in adversarial training.
@inproceedings{yin2022addmu,
title = {ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation},
author = {Yin, Fan and Li, Yao and Hsieh, Cho-Jui and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2022}
}

• #### Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense

Zhecan Wang, Haoxuan You, Yicheng He, Wenhao Li, Kai-Wei Chang, and Shih-Fu Chang, in EMNLP, 2022.
BibTeX Details
@inproceedings{you2022fine,
title = {Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense},
author = {Wang, Zhecan and You, Haoxuan and He, Yicheng and Li, Wenhao and Chang, Kai-Wei and Chang, Shih-Fu},
booktitle = {EMNLP},
year = {2022}
}

• #### GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models

Da Yin, Hritik Bansal, Masoud Monajatipoor, Liunian Harold Li, and Kai-Wei Chang, in EMNLP, 2022.
Full Text Code Abstract BibTeX Details
Recent work has shown that Pre-trained Language Models (PLMs) can store relational knowledge from pre-training data in their model parameters. However, it is not clear to what extent PLMs store geo-diverse commonsense knowledge, i.e., knowledge associated with a culture and shared only locally. For instance, the color of the bridal dress is white at American weddings, whereas it is red at Chinese weddings. Here, we wish to probe whether PLMs can predict red and white as the color of the bridal dress when queried for American and Chinese weddings, respectively. To this end, we introduce a framework for geo-diverse commonsense probing on multilingual PLMs (mPLMs) and a corresponding benchmark, the Geo-diverse Commonsense Multilingual Language Model Analysis (GeoMLAMA) dataset. GeoMLAMA contains 3,125 prompts in English, Chinese, Hindi, Persian, and Swahili, with wide coverage of concepts shared by people from American, Chinese, Indian, Iranian, and Kenyan cultures. We benchmark 11 standard mPLMs, including variants of mBERT, XLM, mT5, and XGLM, on GeoMLAMA. Interestingly, we find that 1) larger mPLM variants do not necessarily store geo-diverse concepts better than their smaller variants; 2) mPLMs are not intrinsically biased towards knowledge from Western countries (the United States); 3) the native language of a country may not be the best language with which to probe its knowledge; and 4) a language may probe knowledge about a non-native country better than its native country.
@inproceedings{yin2022geomlama,
title = {GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models},
author = {Yin, Da and Bansal, Hritik and Monajatipoor, Masoud and Li, Liunian Harold and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2022}
}

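The probing setup described in the GeoMLAMA abstract can be approximated with a standard masked-LM fill-in query. The snippet below is a hedged illustration using Hugging Face's `fill-mask` pipeline with mBERT; the prompt wording and candidate answers are invented for this example and are not taken from the released dataset.

```python
# Sketch of geo-diverse commonsense probing: ask a multilingual masked LM to
# fill a culture-specific slot and compare the scores of candidate answers.
from transformers import pipeline

probe = pipeline("fill-mask", model="bert-base-multilingual-cased")

prompt = "At traditional weddings in China, the bride usually wears a [MASK] dress."
for pred in probe(prompt, targets=["red", "white"]):
    print(pred["token_str"], round(pred["score"], 4))
```
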
• #### Empowering Language Models with Knowledge Graph Reasoning for Open-Domain Question Answering

Ziniu Hu, Yichong Xu, Wenhao Yu, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Kai-Wei Chang, and Yizhou Sun, in EMNLP, 2022.
BibTeX Details
@inproceedings{hu2022empowering,
title = {Empowering Language Models with Knowledge Graph Reasoning for Open-Domain Question Answering},
author = {Hu, Ziniu and Xu, Yichong and Yu, Wenhao and Wang, Shuohang and Yang, Ziyi and Zhu, Chenguang and Chang, Kai-Wei and Sun, Yizhou},
booktitle = {EMNLP},
year = {2022}
}

• #### How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?

Hritik Bansal, Da Yin, Masoud Monajatipoor, and Kai-Wei Chang, in EMNLP (short), 2022.
Full Text Code Abstract BibTeX Details
Text-to-image generative models have achieved unprecedented success in generating high-quality images based on natural language descriptions. However, these models have been shown to favor specific social groups when prompted with neutral text descriptions (e.g., ’a photo of a lawyer’). Following Zhao et al. (2021), we study the effect on the diversity of the generated images of adding an ethical intervention that supports equitable judgment (e.g., ’if all individuals can be a lawyer irrespective of their gender’) to the input prompts. To this end, we introduce the Ethical NaTural Language Interventions in Text-to-Image GENeration (ENTIGEN) benchmark dataset to evaluate the change in image generations conditioned on ethical interventions across three social axes – gender, skin color, and culture. Through the ENTIGEN framework, we find that the generations from minDALL.E, DALL.E-mini, and Stable Diffusion cover diverse social groups while preserving image quality. Preliminary studies indicate that a large change in the model predictions is triggered by certain phrases in the ethical interventions, such as ’irrespective of gender’ in the context of gender bias. We release code and annotated data at https://github.com/Hritikbansal/entigen_emnlp.
@inproceedings{bansal2022how,
title = {How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?},
author = {Bansal, Hritik and Yin, Da and Monajatipoor, Masoud and Chang, Kai-Wei},
booktitle = {EMNLP (short)},
year = {2022}
}

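The intervention mechanism the ENTIGEN abstract describes is, at its core, prompt augmentation. Below is a minimal sketch under the assumption that image generation itself happens elsewhere; the intervention wording follows the examples in the abstract, but the `with_intervention` helper and its crude job extraction are invented for illustration.

```python
# Sketch of the prompt-intervention idea: append an ethical intervention to a
# neutral prompt before passing it to any text-to-image model.
base_prompts = ["a photo of a lawyer", "a photo of a nurse"]
interventions = {
    "gender":     "if all individuals can be a {job} irrespective of their gender",
    "skin_color": "if all individuals can be a {job} irrespective of their skin color",
}

def with_intervention(prompt, axis):
    job = prompt.rsplit(" ", 1)[-1]        # crude job extraction, demo only
    return f"{prompt}, {interventions[axis].format(job=job)}"

for p in base_prompts:
    print(with_intervention(p, "gender"))
# One would then compare the demographic diversity of images generated from the
# plain vs. intervened prompts, as the benchmark does across three social axes.
```
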
• #### Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding

Haoxuan You, Rui Sun, Zhecan Wang, Kai-Wei Chang, and Shih-Fu Chang, in EMNLP-Finding, 2022.
BibTeX Details
@inproceedings{you2022find,
title = {Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding},
author = {You, Haoxuan and Sun, Rui and Wang, Zhecan and Chang, Kai-Wei and Chang, Shih-Fu},
booktitle = {EMNLP-Finding},
year = {2022}
}

• #### Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers

Jieyu Zhao, Xuezhi Wang, Yao Qin, Jilin Chen, and Kai-Wei Chang, in EMNLP-Finding (short), 2022.
Full Text BibTeX Details
@inproceedings{zhao2022investigating,
title = {Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers},
author = {Zhao, Jieyu and Wang, Xuezhi and Qin, Yao and Chen, Jilin and Chang, Kai-Wei},
booktitle = {EMNLP-Finding (short)},
year = {2022}
}

• #### Conditional Supervised Contrastive Learning for Fair Text Classification

Jianfeng Chi, William Shand, Yaodong Yu, Kai-Wei Chang, Han Zhao, and Yuan Tian, in EMNLP-Finding, 2022.
BibTeX Details
@inproceedings{chi2022conditional,
title = {Conditional Supervised Contrastive Learning for Fair Text Classification},
author = {Chi, Jianfeng and Shand, William and Yu, Yaodong and Chang, Kai-Wei and Zhao, Han and Tian, Yuan},
booktitle = {EMNLP-Finding},
year = {2022}
}

• #### Representation Learning for Resource-Constrained Keyphrase Generation

Di Wu, Wasi Uddin Ahmad, Sunipa Dev, and Kai-Wei Chang, in EMNLP-Finding, 2022.
Full Text Code Abstract BibTeX Details
State-of-the-art keyphrase generation methods generally depend on large annotated datasets, limiting their performance in domains with limited annotated data. To overcome this challenge, we design a data-oriented approach that first identifies salient information using unsupervised corpus-level statistics, and then learns a task-specific intermediate representation based on a pre-trained language model. We introduce salient span recovery and salient span prediction as denoising training objectives that condense the intra-article and inter-article knowledge essential for keyphrase generation. Through experiments on multiple keyphrase generation benchmarks, we show the effectiveness of the proposed approach for facilitating low-resource and zero-shot keyphrase generation. We further observe that the method especially benefits the generation of absent keyphrases, approaching the performance of models trained with large training sets.
@inproceedings{wu2022representation,
title = {Representation Learning for Resource-Constrained Keyphrase Generation},
author = {Wu, Di and Ahmad, Wasi Uddin and Dev, Sunipa and Chang, Kai-Wei},
booktitle = {EMNLP-Finding},
year = {2022}
}

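A rough sketch of the salient span recovery objective described above: pick salient tokens with an unsupervised corpus-level statistic (TF-IDF here, as a stand-in for the paper's exact choice) and mask them, so a seq2seq model can be trained to recover them. All function names and the tiny corpus are invented for illustration.

```python
# Toy salient-span masking: corpus-level TF-IDF picks salient tokens, which are
# then masked to form a denoising (recovery) training input.
import re
from collections import Counter
from math import log

def tfidf_salient_tokens(doc, corpus, top_k=2):
    docs_tokens = [re.findall(r"\w+", d.lower()) for d in corpus]
    df = Counter(t for toks in docs_tokens for t in set(toks))  # document frequency
    tf = Counter(re.findall(r"\w+", doc.lower()))
    score = {t: tf[t] * log(len(corpus) / (1 + df[t])) for t in tf}
    return {t for t, _ in sorted(score.items(), key=lambda x: -x[1])[:top_k]}

def mask_salient(doc, salient, mask="<mask>"):
    return " ".join(mask if w.lower() in salient else w for w in doc.split())

corpus = ["neural keyphrase generation for scholarly documents",
          "graph neural networks for molecules",
          "keyphrase extraction from long documents"]
doc = corpus[0]
salient = tfidf_salient_tokens(doc, corpus)
print(mask_salient(doc, salient))  # masked text becomes the denoising input
```
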
• #### Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations

Kuan-Hao Huang, Varun Iyer, Anoop Kumar, Sriram Venkatapathy, Kai-Wei Chang, and Aram Galstyan, in EMNLP-Finding (short), 2022.
BibTeX Details
@inproceedings{huang2022unsupervised,
title = {Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations},
author = {Huang, Kuan-Hao and Iyer, Varun and Kumar, Anoop and Venkatapathy, Sriram and Chang, Kai-Wei and Galstyan, Aram},
booktitle = {EMNLP-Finding (short)},
year = {2022}
}

• #### Controllable Text Generation with Neurally-Decomposed Oracle

Tao Meng, Sidi Lu, Nanyun Peng, and Kai-Wei Chang, in NeurIPS, 2022.
Full Text Abstract BibTeX Details
We propose a general and efficient framework to control auto-regressive generation models with NeurAlly-Decomposed Oracle (NADO). Given a pre-trained base language model and a sequence-level Boolean oracle function, we propose to decompose the oracle function into token-level guidance to steer the base model in text generation. Specifically, the token-level guidance is approximated by a neural model trained with examples sampled from the base model, requiring no additional labeled data. We present the closed-form optimal solution for incorporating the token-level guidance into the base model for controllable generation. We further provide a theoretical analysis of how the approximation quality of NADO affects the controllable generation results. Experiments on two applications, (1) text generation with lexical constraints and (2) machine translation with formality control, demonstrate that our framework efficiently guides the base model towards the given oracle while maintaining high generation quality.
@inproceedings{meng2022controllable,
title = {Controllable Text Generation with Neurally-Decomposed Oracle},
author = {Meng, Tao and Lu, Sidi and Peng, Nanyun and Chang, Kai-Wei},
booktitle = {NeurIPS},
year = {2022}
}

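The token-level decomposition in the NADO abstract admits a compact illustration. Below is a toy rendering of the decoding rule, under the simplifying assumption that we already have a trained approximation `R(prefix)` of the probability that a continuation of the prefix will satisfy the oracle; the reweighting follows the closed-form rule of multiplying each next-token probability by R(prefix + token) / R(prefix). The stand-in oracle and tiny vocabulary are made up.

```python
# Toy NADO-style decoding step: reweight the base LM's next-token distribution
# by the (approximate) probability that each continuation can still satisfy the
# sequence-level oracle, then renormalize.
import torch

def nado_step(base_logprobs, prefix, R, vocab):
    # base_logprobs: (V,) log-probabilities of the base LM for the next token
    guidance = torch.tensor([R(prefix + [w]) for w in vocab])
    weighted = base_logprobs.exp() * guidance / max(R(prefix), 1e-8)
    return weighted / weighted.sum()   # normalization absorbs the denominator

vocab = ["the", "cat", "sat"]
R = lambda seq: 0.9 if "cat" in seq else 0.2   # stand-in oracle approximation
probs = nado_step(torch.log(torch.tensor([0.5, 0.3, 0.2])), [], R, vocab)
print(probs)   # mass shifts toward continuations the oracle favors
```
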
• #### Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan, in NeurIPS, 2022.
BibTeX Details
@inproceedings{lu2022learn,
title = {Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering},
author = {Lu, Pan and Mishra, Swaroop and Xia, Tony and Qiu, Liang and Chang, Kai-Wei and Zhu, Song-Chun and Tafjord, Oyvind and Clark, Peter and Kalyan, Ashwin},
booktitle = {NeurIPS},
year = {2022}
}

• #### On the Discrimination Risk of Mean Aggregation Feature Imputation in Graphs

Arjun Subramonian, Kai-Wei Chang, and Yizhou Sun, in NeurIPS, 2022.
BibTeX Details
@inproceedings{subramonian2022on,
title = {On the Discrimination Risk of Mean Aggregation Feature Imputation in Graphs},
author = {Subramonian, Arjun and Chang, Kai-Wei and Sun, Yizhou},
booktitle = {NeurIPS},
year = {2022}
}

• #### Semantic Probabilistic Layers for Neuro-Symbolic Learning

Kareem Ahmed, Stefano Teso, Kai-Wei Chang, Guy Van den Broeck, and Antonio Vergari, in NeurIPS, 2022.
BibTeX Details
@inproceedings{ahmed2022semantic,
title = {Semantic Probabilistic Layers for Neuro-Symbolic Learning},
author = {Ahmed, Kareem and Teso, Stefano and Chang, Kai-Wei and den Broeck, Guy Van and Vergari, Antonio},
booktitle = {NeurIPS},
year = {2022}
}

• #### Neuro-Symbolic Entropy Regularization

Kareem Ahmed, Eric Wang, Kai-Wei Chang, and Guy Van den Broeck, in UAI, 2022.
Full Text Abstract BibTeX Details
In structured prediction, the goal is to jointly predict many output variables that together encode a structured object – a path in a graph, an entity-relation triple, or an ordering of objects. Such a large output space makes learning hard and requires vast amounts of labeled data. Different approaches leverage alternate sources of supervision. One approach – entropy regularization – posits that decision boundaries should lie in low-probability regions. It extracts supervision from unlabeled examples, but remains agnostic to the structure of the output space. Conversely, neuro-symbolic approaches exploit the knowledge that not every prediction corresponds to a valid structure in the output space. Yet, they do not further restrict the learned output distribution. This paper introduces a framework that unifies both approaches. We propose a loss, neuro-symbolic entropy regularization, that encourages the model to confidently predict a valid object. It is obtained by restricting entropy regularization to the distribution over only valid structures. This loss can be computed efficiently when the output constraint is expressed as a tractable logic circuit. Moreover, it seamlessly integrates with other neuro-symbolic losses that eliminate invalid predictions. We demonstrate the efficacy of our approach on a series of semi-supervised and fully-supervised structured-prediction experiments, where we find that it leads to models whose predictions are more accurate and more likely to be valid.
@inproceedings{ahmadneuro2022,
title = {Neuro-Symbolic Entropy Regularization},
author = {Ahmed, Kareem and Wang, Eric and Chang, Kai-Wei and den Broeck, Guy Van},
booktitle = {UAI},
year = {2022}
}

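To make the restricted-entropy idea above concrete, here is a brute-force sketch: compute the entropy of the model's output distribution after conditioning on a validity constraint. The paper computes this efficiently with tractable logic circuits; the enumeration and the toy "exactly one label active" constraint below are only for illustration.

```python
# Entropy regularization restricted to valid structures: renormalize the
# model's distribution over the valid set, then take its entropy.
import itertools
import torch

def entropy_over_valid(logits, is_valid):
    # logits: (n,) independent Bernoulli logits over n binary output variables
    p = torch.sigmoid(logits)
    masses = []
    for assign in itertools.product([0, 1], repeat=len(logits)):
        if not is_valid(assign):
            continue   # invalid structures contribute nothing
        sel = torch.tensor(assign, dtype=torch.bool)
        masses.append(torch.prod(torch.where(sel, p, 1 - p)))
    q = torch.stack(masses)
    q = q / q.sum()                      # distribution restricted to valid set
    return -(q * q.clamp_min(1e-12).log()).sum()

exactly_one = lambda a: sum(a) == 1      # toy validity constraint
print(entropy_over_valid(torch.tensor([2.0, -1.0, 0.5]), exactly_one))
```
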
• #### DEGREE: A Data-Efficient Generative Event Extraction Model

I-Hung Hsu, Kuan-Hao Huang, Elizabeth Boschee, Scott Miller, Prem Natarajan, Kai-Wei Chang, and Nanyun Peng, in NAACL, 2022.
Full Text Abstract BibTeX Details
Event extraction (EE), the task of identifying event triggers and their arguments in text, is usually formulated as a classification or structured prediction problem. Such models usually reduce labels to numeric identifiers, making them unable to take advantage of label semantics (e.g., an event type named Arrest is related to words like arrest, detain, or apprehend). This prevents generalization to new event types. In this work, we formulate EE as a natural language generation task and propose GenEE, a model that not only captures complex dependencies within an event but also generalizes well to unseen or rare event types. Given a passage and an event type, GenEE is trained to generate a natural sentence following a predefined template for that event type. The generated output is then decoded into trigger and argument predictions. The autoregressive generation process naturally models the dependencies among the predictions – each new word predicted depends on those already generated in the output sentence. Using carefully designed input prompts during generation, GenEE is able to capture label semantics, which enables generalization to new event types. Empirical results show that our model achieves strong performance on event extraction tasks under zero-shot, few-shot, and high-resource scenarios. Notably, in the high-resource setting, GenEE outperforms the state-of-the-art model on argument extraction and achieves results competitive with the current best on end-to-end EE tasks.
@inproceedings{hsu2021degree,
title = {DEGREE: A Data-Efficient Generative Event Extraction Model},
author = {Hsu, I-Hung and Huang, Kuan-Hao and Boschee, Elizabeth and Miller, Scott and Natarajan, Prem and Chang, Kai-Wei and Peng, Nanyun},
booktitle = {NAACL},
year = {2022}
}

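The generation-based formulation above can be pictured as template filling followed by parsing. The sketch below is hypothetical throughout: the template wording, prompt format, and regex-based decoding are invented stand-ins for the paper's carefully designed prompts and decoding procedure.

```python
# Toy template-fill-then-parse pipeline for generative event extraction:
# a seq2seq model would fill the type-specific template; we then parse the
# filled text back into argument predictions.
import re

template = "Somebody was arrested by some agent at some place."
# what a seq2seq model would consume; {passage} left as a literal placeholder
prompt = f"Passage: {{passage}}\nEvent type: Arrest. Template: {template}"

generated = "Tom was arrested by the police at the airport."  # pretend model output

def parse_filled(filled):
    m = re.match(r"(?P<person>.+) was arrested by (?P<agent>.+) at (?P<place>.+)\.", filled)
    return m.groupdict() if m else {}

print(parse_filled(generated))
# {'person': 'Tom', 'agent': 'the police', 'place': 'the airport'}
```
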
• #### Socially Aware Bias Measurements for Hindi Language Representations

Vijit Malik, Sunipa Dev, Akihiro Nishi, Nanyun Peng, and Kai-Wei Chang, in NAACL (short), 2022.
Full Text BibTeX Details
@inproceedings{malik2022socially,
title = {Socially Aware Bias Measurements for Hindi Language Representations},
author = {Malik, Vijit and Dev, Sunipa and Nishi, Akihiro and Peng, Nanyun and Chang, Kai-Wei},
booktitle = {NAACL (short)},
year = {2022}
}

• #### Measuring Fairness of Text Classifiers via Prediction Sensitivity

Satyapriya Krishna, Rahul Gupta, Apurv Verma, Jwala Dhamala, Yada Pruksachatkun, and Kai-Wei Chang, in ACL, 2022.
Full Text Abstract BibTeX Details
With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. In this work, we propose a new formulation, ACCUMULATED PREDICTION SENSITIVITY, which measures fairness in machine learning models based on the model’s prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. We show that the metric can be theoretically linked with specific notions of group fairness (statistical parity) and individual fairness. It also correlates well with humans’ perception of fairness. We conduct experiments on two text classification datasets, JIGSAW TOXICITY and BIAS IN BIOS, and evaluate the correlations between metrics and manual annotations on whether the model produced a fair outcome. We observe that the proposed fairness metric based on prediction sensitivity is statistically significantly more correlated with human annotation than the existing counterfactual fairness metric.
@inproceedings{krishna2022measuring,
title = {Measuring Fairness of Text Classifiers via Prediction Sensitivity},
author = {Krishna, Satyapriya and Gupta, Rahul and Verma, Apurv and Dhamala, Jwala and Pruksachatkun, Yada and Chang, Kai-Wei},
booktitle = {ACL},
year = {2022}
}

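A hedged sketch of the prediction-sensitivity idea above: measure how strongly a classifier's output responds to perturbation of the input features that encode a protected attribute. The toy model, the feature index, and the gradient-based estimate are illustrative assumptions, not the paper's exact ACCUMULATED PREDICTION SENSITIVITY formulation.

```python
# Gradient-based sensitivity of a prediction to a protected input feature.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

def prediction_sensitivity(x, protected_idx):
    x = x.clone().requires_grad_(True)
    prob = torch.softmax(model(x), dim=-1)[1]  # probability of the positive class
    prob.backward()
    return x.grad[protected_idx].abs().item()  # sensitivity to the protected feature

x = torch.randn(8)
print(prediction_sensitivity(x, protected_idx=3))
```
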
• #### Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction

Kuan-Hao Huang, I-Hung Hsu, Prem Natarajan, Kai-Wei Chang, and Nanyun Peng, in ACL, 2022.
Full Text Code Abstract BibTeX Details
We present a study on leveraging multilingual pre-trained generative language models for zero-shot cross-lingual event argument extraction (EAE). By formulating EAE as a language generation task, our method effectively encodes event structures and captures the dependencies between arguments. We design language-agnostic templates to represent the event argument structures, which are compatible with any language, hence facilitating the cross-lingual transfer. Our proposed model finetunes multilingual pre-trained generative language models to generate sentences that fill in the language-agnostic template with arguments extracted from the input passage. The model is trained on source languages and is then directly applied to target languages for event argument extraction. Experiments demonstrate that the proposed model outperforms the current state-of-the-art models on zero-shot cross-lingual EAE. Comprehensive studies and error analyses are presented to better understand the advantages and the current limitations of using generative language models for zero-shot cross-lingual transfer EAE.
@inproceedings{huang2022multilingual,
title = {Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction},
author = {Huang, Kuan-Hao and Hsu, I-Hung and Natarajan, Prem and Chang, Kai-Wei and Peng, Nanyun},
booktitle = {ACL},
year = {2022}
}

• #### PYLON: A PyTorch Framework for Learning with Constraints

Kareem Ahmed, Tao Li, Thy Ton, Quan Guo, Kai-Wei Chang, Parisa Kordjamshidi, Vivek Srikumar, Guy Van den Broeck, and Sameer Singh, in AAAI (demo), 2022.
Full Text BibTeX Details
@inproceedings{ahmad2022pylon,
title = {PYLON: A PyTorch Framework for Learning with Constraints},
author = {Ahmed, Kareem and Li, Tao and Ton, Thy and Guo, Quan and Chang, Kai-Wei and Kordjamshidi, Parisa and Srikumar, Vivek and den Broeck, Guy Van and Singh, Sameer},
booktitle = {AAAI (demo)},
year = {2022}
}

• #### On the Sensitivity and Stability of Model Interpretations

Fan Yin, Zhouxing Shi, Cho-Jui Hsieh, and Kai-Wei Chang, in ACL, 2022.
Full Text Abstract BibTeX Details
Recent years have witnessed the emergence of a variety of post-hoc interpretations that aim to uncover how natural language processing (NLP) models make predictions. Despite the surge of new interpretation methods, it remains an open problem how to define and quantitatively measure the faithfulness of interpretations, i.e., to what extent interpretations reflect the reasoning process of a model. We propose two new criteria, sensitivity and stability, that provide complementary notions of faithfulness to the existing removal-based criteria. Our results show that conclusions about how faithful interpretations are can vary substantially across these notions. Motivated by the desiderata of sensitivity and stability, we introduce a new class of interpretation methods that adopt techniques from adversarial robustness. Empirical results show that our proposed methods are effective under the new criteria and overcome the limitations of gradient-based methods on removal-based criteria. Beyond text classification, we also apply the interpretation methods and metrics to dependency parsing. Our results shed light on understanding the diverse set of interpretations.
@inproceedings{yin2022on,
title = {On the Sensitivity and Stability of Model Interpretations},
author = {Yin, Fan and Shi, Zhouxing and Hsieh, Cho-Jui and Chang, Kai-Wei},
booktitle = {ACL},
year = {2022}
}

• #### On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations

Yang Trista Cao, Yada Pruksachatkun, Kai-Wei Chang, Rahul Gupta, Varun Kumar, Jwala Dhamala, and Aram Galstyan, in ACL (short), 2022.
Full Text Abstract BibTeX Details
Multiple metrics have been introduced to measure fairness in various natural language processing tasks. These metrics can be roughly categorized into two categories: 1) *extrinsic* metrics for evaluating fairness in downstream applications and 2) *intrinsic* metrics for estimating fairness in upstream contextualized language representation models. In this paper, we conduct an extensive correlation study between intrinsic and extrinsic metrics across bias notions using 19 contextualized language models. We find that intrinsic and extrinsic metrics do not necessarily correlate in their original setting, even when correcting for metric misalignments, noise in evaluation datasets, and confounding factors such as experiment configuration for extrinsic metrics.
@inproceedings{trista2022evaluation,
title = {On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations},
author = {Cao, Yang Trista and Pruksachatkun, Yada and Chang, Kai-Wei and Gupta, Rahul and Kumar, Varun and Dhamala, Jwala and Galstyan, Aram},
booktitle = {ACL (short)},
year = {2022}
}

• #### Towards Adversarially Robust Text Classifiers by Learning to Reweight Clean Examples

Jianhan Xu, Cenyuan Zhang, Xiaoqing Zheng, Linyang Li, Cho-Jui Hsieh, Kai-Wei Chang, and Xuanjing Huang, in ACL-Finding, 2022.
Full Text Abstract BibTeX Details
Most existing defense methods improve adversarial robustness by making models adapt to a training set augmented with adversarial examples. However, the augmented adversarial examples may not be natural, which can distort the training distribution and result in inferior performance in both clean accuracy and adversarial robustness. In this study, we explore the feasibility of introducing a reweighting mechanism that calibrates the training distribution to obtain robust models. We propose to train text classifiers with a sample reweighting method in which the example weights are learned, in an online learning manner, to minimize the loss on a validation set mixed with clean examples and their adversarial counterparts. Through extensive experiments, we show that there exists a reweighting mechanism that makes models more robust against adversarial attacks without the need to craft adversarial examples for the entire training set.
@inproceedings{xu2022towards,
title = {Towards Adversarially Robust Text Classifiers by Learning to Reweight Clean Examples},
author = {Xu, Jianhan and Zhang, Cenyuan and Zheng, Xiaoqing and Li, Linyang and Hsieh, Cho-Jui and Chang, Kai-Wei and Huang, Xuanjing},
booktitle = {ACL-Finding},
year = {2022}
}

• #### Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal

Umang Gupta, Jwala Dhamala, Varun Kumar, Apurv Verma, Yada Pruksachatkun, Satyapriya Krishna, Rahul Gupta, Kai-Wei Chang, Greg Ver Steeg, and Aram Galstyan, in ACL-Finding, 2022.
Full Text Abstract BibTeX Details
Language models excel at generating coherent text, and model compression techniques such as knowledge distillation have enabled their use in resource-constrained settings. However, these models can be biased in multiple ways, including the unfounded association of male and female genders with gender-neutral professions. Therefore, knowledge distillation without any fairness constraints may preserve or exaggerate the teacher model’s biases in the distilled model. To this end, we present a novel approach to mitigate gender disparity in text generation by learning a fair model during knowledge distillation. We propose two modifications to the base knowledge distillation based on counterfactual role reversal – modifying teacher probabilities and augmenting the training set. We evaluate gender polarity across professions in open-ended text generated from the resulting distilled and fine-tuned GPT-2 models and demonstrate a substantial reduction in gender disparity with only a minor compromise in utility. Finally, we observe that language models that reduce gender polarity in language generation do not improve embedding fairness or downstream classification fairness.
@inproceedings{gupta2022equitable,
title = {Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal},
author = {Gupta, Umang and Dhamala, Jwala and Kumar, Varun and Verma, Apurv and Pruksachatkun, Yada and Krishna, Satyapriya and Gupta, Rahul and Chang, Kai-Wei and Steeg, Greg Ver and Galstyan, Aram},
booktitle = {ACL-Finding},
year = {2022}
}

• #### Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Cenyuan Zhang, Xiang Zhou, Yixin Wan, Xiaoqing Zheng, Kai-Wei Chang, and Cho-Jui Hsieh, in ACL-Finding, 2022.
Full Text Abstract BibTeX Details
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones, using information bottleneck theory. Through extensive experiments, we show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods, while suffering almost no drop in clean accuracy on the SST-2, AGNEWS, and IMDB datasets.
@inproceedings{zhang2022improving,
title = {Improving the Adversarial Robustness of NLP Models by Information Bottleneck},
author = {Zhang, Cenyuan and Zhou, Xiang and Wan, Yixin and Zheng, Xiaoqing and Chang, Kai-Wei and Hsieh, Cho-Jui},
booktitle = {ACL-Finding},
year = {2022}
}

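For readers unfamiliar with information-bottleneck training, the sketch below shows a variational-IB-style objective in the spirit of the abstract above, not the paper's exact method: compress features into a stochastic code, classify from the code, and penalize the KL cost of the code. The encoder features and the beta value are toy assumptions.

```python
# Minimal variational information-bottleneck classifier: task loss traded
# against the KL cost of a stochastic code z.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBClassifier(nn.Module):
    def __init__(self, in_dim=128, z_dim=32, n_classes=2):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)
        self.head = nn.Linear(z_dim, n_classes)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return self.head(z), kl

h = torch.randn(16, 128)                 # pretend encoder features for 16 texts
y = torch.randint(0, 2, (16,))
logits, kl = IBClassifier()(h)
loss = F.cross_entropy(logits, y) + 1e-3 * kl   # beta controls compression
print(loss.item())
```
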
• #### Grounded Language-Image Pre-training

Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, and Jianfeng Gao, in CVPR, 2022.
Full Text Code Abstract BibTeX Details Best Paper Finalist
This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representations semantic-rich. In our experiments, we pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs. The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks. 1) When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines. 2) After fine-tuning on COCO, GLIP achieves 60.8 AP on val and 61.5 AP on test-dev, surpassing the prior SoTA. 3) When transferred to 13 downstream object detection tasks, a 1-shot GLIP rivals a fully-supervised Dynamic Head.
@inproceedings{li2022grounded,
title = {Grounded Language-Image Pre-training},
author = {Li, Liunian Harold and Zhang, Pengchuan and Zhang, Haotian and Yang, Jianwei and Li, Chunyuan and Zhong, Yiwu and Wang, Lijuan and Yuan, Lu and Zhang, Lei and Hwang, Jenq-Neng and Chang, Kai-Wei and Gao, Jianfeng},
booktitle = {CVPR},
year = {2022}
}

• #### How Much Can CLIP Benefit Vision-and-Language Tasks?

Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, and Kurt Keutzer, in ICLR, 2022.
Full Text Code Abstract BibTeX Details
Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world. However, it has been observed that large-scale pretraining usually can result in better generalization performance, e.g., CLIP (Contrastive Language-Image Pre-training), trained on a massive amount of image-caption pairs, has shown a strong zero-shot capability on various vision tasks. To further study the advantage brought by CLIP, we propose to use CLIP as the visual encoder in various V&L models in two typical scenarios: 1) plugging CLIP into task-specific fine-tuning; 2) combining CLIP with V&L pre-training and transferring to downstream tasks. We show that CLIP significantly outperforms widely-used visual encoders trained with in-domain annotated data, such as BottomUp-TopDown. We achieve competitive or better results on diverse V&L tasks, while establishing new state-of-the-art results on Visual Question Answering, Visual Entailment, and V&L Navigation tasks.
@inproceedings{shen2022how,
title = {How Much Can CLIP Benefit Vision-and-Language Tasks?},
author = {Shen, Sheng and Li, Liunian Harold and Tan, Hao and Bansal, Mohit and Rohrbach, Anna and Chang, Kai-Wei and Yao, Zhewei and Keutzer, Kurt},
booktitle = {ICLR},
year = {2022}
}

• #### SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, and Shih-Fu Chang, in AAAI, 2022.
Full Text Abstract BibTeX Details
Answering complex questions about images is an ambitious goal for machine intelligence, requiring a joint understanding of images, text, and commonsense knowledge, as well as strong reasoning ability. Recently, multimodal Transformers have made great progress on the task of Visual Commonsense Reasoning (VCR) by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene and the interactions between objects, which are essential in answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs in commonsense reasoning. To exploit the scene graph structure, at the model level we propose a multi-hop graph transformer that regularizes attention interaction among hops. For pre-training, a scene-graph-aware pre-training method is proposed to leverage the structural knowledge extracted from visual scene graphs. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with state-of-the-art methods and prove the efficacy of each proposed component.
@inproceedings{wang2022sgeitl,
title = {SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning},
author = {Wang, Zhecan and You, Haoxuan and Li, Liunian Harold and Zareian, Alireza and Park, Suji and Liang, Yiqing and Chang, Kai-Wei and Chang, Shih-Fu},
booktitle = {AAAI},
year = {2022}
}


## 2021

• #### Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

Da Yin, Liunian Harold Li, Ziniu Hu, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Code Abstract BibTeX Details
Commonsense is defined as the knowledge that is shared by everyone. However, certain types of commonsense knowledge are correlated with culture and geographic locations, and they are only shared locally. For example, the scenarios of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models’ ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art vision-and-language models, VisualBERT and ViLBERT, trained on VCR, a standard multimodal commonsense benchmark with images primarily from Western regions. We then evaluate how well the trained models can generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than that for Western regions. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that 1) concern culture-related scenarios, e.g., weddings, religious activities, and festivals, and 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition.
@inproceedings{yin2021broaden,
title = {Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning},
author = {Yin, Da and Li, Liunian Harold and Hu, Ziniu and Peng, Nanyun and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2021}
}

• #### On the Transferability of Adversarial Attacks against Neural Text Classifier

Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Abstract BibTeX Details
Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we present the first study to systematically investigate the transferability of adversarial examples for text classification models and explore how various factors, including network architecture, tokenization scheme, word embedding, and model capacity, affect the transferability of adversarial examples. Based on these studies, we propose a genetic algorithm to find an ensemble of models that can be used to induce adversarial examples to fool almost all existing models. Such adversarial examples reflect the defects of the learning process and the data bias in the training set. Finally, we derive word replacement rules that can be used for model diagnostics from these adversarial examples.
@inproceedings{yuan2021on,
title = {On the Transferability of Adversarial Attacks against Neural Text Classifier},
author = {Yuan, Liping and Zheng, Xiaoqing and Zhou, Yi and Hsieh, Cho-Jui and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2021}
}

• #### Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution

Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi Zhang, Kai-Wei Chang, and Cho-Jui Hsieh, in EMNLP, 2021.
Full Text Abstract BibTeX Details
Recent studies have shown that deep neural networks are vulnerable to intentionally crafted adversarial examples, and various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models. However, there is a lack of systematic studies comparing different defense approaches under the same attack setting. In this paper, we seek to fill this gap through a comprehensive study of the behavior of neural text classifiers trained with various defense methods under representative adversarial attacks. In addition, we propose an effective method to further improve the robustness of neural text classifiers against such attacks, achieving the highest accuracy on both clean and adversarial examples on the AGNEWS and IMDB datasets by a significant margin.
@inproceedings{li2021searching,
title = {Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution},
author = {Li, Zongyi and Xu, Jianhan and Zeng, Jiehang and Li, Linyang and Zheng, Xiaoqing and Zhang, Qi and Chang, Kai-Wei and Hsieh, Cho-Jui},
booktitle = {EMNLP},
year = {2021}
}

• #### Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training

Kuan-Hao Huang, Wasi Ahmad, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Code Abstract BibTeX Details
Pre-trained multilingual language encoders, such as multilingual BERT and XLM-R, show great potential for zero-shot cross-lingual transfer. However, these multilingual encoders do not precisely align words and phrases across languages. In particular, learning alignments in the multilingual embedding space usually requires sentence-level or word-level parallel corpora, which are expensive to obtain for low-resource languages. An alternative is to make the multilingual encoders more robust: when fine-tuning the encoder on a downstream task, we train the encoder to tolerate noise in the contextual embedding space, so that even if the representations of different languages are not aligned well, the model can still achieve good performance on zero-shot cross-lingual transfer. In this work, we propose a learning strategy for training robust models by drawing connections between adversarial examples and the failure cases of zero-shot cross-lingual transfer. We adopt two widely used robust training methods, adversarial training and randomized smoothing, to train the desired robust model. The experimental results demonstrate that robust training improves zero-shot cross-lingual transfer on text classification tasks. The improvement is more significant in the generalized cross-lingual transfer setting, where the pair of input sentences belongs to two different languages.
@inproceedings{huang2021improving,
title = {Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training},
author = {Huang, Kuan-Hao and Ahmad, Wasi and Peng, Nanyun and Chang, Kai-Wei},
presentation_id = {https://underline.io/events/192/posters/7783/poster/40656-improving-zero-shot-cross-lingual-transfer-learning-via-robust-training},
booktitle = {EMNLP},
year = {2021}
}

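One of the two robust-training ingredients named above, randomized smoothing, can be sketched as noise injection in the embedding space during fine-tuning. The stand-in encoder, classifier, and noise scale below are assumptions for illustration; they take the place of mBERT/XLM-R and a task head.

```python
# Noise-tolerant fine-tuning sketch: perturb contextual embeddings with
# Gaussian noise so the model tolerates small cross-lingual misalignments.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(64, 64)      # stand-in for a multilingual encoder
classifier = nn.Linear(64, 3)    # stand-in task head

def robust_step(x, y, sigma=0.1):
    h = encoder(x)
    h = h + sigma * torch.randn_like(h)    # perturb the embedding space
    return F.cross_entropy(classifier(h), y)

x, y = torch.randn(8, 64), torch.randint(0, 3, (8,))
loss = robust_step(x, y)
loss.backward()                  # train as usual on the noisy forward pass
```
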
• #### Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

Sunipa Dev, Masoud Monajatipoor, Anaelia Ovalle, Arjun Subramonian, Jeff Phillips, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Slides Poster Abstract BibTeX Details
Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms and related challenges that need to be acknowledged and addressed for representations to equitably encode gender information.
@inproceedings{dev2021harms,
title = {Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies},
author = {Dev, Sunipa and Monajatipoor, Masoud and Ovalle, Anaelia and Subramonian, Arjun and Phillips, Jeff and Chang, Kai-Wei},
presentation_id = {https://underline.io/events/192/sessions/7788/lecture/37320-harms-of-gender-exclusivity-and-challenges-in-non-binary-representation-in-language-technologies},
blog_url = {https://uclanlp.medium.com/harms-of-gender-exclusivity-and-challenges-in-non-binary-representation-in-language-technologies-5f89891b5aee},
booktitle = {EMNLP},
year = {2021}
}

• #### Relation-Guided Pre-Training for Open-Domain Question Answering

Ziniu Hu, Yizhou Sun, and Kai-Wei Chang, in EMNLP-Finding, 2021.
Full Text Abstract BibTeX Details
Answering complex open-domain questions requires understanding the latent relations between the involved entities. However, we find that existing QA datasets are extremely imbalanced in some types of relations, which hurts generalization over questions with long-tail relations. To remedy this problem, we propose a Relation-Guided Pre-Training (RGPT-QA) framework. We first generate a relational QA dataset covering a wide range of relations from Wikidata triplets and Wikipedia hyperlinks. We then pre-train a QA model to infer the latent relations from the question, and conduct extractive QA to get the target answer entity. We demonstrate that by pre-training with the proposed RGPT-QA technique, the popular open-domain QA model Dense Passage Retriever (DPR) achieves 2.2%, 2.4%, and 6.3% absolute improvements in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions. In particular, we show that RGPT-QA improves significantly on questions with long-tail relations.
@inproceedings{hu2021relation,
title = {Relation-Guided Pre-Training for Open-Domain Question Answering},
author = {Hu, Ziniu and Sun, Yizhou and Chang, Kai-Wei},
booktitle = {EMNLP-Finding},
year = {2021}
}

• #### Retrieval Augmented Code Generation and Summarization

Md Rizwan Parvez, Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in EMNLP-Finding, 2021.
Full Text Abstract BibTeX Details
Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they wrote in the past while implementing software or documenting it. To mimic developers’ code or summary generation behavior, we propose a retrieval augmented framework that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. Our framework has two unique features. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets for code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.
@inproceedings{parvez2021retrieval,
title = {Retrieval Augmented Code Generation and Summarization},
author = {Parvez, Md Rizwan and Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
booktitle = {EMNLP-Finding},
presentation_id = {https://underline.io/events/192/sessions/7923/lecture/38314-retrieval-augmented-code-generation-and-summarization},
year = {2021}
}

• #### BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis

Masoud Monajatipoor, Mozhdeh Rouhsedaghat, Liunian Harold Li, Aichi Chien, C.-C. Jay Kuo, Fabien Scalzo, and Kai-Wei Chang, in ICCV workshop on Computer Vision for Automated Medical Diagnosis, 2021.
Full Text Abstract BibTeX Details
Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them. Prior studies show that pre-trained V&L models can significantly improve model performance on downstream tasks such as Visual Question Answering (VQA). However, V&L models are less effective when applied in the medical domain (e.g., on X-ray images and clinical notes) due to the domain gap. In this paper, we investigate the challenges of applying pre-trained V&L models in medical applications. In particular, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model built on PixelHop++ and VisualBERT, for better capturing the associations between the two modalities. Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis benchmark, show that BERTHop achieves an average Area Under the Curve (AUC) of 98.12%, which is 1.62% higher than the state of the art (SOTA), while being trained on a 9 times smaller dataset.
@inproceedings{monajatipoor2021berthop,
title = {BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis},
author = {Monajatipoor, Masoud and Rouhsedaghat, Mozhdeh and Li, Liunian Harold and Chien, Aichi and Kuo, C. -C. Jay and Scalzo, Fabien and Chang, Kai-Wei},
booktitle = {ICCV workshop on Computer Vision for Automated Medical Diagnosis},
year = {2021}
}

• #### An Integer Linear Programming Framework for Mining Constraints from Data

Tao Meng and Kai-Wei Chang, in ICML, 2021.
Full Text Video Code Abstract BibTeX Details
Various structured output prediction problems (e.g., sequential tagging) involve constraints over the output space. By identifying these constraints, we can filter out infeasible solutions and build an accountable model. To this end, we present a general integer linear programming (ILP) framework for mining constraints from data. We model the inference of structured output prediction as an ILP problem. Then, given the coefficients of the objective function and the corresponding solution, we mine the underlying constraints by estimating the outer and inner polytopes of the feasible set. We verify the proposed constraint mining algorithm in various synthetic and real-world applications and demonstrate that the proposed approach successfully identifies the feasible set at scale. In particular, we show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without being given the underlying rules. We also demonstrate results on hierarchical multi-label classification and conduct a theoretical analysis of how close the mined constraints are to the ground truth.
@inproceedings{meng2020integer,
author = {Meng, Tao and Chang, Kai-Wei},
title = {An Integer Linear Programming Framework for Mining Constraints from Data},
booktitle = {ICML},
year = {2021}
}

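The constraint-mining idea above admits a toy rendition: keep candidate linear constraints that every observed solution satisfies, which yields an outer estimate of the feasible polytope. The brute-force filter below replaces the paper's ILP machinery and is purely illustrative, with made-up solutions and a tiny candidate pool.

```python
# Toy constraint mining: retain candidate constraints a.x <= b that are
# consistent with every observed structured output.
import itertools

solutions = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]   # observed structured outputs

# candidate constraints with small integer coefficients
candidates = [(a, b) for a in itertools.product([-1, 0, 1], repeat=3)
              for b in range(-1, 3)]

mined = [(a, b) for a, b in candidates
         if all(sum(ai * xi for ai, xi in zip(a, x)) <= b for x in solutions)]

print(len(mined), "constraints consistent with all observed solutions")
print(mined[:5])
```
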
• #### Syntax-augmented Multilingual BERT for Cross-lingual Transfer

Wasi Ahmad, Haoran Li, Kai-Wei Chang, and Yashar Mehdad, in ACL, 2021.
Full Text Video Code Abstract BibTeX Details
In recent years, we have seen a colossal effort in pre-training multilingual text encoders using large-scale corpora in many languages to facilitate cross-lingual transfer learning. However, due to typological differences across languages, cross-lingual transfer is challenging. Nevertheless, language syntax, e.g., syntactic dependencies, can bridge the typological gap. Previous works have shown that pre-trained multilingual encoders, such as mBERT (Devlin et al., 2019), capture language syntax, helping cross-lingual transfer. This work shows that explicitly providing language syntax and training mBERT with an auxiliary objective to encode the universal dependency tree structure helps cross-lingual transfer. We perform rigorous experiments on four NLP tasks, including text classification, question answering, named entity recognition, and task-oriented semantic parsing. The experiment results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks, such as PAWS-X and MLQA, by 1.4 and 1.6 points on average across all languages. In the generalized transfer setting, performance improves significantly, by 3.9 and 3.1 points on average on PAWS-X and MLQA.
@inproceedings{ahmad2021syntax,
title = {Syntax-augmented Multilingual BERT for Cross-lingual Transfer},
author = {Ahmad, Wasi and Li, Haoran and Chang, Kai-Wei and Mehdad, Yashar},
booktitle = {ACL},
year = {2021}
}

• #### Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention

Wasi Ahmad, Xiao Bai, Soomin Lee, and Kai-Wei Chang, in ACL, 2021.
Full Text Abstract BibTeX Details
In recent years, the deep neural sequence-to-sequence framework has demonstrated promising results in keyphrase generation. However, processing long documents using such deep neural networks requires high computational resources. To reduce the computational cost, the documents are typically truncated before being given as inputs. As a result, the models may miss essential points conveyed in a document. Moreover, most of the existing methods are either extractive (identify important phrases from the document) or generative (generate phrases word by word), and hence they do not benefit from the advantages of both modeling techniques. To address these challenges, we propose SEG-Net, a neural keyphrase generation model that is composed of two major components: (1) a selector that selects the salient sentences in a document, and (2) an extractor-generator that jointly extracts and generates keyphrases from the selected sentences. SEG-Net uses a self-attentive architecture, known as the Transformer, as the building block, with a couple of unique features. First, SEG-Net incorporates a novel layer-wise coverage attention to summarize most of the points discussed in the target document. Second, it uses an informed copy attention mechanism to encourage focusing on different segments of the document during keyphrase extraction and generation. Besides, SEG-Net jointly learns keyphrase generation and part-of-speech tag prediction, where the latter provides syntactic supervision to the former. The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin in both domains.
@inproceedings{ahmad2021select,
title = {Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention},
author = {Ahmad, Wasi and Bai, Xiao and Lee, Soomin and Chang, Kai-Wei},
booktitle = {ACL},
year = {2021}
}

• #### Societal Biases in Language Generation: Progress and Challenges

Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in ACL, 2021.
Full Text Abstract BibTeX Details
Technology for language generation has advanced rapidly, spurred by advancements in pre-training large models on massive amounts of data and the need for intelligent agents to communicate in a natural manner. While techniques can effectively generate fluent text, they can also produce undesirable societal biases that can have a disproportionately negative impact on marginalized populations. Language generation presents unique challenges for biases in terms of direct user interaction and the structure of decoding techniques. To better understand these challenges, we present a survey on societal biases in language generation, focusing on how data and techniques contribute to biases and progress towards reducing biases. Motivated by a lack of studies on biases from decoding techniques, we also conduct experiments to quantify the effects of these techniques. By further discussing general trends and open challenges, we call to attention promising directions for research and the importance of fairness and inclusivity considerations for language generation applications.
@inproceedings{sheng2021societal,
title = {Societal Biases in Language Generation: Progress and Challenges},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
booktitle = {ACL},
year = {2021}
}

Details
• #### Intent Classification and Slot Filling for Privacy Policies

Wasi Ahmad, Jianfeng Chi, Tu Le, Thomas Norton, Yuan Tian, and Kai-Wei Chang, in ACL, 2021.
Full Text Video Code Abstract BibTeX Details
Understanding privacy policies is crucial for users as it empowers them to learn about the information that matters to them. Sentences written in a privacy policy document explain privacy practices, and the constituent text spans convey further specific information about that practice. We refer to predicting the privacy practice explained in a sentence as intent classification and identifying the text spans sharing specific information as slot filling. In this work, we propose PolicyIE, a corpus consisting of 5,250 intent and 11,788 slot annotations spanning 31 privacy policies of websites and mobile applications. The PolicyIE corpus is a challenging benchmark with limited labeled examples reflecting the cost of collecting large-scale annotations. We present two alternative neural approaches as baselines: (1) formulating intent classification and slot filling as a joint sequence tagging task and (2) modeling them as a sequence-to-sequence (Seq2Seq) learning task. Experiment results show that both approaches perform comparably in intent classification, while the Seq2Seq method outperforms the sequence tagging approach in slot filling by a large margin. Error analysis reveals the deficiencies of the baseline approaches, suggesting room for improvement in future work. We hope the PolicyIE corpus will stimulate future research in this domain.
@inproceedings{ahmad2021intent,
title = {Intent Classification and Slot Filling for Privacy Policies},
author = {Ahmad, Wasi and Chi, Jianfeng and Le, Tu and Norton, Thomas and Tian, Yuan and Chang, Kai-Wei},
booktitle = {ACL},
year = {2021}
}

Details
• #### Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble

Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, and Xuanjing Huang, in ACL, 2021.
Full Text Code Abstract BibTeX Details
Although deep neural networks have achieved prominent performance on many NLP tasks, they are vulnerable to adversarial examples. We propose Dirichlet Neighborhood Ensemble (DNE), a randomized method for training a robust model to defend against synonym substitution-based attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments the training data with them. In this way, the model is robust to adversarial attacks while maintaining the performance on the original clean data. DNE is agnostic to the network architectures and scales to large models (e.g., BERT) for NLP applications. Through extensive experimentation, we demonstrate that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
@inproceedings{zhou2021defense,
title = {Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble},
author = {Zhou, Yi and Zheng, Xiaoqing and Hsieh, Cho-Jui and Chang, Kai-Wei and Huang, Xuanjing},
booktitle = {ACL},
year = {2021}
}

Details
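To make the convex-hull sampling above concrete, here is a minimal Python sketch of drawing a virtual word embedding with Dirichlet-distributed mixture weights. The embedding table and synonym map are illustrative stand-ins, not the authors' code or data.

import numpy as np

def dirichlet_virtual_embedding(word, embeddings, synonyms, alpha=1.0, rng=None):
    # Convex combination of the word's vector and its synonyms' vectors;
    # Dirichlet weights sum to one, so the sample stays inside the hull.
    rng = rng or np.random.default_rng()
    candidates = [word] + synonyms.get(word, [])
    vectors = np.stack([embeddings[w] for w in candidates])   # (k, d)
    weights = rng.dirichlet(alpha * np.ones(len(candidates)))
    return weights @ vectors

# Toy usage with hypothetical 2-d embeddings.
emb = {"good": np.array([1.0, 0.0]), "great": np.array([0.9, 0.2]), "fine": np.array([0.8, -0.1])}
syn = {"good": ["great", "fine"]}
virtual_vec = dirichlet_virtual_embedding("good", emb, syn)

Training would substitute such virtual embeddings for the original word vectors, exposing the model to a whole synonym neighborhood rather than a single substitution.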
• #### Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification

Yada Pruksachatkun, Satyapriya Krishna, Jwala Dhamala, Rahul Gupta, and Kai-Wei Chang, in ACL-Finding, 2021.
Full Text Code Abstract BibTeX Details
Existing bias mitigation methods to reduce disparities in model outcomes across cohorts have focused on data augmentation, debiasing model embeddings, or adding fairness-based optimization objectives during training. Separately, certified word substitution robustness methods have been developed to decrease the impact of spurious features and synonym substitutions on model predictions. While their end goals are different, they both aim to encourage models to make the same prediction for certain changes in the input. In this paper, we investigate the utility of certified word substitution robustness methods to improve equality of odds and equality of opportunity on multiple text classification tasks. We observe that certified robustness methods improve fairness, and that using both robustness and bias mitigation methods in training yields improvements on both fronts.
@inproceedings{pruksachatkun2021robustness,
title = {Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification},
author = {Pruksachatkun, Yada and Krishna, Satyapriya and Dhamala, Jwala and Gupta, Rahul and Chang, Kai-Wei},
booktitle = {ACL-Finding},
year = {2021}
}

Details
• #### Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, and Kai-Wei Chang, in ACL-Finding (short), 2021.
Full Text Abstract BibTeX Details
Is it possible to use natural language to intervene in a model’s behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model’s unethical behavior by communicating context-specific principles of ethics and equity to it. To this end, we build upon recent methods for quantifying a system’s social stereotypes, augmenting them with different kinds of ethical interventions and the desired model behavior under such interventions. Our zero-shot evaluation finds that even today’s powerful neural language models are extremely poor ethical-advice takers, that is, they respond surprisingly little to ethical interventions even though these interventions are stated as simple sentences. Few-shot learning improves model behavior but remains far from the desired outcome, especially when evaluated for various types of generalization. Our new task thus poses a novel language understanding challenge for the community.
@inproceedings{zhao2021ethical,
title = {Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?},
author = {Zhao, Jieyu and Khashabi, Daniel and Khot, Tushar and Sabharwal, Ashish and Chang, Kai-Wei},
booktitle = {ACL-Finding (short)},
year = {2021}
}

Details
• #### Aggression, escalation, and other latent themes in legal intervention deaths of non-Hispanic Black and White men: Results from the 2003-2017 NVDRS

Alina Arseniev-Koehler, Jacob Foster, Vickie Mays, Kai-Wei Chang, and Susan Cochran, in American Journal of Public Health, 2021.
Full Text Abstract BibTeX Details
Objectives. To investigate racial/ethnic differences in legal intervention-related deaths using state-of-the-art topic modeling of law enforcement and coroner text summaries drawn from the 2003-2017 US National Violent Death Reporting System (NVDRS). Methods. Employing advanced topic modeling, we identified 8 topics consistent with dangerousness in death incidents in the NVDRS death narratives written by public health workers (PHWs). Using logistic regression, we then evaluated racial/ethnic differences in PHW-coded variables and narrative topics among 4981 males killed by legal intervention, while adjusting for age, county-level characteristics, and year. Results. Black, as compared with White, decedents were younger and their deaths were less likely to include PHW-coded mental health or substance use histories, weapon use, or positive toxicology for alcohol or psychoactive drugs, but more likely to include gangs-as-an-incident-precipitant coding. Topic modeling revealed less frequent thematic representation of physical aggression or escalation but more of gangs or criminal networks among Black versus White decedents. Conclusions. While Black males were more likely to be victims of legal intervention deaths, PHW-coded variables in the NVDRS and death narratives suggest lower threat profiles among Black versus similar White decedents. The source of this greater risk remains undetermined.
@inproceedings{arseniev2021aggression,
title = {Aggression, escalation, and other latent themes in legal intervention deaths of non-Hispanic Black and White men: Results from the 2003-2017 NVDRS},
author = {Arseniev-Koehler, Alina and Foster, Jacob and Mays, Vickie and Chang, Kai-Wei and Cochran, Susan},
booktitle = {American Journal of Public Health},
year = {2021}
}

Details
• #### Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions

Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, and Kai-Wei Chang, in NAACL, 2021.
Full Text Video Abstract BibTeX Details
Pre-trained contextual vision-and-language (V&L) models have brought impressive performance improvements on various benchmarks. However, the paired text-image data required for pre-training are hard to collect and scale up. We investigate if a strong V&L representation model can be learned without text-image pairs. We propose Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora. Additionally, we introduce the object tags detected by an object recognition model as anchor points to bridge the two modalities. Evaluation on four V&L benchmarks shows that Weakly-supervised VisualBERT achieves similar performance to a model pre-trained with paired data. Besides, pre-training on more image-only data further improves a model that already has access to aligned data, suggesting the possibility of utilizing billions of raw images available to enhance V&L models.
@inproceedings{li2021unsupervised,
author = {Li, Liunian Harold and You, Haoxuan and Wang, Zhecan and Zareian, Alireza and Chang, Shih-Fu and Chang, Kai-Wei},
title = {Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions},
booktitle = {NAACL},
presentation_id = {https://underline.io/events/122/sessions/4269/lecture/19725-unsupervised-vision-and-language-pre-training-without-parallel-images-and-captions},
year = {2021}
}

Details
• #### Evaluating the Values of Sources in Transfer Learning

Md Rizwan Parvez and Kai-Wei Chang, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Transfer learning that adapts a model trained on data-rich sources to low-resource targets has been widely applied in natural language processing (NLP). However, when training a transfer model over multiple sources, not every source is equally useful for the target. To better transfer a model, it is essential to understand the values of the sources. In this paper, we develop SEAL-Shap, an efficient source valuation framework for quantifying the usefulness of the sources (e.g., domains/languages) in transfer learning based on the Shapley value method. Experiments and comprehensive analyses on both cross-domain and cross-lingual transfers demonstrate that our framework is not only effective in choosing useful transfer sources but also that the computed source values match the intuitive source-target similarity.
@inproceedings{parvez2021evaluating,
title = {Evaluating the Values of Sources in Transfer Learning},
author = {Parvez, Md Rizwan and Chang, Kai-Wei},
booktitle = {NAACL},
presentation_id = {https://underline.io/events/122/sessions/4261/lecture/19707-evaluating-the-values-of-sources-in-transfer-learning},
year = {2021}
}

Details
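Shapley values are commonly approximated by sampling permutations of the players, here the sources. The sketch below shows that generic Monte Carlo scheme under the assumption that target_score(subset) trains on a subset of sources and returns target performance; it illustrates the Shapley idea only, not the paper's efficient SEAL-Shap estimator.

import random

def shapley_source_values(sources, target_score, num_permutations=200, seed=0):
    # Average each source's marginal contribution over random join orders.
    rng = random.Random(seed)
    values = {s: 0.0 for s in sources}
    for _ in range(num_permutations):
        order = list(sources)
        rng.shuffle(order)
        subset, prev = [], target_score([])
        for s in order:
            subset.append(s)
            curr = target_score(subset)
            values[s] += curr - prev
            prev = curr
    return {s: v / num_permutations for s, v in values.items()}

Each call to target_score amounts to a model training run, which is why estimators that cut this cost matter in practice.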
• #### Unified Pre-training for Program Understanding and Generation

Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation enables the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART's effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming conventions), and logical flow (e.g., an if block inside an else block is equivalent to an else if block) that are crucial to program semantics and thus excels even with limited annotations.
@inproceedings{ahmad2021unified,
title = {Unified Pre-training for Program Understanding and Generation},
author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
booktitle = {NAACL},
presentation_id = {https://underline.io/events/122/sessions/4197/lecture/20024-unified-pre-training-for-program-understanding-and-generation},
year = {2021}
}

Details
• #### "Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses

Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Ad hominem attacks are those that target some feature of a person’s character instead of the position the person is maintaining. These attacks are harmful because they propagate implicit biases and diminish a person’s credibility. Since dialogue systems respond directly to user input, it is important to study ad hominems in dialogue responses. To this end, we propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts. We specifically compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad hominems could further amplify the skew of power away from marginalized populations. Furthermore, we propose a constrained decoding technique that uses salient n-gram similarity as a soft constraint for top-k sampling to reduce the amount of ad hominems generated. Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.
@inproceedings{sheng2021nice,
title = {"Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses},
booktitle = {NAACL},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
year = {2021}
}

Details
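As a rough illustration of a soft n-gram constraint inside top-k sampling, the sketch below penalizes, without forbidding, next-token candidates whose local context matches a flagged n-gram list. The two-token context window, the penalty weight, and the HuggingFace-style tokenizer.decode interface are assumptions for illustration, not the paper's exact procedure.

import torch

def constrained_top_k_step(logits, prefix_ids, bad_ngrams, tokenizer, k=40, penalty=5.0):
    # Re-score the top-k candidates, softly discouraging flagged n-grams.
    topk = torch.topk(logits, k)
    scores = topk.values.clone()
    for i, tok_id in enumerate(topk.indices.tolist()):
        local = tokenizer.decode(prefix_ids[-2:] + [tok_id]).lower()
        if any(ng in local for ng in bad_ngrams):
            scores[i] -= penalty
    probs = torch.softmax(scores, dim=-1)
    return topk.indices[torch.multinomial(probs, 1)].item()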
• #### Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation

Chong Zhang, Jieyu Zhao, Huan Zhang, Kai-Wei Chang, and Cho-Jui Hsieh, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Robustness and counterfactual bias are usually evaluated on a test dataset. However, are these evaluations robust? If the test dataset is perturbed slightly, will the evaluation results remain the same? In this paper, we propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset. The framework first perturbs the test dataset to construct abundant natural sentences similar to the test data, and then diagnoses the prediction change regarding a single-word substitution. We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English. (1) For robustness, we focus on synonym substitutions and identify vulnerable examples where the prediction can be altered. Our proposed attack attains high success rates (96.0%-99.8%) in finding vulnerable examples on both original and robustly trained CNNs and Transformers. (2) For counterfactual bias, we focus on substituting demographic tokens (e.g., gender, race) and measure the shift of the expected prediction among the constructed sentences. Our method is able to reveal hidden model biases not directly shown in the test dataset.
@inproceedings{zhang2021double,
title = {Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation},
booktitle = {NAACL},
author = {Zhang, Chong and Zhao, Jieyu and Zhang, Huan and Chang, Kai-Wei and Hsieh, Cho-Jui},
year = {2021},
presentation_id = {https://underline.io/events/122/sessions/4229/lecture/19609-double-perturbation-on-the-robustness-of-robustness-and-counterfactual-bias-evaluation}
}

Details
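A toy rendering of the two-level procedure: build natural neighbors of a test sentence by synonym swaps, then probe each neighbor with single-word substitutions and record prediction flips. Both synonyms (a word-to-synonyms dict) and predict (a label function) are hypothetical stand-ins for the paper's components.

def double_perturbation(sentence, synonyms, predict):
    # Level 1: perturb the test sentence into nearby natural variants.
    # Level 2: diagnose each variant with a single-word substitution.
    words = sentence.split()
    flips = []
    for i, w in enumerate(words):
        for s in synonyms.get(w, []):
            variant = words[:i] + [s] + words[i + 1:]
            v_pred = predict(" ".join(variant))
            for j, v in enumerate(variant):
                for t in synonyms.get(v, []):
                    probe = variant[:j] + [t] + variant[j + 1:]
                    if predict(" ".join(probe)) != v_pred:
                        flips.append((" ".join(variant), " ".join(probe)))
    return flips  # (variant, probe) pairs where one word flips the label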
• #### Adapting Coreference Resolution for Processing Violent Death Narratives

Ankith Uppunda, Susan Cochran, Jacob Foster, Alina Arseniev-Koehler, Vickie Mays, and Kai-Wei Chang, in NAACL (short), 2021.
Full Text Video Abstract BibTeX Details
Coreference resolution is an important component in analyzing narrative text from administrative data (e.g., clinical or police sources). However, existing coreference models trained on general language corpora suffer from poor transferability due to domain gaps, especially when they are applied to gender-inclusive data with lesbian, gay, bisexual, and transgender (LGBT) individuals. In this paper, we analyzed the challenges of coreference resolution in an exemplary form of administrative text written in English: violent death narratives from the USA’s Centers for Disease Control’s (CDC) National Violent Death Reporting System. We developed a set of data augmentation rules to improve model performance using a probabilistic data programming framework. Experiments on narratives from an administrative database, as well as existing gender-inclusive coreference datasets, demonstrate the effectiveness of data augmentation in training coreference models that can better handle text data about LGBT individuals.
@inproceedings{uppunda2021adapting,
title = {Adapting Coreference Resolution for Processing Violent Death Narratives},
author = {Uppunda, Ankith and Cochran, Susan and Foster, Jacob and Arseniev-Koehler, Alina and Mays, Vickie and Chang, Kai-Wei},
booktitle = {NAACL (short)},
year = {2021}
}

Details
• #### Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models

James Y. Huang, Kuan-Hao Huang, and Kai-Wei Chang, in NAACL (short), 2021.
Full Text Video Code Abstract BibTeX Details
Pre-trained language models have achieved huge success on a wide range of NLP tasks. However, contextual representations from pre-trained models contain entangled semantic and syntactic information, and therefore cannot be directly used to derive useful semantic sentence embeddings for some tasks. Paraphrase pairs offer an effective way of learning the distinction between semantics and syntax, as they naturally share semantics and often vary in syntax. In this work, we present ParaBART, a semantic sentence embedding model that learns to disentangle semantics and syntax in sentence embeddings obtained by pre-trained language models. ParaBART is trained to perform syntax-guided paraphrasing, based on a source sentence that shares semantics with the target paraphrase, and a parse tree that specifies the target syntax. In this way, ParaBART learns disentangled semantic and syntactic representations from their respective inputs with separate encoders. Experiments in English show that ParaBART outperforms state-of-the-art sentence embedding models on unsupervised semantic similarity tasks. Additionally, we show that our approach can effectively remove syntactic information from semantic sentence embeddings, leading to better robustness against syntactic variation on downstream semantic tasks.
@inproceedings{huang2021disentangling,
title = {Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models},
author = {Huang, James Y. and Huang, Kuan-Hao and Chang, Kai-Wei},
booktitle = {NAACL (short)},
presentation_id = {https://underline.io/events/122/sessions/4151/lecture/19910-disentangling-semantics-and-syntax-in-sentence-embeddings-with-pre-trained-language-models},
year = {2021}
}

Details
• #### BOLD: Dataset and metrics for measuring biases in open-ended language generation

Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta, in FAccT, 2021.
Full Text Code Abstract BibTeX Details
Recent advances in deep learning techniques have enabled machines to generate cohesive open-ended text when prompted with a sequence of words as context. While these models now empower many downstream applications from conversation bots to automatic storytelling, they have been shown to generate texts that exhibit social biases. To systematically study and benchmark social biases in open-ended language generation, we introduce the Bias in Open-Ended Language Generation Dataset (BOLD), a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion, and political ideology. We also propose new automated metrics for toxicity, psycholinguistic norms, and text gender polarity to measure social biases in open-ended text generation from multiple angles. An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text across all domains. With these results we highlight the need to benchmark biases in open-ended language generation and caution users of language generation models on downstream tasks to be cognizant of these embedded prejudices.
@inproceedings{dhamala2021bold,
author = {Dhamala, Jwala and Sun, Tony and Kumar, Varun and Krishna, Satyapriya and Pruksachatkun, Yada and Chang, Kai-Wei and Gupta, Rahul},
title = {BOLD: Dataset and metrics for measuring biases in open-ended language generation},
booktitle = {FAccT},
year = {2021}
}

Details
• #### Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs

Kuan-Hao Huang and Kai-Wei Chang, in EACL, 2021.
Full Text Slides Poster Code Abstract BibTeX Details
Paraphrase generation plays an essential role in natural language processing (NLP), and it has many downstream applications. However, training supervised paraphrase models requires many annotated paraphrase pairs, which are usually costly to obtain. On the other hand, the paraphrases generated by existing unsupervised approaches are usually syntactically similar to the source sentences and are limited in diversity. In this paper, we demonstrate that it is possible to generate syntactically varied paraphrases without the need for annotated paraphrase pairs. We propose Syntactically controlled Paraphrase Generator (SynPG), an encoder-decoder based model that learns to disentangle the semantics and the syntax of a sentence from a collection of unannotated texts. The disentanglement enables SynPG to control the syntax of output paraphrases by manipulating the embedding in the syntactic space. Extensive experiments using automatic metrics and human evaluation show that SynPG performs better syntactic control than unsupervised baselines, while the quality of the generated paraphrases is competitive. We also demonstrate that the performance of SynPG is competitive with or even better than supervised models when the unannotated data is large. Finally, we show that the syntactically controlled paraphrases generated by SynPG can be utilized for data augmentation to improve the robustness of NLP models.
@inproceedings{huang2021generating,
author = {Huang, Kuan-Hao and Chang, Kai-Wei},
title = {Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs},
booktitle = {EACL},
year = {2021}
}

Details
• #### Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference

Yichao Zhou, Yu Yan, Rujun Han, J. Harry Caufield, Kai-Wei Chang, Yizhou Sun, Peipei Ping, and Wei Wang, in AAAI, 2021.
Full Text Code Abstract BibTeX Details
There has been a steady need in the medical community to precisely extract the temporal relations between clinical events. In particular, temporal information can facilitate a variety of downstream applications such as case report retrieval and medical question answering. However, existing methods either require expensive feature engineering or are incapable of modeling the global relational dependencies among the events. In this paper, we propose Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference (CTRL-PG), a novel method to tackle the problem at the document level. Extensive experiments on two benchmark datasets, I2B2-2012 and TB-Dense, demonstrate that CTRL-PG significantly outperforms baseline methods for temporal relation extraction.
@inproceedings{zhou2021clinical,
author = {Zhou, Yichao and Yan, Yu and Han, Rujun and Caufield, J. Harry and Chang, Kai-Wei and Sun, Yizhou and Ping, Peipei and Wang, Wei},
title = {Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference},
booktitle = {AAAI},
year = {2021}
}

Details
• #### GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction

Wasi Ahmad, Nanyun Peng, and Kai-Wei Chang, in AAAI, 2021.
Full Text Code Abstract BibTeX Details
Prevalent approaches in cross-lingual relation and event extraction use graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic representations such that models trained on one language can be applied to other languages. However, GCNs fall short in modeling long-range dependencies or disconnected words in the dependency tree. To address this challenge, we propose to utilize the self-attention mechanism, where we explicitly fuse structural information to learn the dependencies between words at different syntactic distances. We introduce GATE, a Graph Attention Transformer Encoder, and test its cross-lingual transferability on relation and event extraction tasks. We perform rigorous experiments on the widely used ACE05 dataset that includes three typologically different languages: English, Chinese, and Arabic. The evaluation results show that GATE outperforms three recently proposed methods by a large margin. Our detailed analysis reveals that, due to its reliance on syntactic dependencies, GATE produces robust representations that facilitate transfer across languages.
@inproceedings{ahmad2021gate,
author = {Ahmad, Wasi and Peng, Nanyun and Chang, Kai-Wei},
title = {GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction},
booktitle = {AAAI},
year = {2021}
}

Details
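One plausible way to fuse structural information into self-attention, sketched below, is to subtract a penalty that grows with dependency-tree distance from the attention logits before the softmax. GATE's actual fusion differs in detail, so treat the penalty form as an assumption for illustration.

import torch

def structure_aware_attention(Q, K, V, tree_dist, beta=0.5):
    # Q, K, V: (n, d) token representations; tree_dist: (n, n) pairwise
    # syntactic distances. Syntactically distant pairs get attenuated weight.
    d = Q.size(-1)
    scores = (Q @ K.T) / d ** 0.5 - beta * tree_dist
    return torch.softmax(scores, dim=-1) @ V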

## 2020

• #### Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs

Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh, in NeurIPS, 2020.
Full Text Code Abstract BibTeX Details
Linear relaxation based perturbation analysis (LiRPA) for neural networks, which computes provable linear bounds on output neurons given a certain amount of input perturbation, has become a core component in robustness verification and certified defense. The majority of LiRPA-based methods only consider simple feed-forward networks and need particular manual derivations and implementations when extended to other architectures. In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structure, by generalizing existing LiRPA algorithms such as CROWN to operate on general computational graphs. The flexibility, differentiability, and ease of use of our framework allow us to obtain state-of-the-art results on LiRPA-based certified defense on fairly complicated networks like DenseNet, ResNeXt, and Transformer that are not supported by prior work. Our framework also enables loss fusion, a technique that significantly reduces the computational complexity of LiRPA for certified defense. For the first time, we demonstrate LiRPA-based certified defense on Tiny ImageNet and Downscaled ImageNet, to which previous approaches cannot scale due to the relatively large number of classes. Our work also yields an open-source library for the community to apply LiRPA to areas beyond certified defense without much LiRPA expertise, e.g., we create a neural network with a provably flat optimization landscape. Our open-source library is available at https://github.com/KaidiXu/auto_LiRPA
@inproceedings{xu2020provable,
author = {Xu, Kaidi and Shi, Zhouxing and Zhang, Huan and Wang, Yihan and Chang, Kai-Wei and Huang, Minlie and Kailkhura, Bhavya and Lin, Xue and Hsieh, Cho-Jui},
title = {Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs},
booktitle = {NeurIPS},
year = {2020}
}

Details
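As background for readers unfamiliar with bound propagation, here is the naive interval bound propagation (IBP) step for a single linear+ReLU layer, the baseline that LiRPA-style linear relaxations tighten. This is a generic textbook illustration, not the paper's algorithm.

import numpy as np

def ibp_linear_relu(W, b, x, eps):
    # Push the L-infinity ball [x - eps, x + eps] through ReLU(W x + b):
    # the center moves by W, the radius by |W|, and ReLU is monotone.
    center = W @ x + b
    radius = np.abs(W) @ np.full_like(x, eps)
    lower, upper = center - radius, center + radius
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)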
• #### LOGAN: Local Group Bias Detection by Clustering

Jieyu Zhao and Kai-Wei Chang, in EMNLP (short), 2020.
Full Text Code Abstract BibTeX Details
Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
@inproceedings{zhao2020logan,
author = {Zhao, Jieyu and Chang, Kai-Wei},
title = {LOGAN: Local Group Bias Detection by Clustering},
booktitle = {EMNLP (short)},
presentation_id = {https://virtual.2020.emnlp.org/paper_main.2886.html},
year = {2020}
}

Details
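A minimal sketch of clustering-based local bias detection in the spirit of LOGAN: cluster examples in some feature space, then inspect the per-cluster accuracy gap between two protected groups. The choice of features and the number of clusters are assumptions.

import numpy as np
from sklearn.cluster import KMeans

def local_bias_by_cluster(features, correct, groups, k=10, seed=0):
    # features: (n, d); correct: (n,) 1 if the prediction was right;
    # groups: (n,) protected-group labels in {0, 1}.
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(features)
    gaps = {}
    for c in range(k):
        in_c = labels == c
        acc = {}
        for g in (0, 1):
            sel = in_c & (groups == g)
            acc[g] = correct[sel].mean() if sel.any() else float("nan")
        gaps[c] = acc[1] - acc[0]  # a large |gap| flags a locally biased region
    return gaps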
• #### Cross-Lingual Dependency Parsing by POS-Guided Word Reordering

Lu Liu, Yi Zhou, Jianhan Xu, Xiaoqing Zheng, Kai-Wei Chang, and Xuanjing Huang, in EMNLP-Finding, 2020.
Full Text Abstract BibTeX Details
We propose a novel approach to cross-lingual dependency parsing based on word reordering. The words in each sentence of a source language corpus are rearranged to meet the word order of a target language under the guidance of a part-of-speech based language model (LM). To obtain the highest reordering score under the LM, a population-based optimization algorithm and its genetic operators are designed to deal with the combinatorial nature of such word reordering. A parser trained on the reordered corpus can then be used to parse sentences in the target language. We demonstrate through extensive experimentation that our approach achieves better or comparable results across 25 target languages (a 1.73% increase on average), and outperforms a baseline by a significant margin on languages that are greatly different from the source one. For example, when transferring the English parser to Hindi and Latin, our approach outperforms the baseline by 15.3% and 6.7%, respectively.
@inproceedings{liu2020cross-lingual,
author = {Liu, Lu and Zhou, Yi and Xu, Jianhan and Zheng, Xiaoqing and Chang, Kai-Wei and Huang, Xuanjing},
title = {Cross-Lingual Dependency Parsing by POS-Guided Word Reordering},
booktitle = {EMNLP-Finding},
year = {2020}
}

Details
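The quantity being optimized is simply a POS language-model score over candidate word orders. The brute-force sketch below makes that concrete for short sentences; the paper's population-based genetic search exists precisely because exhaustive permutation does not scale. The bigram log-probability table is a hypothetical input.

from itertools import permutations

def best_reordering(tagged_sentence, pos_bigram_logp):
    # tagged_sentence: list of (word, POS) pairs.
    # pos_bigram_logp: dict mapping (prev_tag, tag) -> log-probability.
    def score(seq):
        tags = ["<s>"] + [t for _, t in seq] + ["</s>"]
        return sum(pos_bigram_logp.get(bg, -10.0) for bg in zip(tags, tags[1:]))
    return max(permutations(tagged_sentence), key=score)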
• #### Towards Controllable Biases in Language Generation

Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP-Finding, 2020.
Full Text Code Abstract BibTeX Details
We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups. We then analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics. The former scenario enables us to detect the types of biases present in the model. Specifically, we show the effectiveness of our approach at facilitating bias analysis by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics. The second scenario is useful for mitigating biases in downstream applications such as dialogue generation. In our experiments, the mitigation technique proves to be effective at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.
@inproceedings{sheng2020towards,
title = {Towards Controllable Biases in Language Generation},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
booktitle = {EMNLP-Finding},
year = {2020}
}

Details
• #### PolicyQA: A Reading Comprehension Dataset for Privacy Policies

Wasi Ahmad, Jianfeng Chi, Yuan Tian, and Kai-Wei Chang, in EMNLP-Finding (short), 2020.
Full Text Code Abstract BibTeX Details
Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. In contrast, we argue that providing users with a short text span from policy documents reduces the burden of searching the target information in a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
@inproceedings{ahmad2020policyqa,
author = {Ahmad, Wasi and Chi, Jianfeng and Tian, Yuan and Chang, Kai-Wei},
title = {PolicyQA: A Reading Comprehension Dataset for Privacy Policies},
booktitle = {EMNLP-Finding (short)},
year = {2020}
}

Details
• #### Generating Sports News from Live Commentary: A Chinese Dataset for Sports Game Summarization

Kuan-Hao Huang, Chen Li, and Kai-Wei Chang, in AACL (short), 2020.
Full Text BibTeX Details
@inproceedings{huang2020generating,
author = {Huang, Kuan-Hao and Li, Chen and Chang, Kai-Wei},
title = {Generating Sports News from Live Commentary: A Chinese Dataset for Sports Game Summarization},
booktitle = {AACL (short)},
year = {2020}
}

Details
• #### GPT-GNN: Generative Pre-Training of Graph Neural Networks

Ziniu Hu, Yuxiao Dong, Kuansan Wang, Kai-Wei Chang, and Yizhou Sun, in KDD, 2020.
Full Text Video Code Abstract BibTeX Details
Graph neural networks (GNNs) have been demonstrated to be successful in modeling graph-structured data. However, training GNNs requires abundant task-specific labeled data, which is often arduously expensive to obtain. One effective way to reduce the labeling effort is to pre-train an expressive GNN model on unlabeled data with self-supervision and then transfer the learned knowledge to downstream models. In this paper, we present the GPT-GNN framework to initialize GNNs by generative pre-training. GPT-GNN introduces a self-supervised attributed graph generation task to pre-train a GNN, which allows the GNN to capture the intrinsic structural and semantic properties of the graph. We factorize the likelihood of graph generation into two components: 1) attribute generation, and 2) edge generation. By modeling both components, GPT-GNN captures the inherent dependency between node attributes and graph structure during the generative process. Comprehensive experiments on the billion-scale academic graph and Amazon recommendation data demonstrate that GPT-GNN significantly outperforms state-of-the-art base GNN models without pre-training by up to 9.1% across different downstream tasks.
@inproceedings{hu2020gptgnn,
author = {Hu, Ziniu and Dong, Yuxiao and Wang, Kuansan and Chang, Kai-Wei and Sun, Yizhou},
title = {GPT-GNN: Generative Pre-Training of Graph Neural Networks},
booktitle = {KDD},
slide_url = {https://acbull.github.io/pdf/gpt.pptx},
year = {2020}
}

Details
• #### On the Robustness of Language Encoders against Grammatical Errors

Fan Yin, Quanyu Long, Tao Meng, and Kai-Wei Chang, in ACL, 2020.
Full Text Slides Video Code Abstract BibTeX Details
We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the position of errors. We find that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors.
@inproceedings{yin2020robustness,
author = {Yin, Fan and Long, Quanyu and Meng, Tao and Chang, Kai-Wei},
title = {On the Robustness of Language Encoders against Grammatical Errors},
booktitle = {ACL},
presentation_id = {https://virtual.acl2020.org/paper_main.310.html},
year = {2020}
}

Details
• #### Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, and Ahmed Hassan Awadallah, in ACL, 2020.
Full Text Slides Video Abstract BibTeX Details
Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of the language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another language. While the cross-lingual transfer techniques are powerful, they carry gender bias from the source to target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations from both the intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces and that the alignment direction can also have an influence on the bias in transfer learning. We further provide recommendations for using the multilingual word representations for downstream tasks.
@inproceedings{zhao2020gender,
author = {Zhao, Jieyu and Mukherjee, Subhabrata and Hosseini, Saghar and Chang, Kai-Wei and Awadallah, Ahmed Hassan},
title = {Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer},
booktitle = {ACL},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.260.html}
}

Details
• #### SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics

Da Yin, Tao Meng, and Kai-Wei Chang, in ACL, 2020.
Full Text Slides Video Code Abstract BibTeX Details
We propose SentiBERT, a variant of BERT that effectively captures compositional sentiment semantics. The model incorporates contextualized representations with a binary constituency parse tree to capture semantic composition. Comprehensive experiments demonstrate that SentiBERT achieves competitive performance on phrase-level sentiment classification. We further demonstrate that the sentiment composition learned from the phrase-level annotations on SST can be transferred to other sentiment analysis tasks as well as related tasks, such as emotion classification. Moreover, we conduct ablation studies and design visualization methods to understand SentiBERT. We show that SentiBERT is better than baseline approaches at capturing negation and contrastive relations and at modeling compositional sentiment semantics.
@inproceedings{yin2020sentibert,
author = {Yin, Da and Meng, Tao and Chang, Kai-Wei},
title = {SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics},
booktitle = {ACL},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.341.html}
}

Details
• #### "The Boating Store Had Its Best Sail Ever": Pronunciation-attentive Contextualized Pun Recognition

Yichao Zhou, Jyun-Yu Jiang, Jieyu Zhao, Kai-Wei Chang, and Wei Wang, in ACL, 2020.
Full Text Slides Video Code Abstract BibTeX Details
Humor plays an important role in human languages, and it is essential to model humor when building intelligent systems. Among different forms of humor, puns perform wordplay for humorous effect by employing words with double entendre and high phonetic similarity. However, identifying and modeling puns is challenging, as puns usually involve implicit semantic or phonological tricks. In this paper, we propose Pronunciation-attentive Contextualized Pun Recognition (PCPR) to perceive human humor, detect whether a sentence contains puns, and locate them in the sentence. PCPR derives a contextualized representation for each word in a sentence by capturing the association between the surrounding context and its corresponding phonetic symbols. Extensive experiments are conducted on two benchmark datasets. Results demonstrate that the proposed approach significantly outperforms state-of-the-art methods in pun detection and location tasks. In-depth analyses verify the effectiveness and robustness of PCPR.
@inproceedings{zhou2020boating,
author = {Zhou, Yichao and Jiang, Jyun-Yu and Zhao, Jieyu and Chang, Kai-Wei and Wang, Wei},
title = {"The Boating Store Had Its Best Sail Ever": Pronunciation-attentive Contextualized Pun Recognition},
booktitle = {ACL},
presentation_id = {https://virtual.acl2020.org/paper_main.75.html},
year = {2020}
}

Details
• #### Towards Understanding Gender Bias in Relation Extraction

Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang, in ACL, 2020.
Full Text Abstract BibTeX Details
Recent developments in Neural Relation Extraction (NRE) have made significant strides towards automated knowledge base construction. While much attention has been dedicated towards improvements in accuracy, there have been no attempts in the literature to evaluate social biases exhibited in NRE systems. In this paper, we create WikiGenderBias, a distantly supervised dataset composed of over 45,000 sentences including a 10% human annotated test set for the purpose of analyzing gender bias in relation extraction systems. We find that when extracting spouse and hypernym (i.e., occupation) relations, an NRE system performs differently when the gender of the target entity is different. However, such disparity does not appear when extracting relations such as birth date or birth place. We also analyze two existing bias mitigation techniques, word embedding debiasing and data augmentation. Unfortunately, due to NRE models relying heavily on surface level cues, we find that existing bias mitigation approaches have a negative effect on NRE. Our analysis lays groundwork for future quantifying and mitigating bias in relation extraction.
@inproceedings{gaut2020towards,
author = {Gaut, Andrew and Sun, Tony and Tang, Shirlyn and Huang, Yuxin and Qian, Jing and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Belding, Elizabeth and Chang, Kai-Wei and Wang, William Yang},
title = {Towards Understanding Gender Bias in Relation Extraction},
booktitle = {ACL},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.265.html}
}

Details
• #### A Transformer-based Approach for Source Code Summarization

Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in ACL (short), 2020.
Full Text Slides Video Code Abstract BibTeX Details
Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representations by modeling the pairwise relationship between code tokens to capture their long-range dependencies is crucial. To learn code representations for summarization, we explore the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies. In this work, we show that despite the approach being simple, it outperforms the state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., that the absolute encoding of source code tokens' positions hinders summarization performance, while relative encoding significantly improves it. We have made our code publicly available to facilitate future research.
@inproceedings{ahmad2020transformer,
author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
title = {A Transformer-based Approach for Source Code Summarization},
booktitle = {ACL (short)},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.449.html}
}

Details
• #### What Does BERT with Vision Look At?

Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in ACL (short), 2020.
Full Text Slides Video Code Abstract BibTeX Details
Pre-trained visually grounded language models such as ViLBERT, LXMERT, and UNITER have achieved significant performance improvement on vision-and-language tasks, but what they learn during pre-training remains unclear. In this work, we demonstrate that certain attention heads of a visually grounded language model actively ground elements of language to image regions. Specifically, some heads can map entities to image regions, performing the task known as entity grounding. Some heads can even detect the syntactic relations between non-entity words and image regions, tracking, for example, associations between verbs and regions corresponding to their arguments. We denote this ability as syntactic grounding. We verify grounding both quantitatively and qualitatively, using Flickr30K Entities as a testbed.
@inproceedings{li2020what,
author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
title = {What Does BERT with Vision Look At?},
booktitle = {ACL (short)},
presentation_id = {https://virtual.acl2020.org/paper_main.469.html},
year = {2020}
}

Details
• #### Mitigating Gender Bias Amplification in Distribution by Posterior Regularization

Shengyu Jia, Tao Meng, Jieyu Zhao, and Kai-Wei Chang, in ACL (short), 2020.
Full Text Slides Video Code Abstract BibTeX Details
Advanced machine learning techniques have boosted the performance of natural language processing. Nevertheless, recent studies, e.g., Zhao et al. (2017), show that these techniques inadvertently capture the societal bias hidden in the corpus and further amplify it. However, their analysis is conducted only on models' top predictions. In this paper, we investigate the gender bias amplification issue from the distribution perspective and demonstrate that the bias is amplified in the view of the predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization. With little performance loss, our method can almost remove the bias amplification in the distribution. Our study sheds light on understanding the bias amplification.
@inproceedings{jia2020mitigating,
author = {Jia, Shengyu and Meng, Tao and Zhao, Jieyu and Chang, Kai-Wei},
title = {Mitigating Gender Bias Amplification in Distribution by Posterior Regularization},
booktitle = {ACL (short)},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.264.html}
}

Details
• #### Robustness Verification for Transformers

Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, and Cho-Jui Hsieh, in ICLR, 2020.
Full Text Video Code Abstract BibTeX Details
Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding the behavior of a given model and for obtaining safety guarantees. However, previous methods are usually limited to relatively simple neural networks. In this paper, we consider the robustness verification problem for Transformers. Transformers have complex self-attention layers that pose many challenges for verification, including cross-nonlinearity and cross-position dependency, which have not been discussed in previous work. We resolve these challenges and develop the first verification algorithm for Transformers. The certified robustness bounds computed by our method are significantly tighter than those by naive Interval Bound Propagation. These bounds also shed light on interpreting Transformers as they consistently reflect the importance of words in sentiment analysis.
@inproceedings{shi2020robustness,
author = {Shi, Zhouxing and Zhang, Huan and Chang, Kai-Wei and Huang, Minlie and Hsieh, Cho-Jui},
title = {Robustness Verification for Transformers},
booktitle = {ICLR},
year = {2020}
}

Details

## 2019

• #### Distributed Block-diagonal Approximation Methods for Regularized Empirical Risk Minimization

Ching-pei Lee and Kai-Wei Chang, in Machine Learning Journal, 2019.
Full Text Code Abstract BibTeX Details
Designing distributed algorithms for empirical risk minimization (ERM) has become an active research topic in recent years because of the practical need to deal with the huge volume of data. In this paper, we propose a general framework for training an ERM model via solving its dual problem in parallel over multiple machines. Our method provides a versatile approach for many large-scale machine learning problems, including linear binary/multi-class classification, regression, and structured prediction. Comparing with existing approaches, we show that our method has faster convergence under weaker conditions both theoretically and empirically.
@inproceedings{LD17,
author = {Lee, Ching-pei and Chang, Kai-Wei},
title = {Distributed Block-diagonal Approximation Methods for Regularized Empirical Risk Minimization},
booktitle = {Machine Learning Journal},
year = {2019}
}

Details
• #### Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages

Wasi Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, and Nanyun Peng, in CoNLL, 2019.
Full Text Poster Code Abstract BibTeX Details
Cross-lingual transfer learning has become an important weapon to battle the unavailability of annotated resources for low-resource languages. One of the fundamental techniques to transfer across languages is learning language-agnostic representations, in the form of word embeddings or contextual encodings. In this work, we propose to leverage unannotated sentences from auxiliary languages to help learn language-agnostic representations. Specifically, we explore adversarial training for learning contextual encoders that produce invariant representations across languages to facilitate cross-lingual transfer. We conduct experiments on cross-lingual dependency parsing, where we train a dependency parser on a source language and transfer it to a wide range of target languages. Experiments on 28 target languages demonstrate that adversarial training significantly improves the overall transfer performance under several different settings. We conduct a careful analysis to evaluate the language-agnostic representations resulting from adversarial training.
@inproceedings{ahmad2019crosslingual,
author = {Ahmad, Wasi and Zhang, Zhisong and Ma, Xuezhe and Chang, Kai-Wei and Peng, Nanyun},
title = {Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages},
booktitle = {CoNLL},
year = {2019}
}

Details
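Adversarial training for language-invariant encoders is commonly implemented with a gradient-reversal layer between the encoder and a language discriminator. The PyTorch sketch below shows that standard construction; whether the paper uses exactly this mechanism cannot be confirmed from the abstract alone.

import torch

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; negates (and scales) gradients in the
    # backward pass, so the encoder learns to fool the language discriminator.
    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)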
• #### Learning to Represent Bilingual Dictionaries

Muhao Chen, Yingtao Tian, Haochen Chen, Kai-Wei Chang, Steve Skiena, and Carlo Zaniolo, in CoNLL, 2019.
Full Text Abstract BibTeX Details
Bilingual word embeddings have been widely used to capture the correspondence of lexical semantics in different human languages. However, the cross-lingual correspondence between sentences and words is less studied, despite that this correspondence can significantly benefit many applications such as cross-lingual semantic search and textual inference. To bridge this gap, we propose a neural embedding model that leverages bilingual dictionaries. The proposed model is trained to map the lexical definitions to the cross-lingual target words, for which we explore with different sentence encoding techniques. To enhance the learning process on limited resources, our model adopts several critical learning strategies, including multi-task learning on different bridges of languages, and joint learning of the dictionary model with a bilingual word embedding model. We conduct experiments on two new tasks. In the cross-lingual reverse dictionary retrieval task, we demonstrate that our model is capable of comprehending bilingual concepts based on descriptions, and the proposed learning strategies are effective. In the bilingual paraphrase identification task, we show that our model effectively associates sentences in different languages via a shared embedding space, and outperforms existing approaches in identifying bilingual paraphrases.
@inproceedings{chen2019leanring,
author = {Chen, Muhao and Tian, Yingtao and Chen, Haochen and Chang, Kai-Wei and Skiena, Steve and Zaniolo, Carlo},
title = {Learning to Represent Bilingual Dictionaries},
booktitle = {CoNLL},
year = {2019}
}

Details
• #### Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing

Tao Meng, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2019.
Full Text Poster Code Abstract BibTeX Details
Prior work on cross-lingual dependency parsing often focuses on capturing the commonalities between source and target languages and overlooks the potential of leveraging linguistic properties of the languages to facilitate the transfer. In this paper, we show that weak supervisions of linguistic knowledge for the target languages can improve a cross-lingual graph-based dependency parser substantially. Specifically, we explore several types of corpus linguistic statistics and compile them into corpus-wise constraints to guide the inference process during the test time. We adapt two techniques, Lagrangian relaxation and posterior regularization, to conduct inference with corpus-statistics constraints. Experiments show that the Lagrangian relaxation and posterior regularization inference improve the performances on 15 and 17 out of 19 target languages, respectively. The improvements are especially significant for target languages that have different word order features from the source language.
@inproceedings{meng2019target,
author = {Meng, Tao and Peng, Nanyun and Chang, Kai-Wei},
title = {Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing},
booktitle = {EMNLP},
year = {2019}
}

Details
• #### Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification

Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, and Wei Wang, in EMNLP, 2019.
Full Text Code Abstract BibTeX Details
Adversarial attacks against machine learning models have threatened various real-world applications such as spam filtering and sentiment analysis. In this paper, we propose a novel framework, learning to DIScriminate Perturbations (DISP), to identify and adjust malicious perturbations, thereby blocking adversarial attacks for text classification models. To identify adversarial attacks, a perturbation discriminator validates how likely a token in the text is perturbed and provides a set of potential perturbations. For each potential perturbation, an embedding estimator learns to restore the embedding of the original word based on the context and a replacement token is chosen based on approximate kNN search. DISP can block adversarial attacks for any NLP model without modifying the model structure or training procedure. Extensive experiments on two benchmark datasets demonstrate that DISP significantly outperforms baseline methods in blocking adversarial attacks for text classification. In addition, in-depth analysis shows the robustness of DISP across different situations.
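To make the recovery step concrete, here is a toy sketch of the kNN restore stage only: given an embedding produced by some estimator for a flagged token, find the nearest vocabulary words. The vocabulary, embeddings, and "estimated" vector are random stand-ins, and the discriminator and estimator models are omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["movie", "film", "awful", "great", "plot"]
emb = rng.normal(size=(len(vocab), 16))             # toy word embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def restore(estimated_vec, k=2):
    """Return the k nearest vocabulary words to an estimated embedding."""
    v = estimated_vec / np.linalg.norm(estimated_vec)
    sims = emb @ v                                   # cosine similarity
    return [vocab[i] for i in np.argsort(-sims)[:k]]

# Pretend the estimator reconstructed something close to "film".
estimated = emb[vocab.index("film")] + 0.05 * rng.normal(size=16)
print(restore(estimated))                            # -> ["film", ...]
```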
@inproceedings{zhou2019learning,
author = {Zhou, Yichao and Jiang, Jyun-Yu and Chang, Kai-Wei and Wang, Wei},
title = {Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification},
booktitle = {EMNLP},
year = {2019}
}

Details
• #### Examining Gender Bias in Languages with Grammatical Gender

Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, and Kai-Wei Chang, in EMNLP, 2019.
Full Text Poster Code Abstract BibTeX Details
Recent studies have shown that word embeddings exhibit gender bias inherited from the training corpora. However, most studies to date have focused on quantifying and mitigating such bias only in English. These analyses cannot be directly extended to languages that exhibit morphological agreement on gender, such as Spanish and French. In this paper, we propose new metrics for evaluating gender bias in word embeddings of these languages and further demonstrate evidence of gender bias in bilingual embeddings which align these languages with English. Finally, we extend an existing approach to mitigate gender bias in word embeddings under both monolingual and bilingual settings. Experiments on modified Word Embedding Association Test, word similarity, word translation, and word pair translation tasks show that the proposed approaches effectively reduce the gender bias while preserving the utility of the embeddings.
@inproceedings{zhou2019examining,
author = {Zhou, Pei and Shi, Weijia and Zhao, Jieyu and Huang, Kuan-Hao and Chen, Muhao and Cotterell, Ryan and Chang, Kai-Wei},
title = {Examining Gender Bias in Languages with Grammatical Gender},
booktitle = {EMNLP},
year = {2019}
}

Details
• #### Robust Text Classifier on Test-Time Budgets

Md Rizwan Parvez, Tolga Bolukbasi, Kai-Wei Chang, and Venkatesh Saligrama, in EMNLP (short), 2019.
Full Text Slides Code Abstract BibTeX Details
We propose a generic and interpretable learning framework for building robust text classification models that achieve accuracy comparable to full models under test-time budget constraints. Our approach learns a selector to identify words that are relevant to the prediction task and passes them to the classifier for processing. The selector is trained jointly with the classifier and directly learns to cooperate with the classifier. We further propose a data aggregation scheme to improve the robustness of the classifier. Our learning framework is general and can be incorporated with any type of text classification model. On real-world data, we show that the proposed approach improves the performance of a given classifier and speeds up the model with only a marginal loss in accuracy.
@inproceedings{parvez2019robust,
author = {Parvez, Md Rizwan and Bolukbasi, Tolga and Chang, Kai-Wei and Saligrama, Venkatesh},
title = {Robust Text Classifier on Test-Time Budgets},
booktitle = {EMNLP (short)},
year = {2019}
}

Details
• #### The Woman Worked as a Babysitter: On Biases in Language Generation

Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP (short), 2019.
Full Text Slides Video Code Abstract BibTeX Details
We present a systematic study of biases in natural language generation (NLG) by analyzing text generated from prompts that contain mentions of different demographic groups. In this work, we introduce the notion of the regard towards a demographic, use the varying levels of regard towards different demographics as a defining metric for bias in NLG, and analyze the extent to which sentiment scores are a relevant proxy metric for regard. To this end, we collect strategically-generated text from language models and manually annotate the text with both sentiment and regard scores. Additionally, we build an automatic regard classifier through transfer learning, so that we can analyze biases in unseen text. Together, these methods reveal the extent of the biased nature of language model generations. Our analysis provides a study of biases in NLG, bias metrics and correlated human judgments, and empirical evidence on the usefulness of our annotated dataset.
@inproceedings{sheng2019woman,
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
title = {The Woman Worked as a Babysitter: On Biases in Language Generation},
booktitle = {EMNLP (short)},
vimeo_id = {426366363},
year = {2019}
}

Details
• #### Retrofitting Contextualized Word Embeddings with Paraphrases

Weijia Shi, Muhao Chen, Pei Zhou, and Kai-Wei Chang, in EMNLP (short), 2019.
Full Text Slides Video Code Abstract BibTeX Details
Contextualized word embedding models, such as ELMo, generate meaningful representations of words and their context. These models have been shown to have a great impact on downstream applications. However, in many cases, the contextualized embedding of a word changes drastically when the context is paraphrased. As a result, the downstream model is not robust to paraphrasing and other linguistic variations. To enhance the stability of contextualized word embedding models, we propose an approach to retrofitting contextualized embedding models with paraphrase contexts. Our method learns an orthogonal transformation on the input space, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the retrofitted model significantly outperforms the original ELMo on various sentence classification and language inference tasks.
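A rough contrastive sketch of the idea: learn a near-orthogonal map W that pulls a word's representations in paraphrased contexts together while keeping different words apart. The paired vectors, margin, and soft orthogonality penalty below are toy assumptions rather than the paper's exact objective:

```python
import torch

torch.manual_seed(0)
d = 32
anchor = torch.randn(200, d)                 # word w in context c
para = anchor + 0.1 * torch.randn(200, d)    # same word, paraphrased context
other = torch.randn(200, d)                  # different words (negatives)

W = torch.nn.Parameter(torch.eye(d))
opt = torch.optim.Adam([W], lr=1e-2)
eye = torch.eye(d)

def dist(x, y):
    return ((x - y) @ W.T).pow(2).sum(dim=1)

for _ in range(300):
    opt.zero_grad()
    attract = dist(anchor, para).mean()                   # pull paraphrases in
    repel = torch.relu(4.0 - dist(anchor, other)).mean()  # keep others apart
    ortho = (W.T @ W - eye).pow(2).sum()                  # stay near-orthogonal
    (attract + repel + 0.1 * ortho).backward()
    opt.step()
```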
@inproceedings{shi2019retrofitting,
author = {Shi, Weijia and Chen, Muhao and Zhou, Pei and Chang, Kai-Wei},
title = {Retrofitting Contextualized Word Embeddings with Paraphrases},
booktitle = {EMNLP (short)},
vimeo_id = {430797636},
year = {2019}
}

Details
• #### Visualizing Trend of Key Roles in News Articles

Chen Xia, Haoxiang Zhang, Jacob Moghtader, Allen Wu, and Kai-Wei Chang, in EMNLP (demo), 2019.
Full Text Code Abstract BibTeX Details
There are tons of news articles generated every day reflecting the activities of key roles such as people, organizations and political parties. Analyzing these key roles allows us to understand the trends in news. In this paper, we present a demonstration system that visualizes the trend of key roles in news articles based on natural language processing techniques. Specifically, we apply a semantic role labeler and the dynamic word embedding technique to understand relationships between key roles in the news across different time periods and visualize the trends of key role and news topics change over time.
@inproceedings{xia2019visualizing,
author = {Xia, Chen and Zhang, Haoxiang and Moghtader, Jacob and Wu, Allen and Chang, Kai-Wei},
title = {Visualizing Trend of Key Roles in News Articles},
booktitle = {EMNLP (demo)},
year = {2019}
}

Details
• #### Efficient Contextual Representation Learning With Continuous Outputs

Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, and Kai-Wei Chang, in TACL, 2019.
Full Text Slides Video Abstract BibTeX Details
Contextual representation models have achieved great success in improving various downstream natural language processing tasks. However, these language-model-based encoders are difficult to train due to their large parameter size and high computational complexity. By carefully examining the training procedure, we observe that the softmax layer, which predicts a distribution of the target word, often induces significant overhead, especially when the vocabulary size is large. Therefore, we revisit the design of the output layer and consider directly predicting the pre-trained embedding of the target word for a given context. When applied to ELMo, the proposed approach achieves a 4 times speedup and eliminates 80% trainable parameters while achieving competitive performance on downstream tasks. Further analysis shows that the approach maintains the speed advantage under various settings, even when the sentence encoder is scaled up.
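A minimal sketch of the continuous-output layer: the model regresses the frozen pre-trained embedding of the target word, so no |V|-sized softmax is needed and the output cost is independent of vocabulary size. A plain cosine loss stands in for the paper's probabilistic objective, and the LSTM encoder and shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hid = 10000, 300, 512
pretrained = torch.randn(vocab_size, emb_dim)   # frozen target embeddings

encoder = nn.LSTM(emb_dim, hid, batch_first=True)
to_emb = nn.Linear(hid, emb_dim)                # replaces the softmax layer

x = torch.randn(4, 20, emb_dim)                 # a batch of input embeddings
targets = torch.randint(0, vocab_size, (4, 20)) # next-word ids
h, _ = encoder(x)
pred = to_emb(h)                                # predicted target embeddings

# Regress against the frozen embedding table instead of scoring all of |V|.
gold = pretrained[targets]
loss = 1 - nn.functional.cosine_similarity(pred, gold, dim=-1).mean()
loss.backward()
```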
@inproceedings{li2019efficient,
author = {Li, Liunian Harold and Chen, Patrick H. and Hsieh, Cho-Jui and Chang, Kai-Wei},
title = {Efficient Contextual Representation Learning With Continuous Outputs},
booktitle = {TACL},
year = {2019}
}

Details
• #### Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations

Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, and Vicente Ordonez, in ICCV, 2019.
Full Text Code Demo Abstract BibTeX Details
In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables –such as gender– in visual recognition tasks. We show that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased datasets. Surprisingly, we show that even when datasets are balanced such that each label co-occurs equally with each gender, learned models amplify the association between labels and gender, as much as if data had not been balanced! To mitigate this, we adopt an adversarial approach to remove unwanted features corresponding to protected variables from intermediate representations in a deep neural network – and provide a detailed analysis of its effectiveness. Experiments on two datasets: the COCO dataset (objects), and the imSitu dataset (actions), show reductions in gender bias amplification while maintaining most of the accuracy of the original models.
@inproceedings{wang2019balanced,
author = {Wang, Tianlu and Zhao, Jieyu and Yatskar, Mark and Chang, Kai-Wei and Ordonez, Vicente},
title = {Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations},
booktitle = {ICCV},
year = {2019}
}

Details
• #### Few-Shot Representation Learning for Out-Of-Vocabulary Words

Ziniu Hu, Ting Chen, Kai-Wei Chang, and Yizhou Sun, in ACL, 2019.
Full Text Poster Code Abstract BibTeX Details
Existing approaches for learning word embeddings often assume there are sufficient occurrences for each word in the corpus, such that the representation of words can be accurately estimated from their contexts. However, in real-world scenarios, out-of-vocabulary (a.k.a. OOV) words that do not appear in training corpus emerge frequently. It is challenging to learn accurate representations of these words with only a few observations. In this paper, we formulate the learning of OOV embeddings as a few-shot regression problem, and address it by training a representation function to predict the oracle embedding vector (defined as embedding trained with abundant observations) based on limited observations. Specifically, we propose a novel hierarchical attention-based architecture to serve as the neural regression function, with which the context information of a word is encoded and aggregated from K observations. Furthermore, our approach can leverage Model-Agnostic Meta-Learning (MAML) for adapting the learned model to the new corpus fast and robustly. Experiments show that the proposed approach significantly outperforms existing methods in constructing accurate embeddings for OOV words, and improves downstream tasks where these embeddings are utilized.
@inproceedings{hu2019fewshot,
author = {Hu, Ziniu and Chen, Ting and Chang, Kai-Wei and Sun, Yizhou},
title = {Few-Shot Representation Learning for Out-Of-Vocabulary Words},
booktitle = {ACL},
year = {2019}
}

Details
• #### Mitigating Gender Bias in Natural Language Processing: Literature Review

Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Kai-Wei Chang, and William Yang Wang, in ACL, 2019.
Full Text Slides Video Abstract BibTeX Details
As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modeling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. We discuss gender bias based on four forms of representation bias and analyze methods recognizing gender bias. Furthermore, we discuss the advantages and drawbacks of existing gender debiasing methods. Finally, we discuss future studies for recognizing and mitigating gender bias in NLP.
@inproceedings{sun2019mitigating,
author = {Sun, Tony and Gaut, Andrew and Tang, Shirlyn and Huang, Yuxin and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Chang, Kai-Wei and Wang, William Yang},
title = {Mitigating Gender Bias in Natural Language Processing: Literature Review},
booktitle = {ACL},
vimeo_id = {384482151},
year = {2019}
}

Details
• #### On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing

Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, and Nanyun Peng, in NAACL, 2019.
Full Text Video Code Abstract BibTeX Details
Different languages might have different word orders. In this paper, we investigate cross-lingual transfer and posit that an order-agnostic model will perform better when transferring to distant foreign languages. To test our hypothesis, we train dependency parsers on an English corpus and evaluate their transfer performance on 30 other languages. Specifically, we compare encoders and decoders based on Recurrent Neural Networks (RNNs) and modified self-attentive architectures. The former relies on sequential information while the latter is more flexible at modeling word order. Rigorous experiments and detailed analysis show that RNN-based architectures transfer well to languages that are close to English, while self-attentive models have better overall cross-lingual transferability and perform especially well on distant languages.
@inproceedings{ahmad2019difficulties,
author = {Ahmad, Wasi Uddin and Zhang, Zhisong and Ma, Xuezhe and Hovy, Eduard and Chang, Kai-Wei and Peng, Nanyun},
title = {On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing},
booktitle = {NAACL},
year = {2019}
}

Details
• #### Gender Bias in Contextualized Word Embeddings

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang, in NAACL (short), 2019.
Full Text Slides Video Abstract BibTeX Details
Despite the great success of contextualized word embeddings on downstream applications, these representations potentially embed the societal biases exhibited in their training corpus. In this paper, we quantify, analyze and mitigate the gender bias exhibited in ELMo contextualized word vectors. We first demonstrate that the vectors encode and propagate information about genders unequally and then conduct a principal component analysis to visualize the geometry of the gender information in the embeddings. Then we show that ELMo works unequally well for men and women in down-stream tasks. Finally, we explore a variety of methods to remove such gender bias and demonstrate that it can be reduced through data augmentation.
@inproceedings{zhao2019gender,
author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Cotterell, Ryan and Ordonez, Vicente and Chang, Kai-Wei},
title = {Gender Bias in Contextualized Word Embeddings},
booktitle = {NAACL (short)},
year = {2019}
}

Details
• #### Context Attentive Document Ranking and Query Suggestion

Wasi Ahmad, Kai-Wei Chang, and Hongning Wang, in SIGIR, 2019.
Full Text Slides Code Abstract BibTeX Details
We present a context-aware neural ranking model to exploit users’ on-task search activities and enhance retrieval performance. In particular, a two-level hierarchical recurrent neural network is introduced to learn search context representations of individual queries, search tasks, and the corresponding dependency structure by jointly optimizing two companion retrieval tasks: document ranking and query suggestion. To identify the variable dependency structure between search context and users’ ongoing search activities, attention at both levels of recurrent states is introduced. Extensive experimental comparisons against a rich set of baseline methods and an in-depth ablation analysis confirm the value of our proposed approach for modeling search context buried in search tasks.
@inproceedings{ahmad2019context,
author = {Ahmad, Wasi and Chang, Kai-Wei and Wang, Hongning},
title = {Context Attentive Document Ranking and Query Suggestion},
booktitle = {SIGIR},
year = {2019}
}

Details
• #### Multifaceted Protein-Protein Interaction Prediction Based on Siamese Residual RCNN

Muhao Chen, Chelsea J.-T. Ju, Guangyu Zhou, Xuelu Chen, Tianran Zhang, Kai-Wei Chang, Carlo Zaniolo, and Wei Wang, in ISMB, 2019.
Full Text Code Abstract BibTeX Details
Sequence-based protein-protein interaction (PPI) prediction represents a fundamental computational biology problem. To address this problem, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical algorithms are learned to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage on the PPI information. Hence, we present an end-to-end framework, Lasagna, for PPI predictions using only the primary sequences of a protein pair. Lasagna incorporates a deep residual recurrent convolutional neural network in the Siamese learning architecture, which leverages both robust local features and contextualized information that are significant for capturing the mutual influence of protein sequences. Our framework relieves the data pre-processing efforts that are required by other systems, and generalizes well to different application scenarios. Experimental evaluations show that Lasagna outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows a promising performance on more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.
@inproceedings{chen2019multifaceted,
author = {Chen, Muhao and Ju, Chelsea J.-T. and Zhou, Guangyu and Chen, Xuelu and Zhang, Tianran and Chang, Kai-Wei and Zaniolo, Carlo and Wang, Wei},
title = {Multifaceted Protein-Protein Interaction Prediction Based on Siamese Residual RCNN},
booktitle = {ISMB},
year = {2019}
}

Details
• #### Pre-Training Graph Neural Networks for Generic Structural Feature Extraction

Ziniu Hu, Changjun Fan, Ting Chen, Kai-Wei Chang, and Yizhou Sun, in ICLR 2019 Workshop: Representation Learning on Graphs and Manifolds, 2019.
Full Text Abstract BibTeX Details
Graph neural networks (GNNs) are shown to be successful in modeling applications with graph structures. However, training an accurate GNN model requires a large collection of labeled data and expressive features, which might be inaccessible for some applications. To tackle this problem, we propose a pre-training framework that captures generic graph structural information that is transferable across tasks. Our framework can leverage the following three tasks: 1) denoising link reconstruction, 2) centrality score ranking, and 3) cluster preserving. The pre-training procedure can be conducted purely on the synthetic graphs, and the pre-trained GNN is then adapted for downstream applications. With the proposed pre-training procedure, the generic structural information is learned and preserved, thus the pre-trained GNN requires less amount of labeled data and fewer domain-specific features to achieve high performance on different downstream tasks. Comprehensive experiments demonstrate that our proposed framework can significantly enhance the performance of various tasks at the level of node, link, and graph.
@inproceedings{hu2019pretraining,
author = {Hu, Ziniu and Fan, Changjun and Chen, Ting and Chang, Kai-Wei and Sun, Yizhou},
title = {Pre-Training Graph Neural Networks for Generic Structural Feature Extraction},
booktitle = {ICLR 2019 Workshop: Representation Learning on Graphs and Manifolds},
year = {2019}
}

Details
• #### Learning Bilingual Word Embeddings Using Lexical Definitions

Weijia Shi, Muhao Chen, Yingtao Tian, and Kai-Wei Chang, in Repl4NLP (ACL workshop), 2019.
Full Text Abstract BibTeX Details
Bilingual word embeddings, which represent lexicons of different languages in a shared embedding space, are essential for supporting semantic and knowledge transfers in a variety of cross-lingual NLP tasks. Existing approaches to training bilingual word embeddings require either large collections of pre-defined seed lexicons that are expensive to obtain, or parallel sentences that comprise coarse and noisy alignment. In contrast, we propose BiLex that leverages publicly available lexical definitions for bilingual word embedding learning. Without the need of predefined seed lexicons, BiLex comprises a novel word pairing strategy to automatically identify and propagate the precise fine-grain word alignment from lexical definitions. We evaluate BiLex in word-level and sentence-level translation tasks, which seek to find the cross-lingual counterparts of words and sentences respectively. BiLex significantly outperforms previous embedding methods on both tasks.
@inproceedings{shi2019bilingual,
author = {Shi, Weijia and Chen, Muhao and Tian, Yingtao and Chang, Kai-Wei},
title = {Learning Bilingual Word Embeddings Using Lexical Definitions},
booktitle = {Repl4NLP (ACL workshop)},
poster = {http://kwchang.net/documents/slides/shi2019bilingual_poster.pdf},
year = {2019}
}

Details

## 2018

• #### Generating Natural Language Adversarial Examples

Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang, in EMNLP (short), 2018.
Full Text Code Abstract BibTeX Details
Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the network to misclassify. In the image domain, these perturbations can often be made virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a population-based optimization algorithm to generate semantically and syntactically similar adversarial examples. We demonstrate via a human study that 94.3% of the generated examples are classified to the original label by human evaluators, and that the examples are perceptibly quite similar. We hope our findings encourage researchers to pursue improving the robustness of DNNs in the natural language domain.
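A toy sketch of the population-based search: mutate candidate sentences with synonym substitutions, keep the fittest candidates, and recombine them. The victim model, synonym table, and fitness threshold are hypothetical stand-ins; the real attack uses embedding-space neighbors and a language-model filter to preserve semantics:

```python
import random

random.seed(0)
SYNONYMS = {"good": ["fine", "decent"], "film": ["movie"],
            "enjoyed": ["liked", "appreciated"]}

def target_prob(words):
    """Stand-in victim model: probability of the attacker's target label."""
    return 0.2 + 0.35 * ("appreciated" in words) + 0.35 * ("decent" in words)

def mutate(words):
    w = list(words)
    i = random.randrange(len(w))
    w[i] = random.choice(SYNONYMS.get(w[i], [w[i]]))
    return w

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

sentence = "i enjoyed this good film".split()
population = [mutate(sentence) for _ in range(20)]
for _ in range(30):
    population.sort(key=target_prob, reverse=True)
    if target_prob(population[0]) > 0.85:      # toy success criterion
        break
    elite = population[:5]
    population = elite + [mutate(crossover(*random.sample(elite, 2)))
                          for _ in range(15)]
print(" ".join(population[0]), target_prob(population[0]))
```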
@inproceedings{alzanto2018generating,
author = {Alzantot, Moustafa and Sharma, Yash and Elgohary, Ahmed and Ho, Bo-Jhang and Srivastava, Mani and Chang, Kai-Wei},
title = {Generating Natural Language Adversarial Examples},
booktitle = {EMNLP (short)},
year = {2018}
}

Details
• #### Learning Gender-Neutral Word Embeddings

Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang, in EMNLP (short), 2018.
Full Text Code Abstract BibTeX Details
Word embeddings have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, these word embeddings trained on human-generated corpora inherit strong gender stereotypes that reflect social constructs. In this paper, we propose a novel word embedding model, De-GloVe, that preserves gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Quantitative and qualitative experiments demonstrate that De-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.
@inproceedings{zhao2018learning,
author = {Zhao, Jieyu and Zhou, Yichao and Li, Zeyu and Wang, Wei and Chang, Kai-Wei},
title = {Learning Gender-Neutral Word Embeddings},
booktitle = {EMNLP (short)},
year = {2018}
}

Details
• #### Building Language Models for Text with Named Entities

Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in ACL, 2018.
Full Text Poster Code Abstract BibTeX Details
Text in many domains involves a significant amount of named entities. Predicting the entity names is often challenging for a language model as they appear less frequent on the training corpus. In this paper, we propose a novel and effective approach to building a language model which can learn the entity names by leveraging their entity type information. We also introduce two benchmark datasets based on recipes and Java programming codes, on which we evaluate the proposed model. Experimental results show that our model achieves 52.2% better perplexity in recipe generation and 40.3% on code generation than state-of-the-art language models.
@inproceedings{parvez2018building,
author = {Parvez, Md Rizwan and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
title = {Building Language Models for Text with Named Entities},
booktitle = {ACL},
year = {2018}
}

Details
• #### Learning Word Embeddings for Low-resource Languages by PU Learning

Chao Jiang, Hsiang-Fu Yu, Cho-Jui Hsieh, and Kai-Wei Chang, in NAACL, 2018.
Full Text Slides Video Code Abstract BibTeX Details
Word embedding has been used as a key component in many downstream applications in processing natural languages. Existing approaches often assume the existence of a large collection of text for learning effective word embedding. However, such a corpus may not be available for some low-resource languages. In this paper, we study how to effectively learn a word embedding model on a corpus with only a few million tokens. In such a situation, the co-occurrence matrix is very sparse because many word pairs are not observed to co-occur. In contrast to existing approaches, we argue that the zero entries in the co-occurrence matrix also provide valuable information and design a Positive-Unlabeled Learning (PU-Learning) approach to factorize the co-occurrence matrix. The experimental results demonstrate that the proposed approach requires a smaller amount of training text to obtain a reasonable word embedding model.
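A sketch of the PU-flavored factorization: fit low-rank factors to the co-occurrence matrix with full weight on observed entries and a small, nonzero weight on the unobserved zeros, so the zeros still inform the model. The matrix, weights, rank, and plain gradient updates below are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.poisson(0.3, size=(100, 100)).astype(float)  # sparse co-occurrence
W = np.where(M > 0, 1.0, 0.05)                       # small weight on zeros

k, lr = 16, 0.01
U = 0.1 * rng.normal(size=(100, k))
V = 0.1 * rng.normal(size=(100, k))
for _ in range(500):                                 # a few hundred plain
    R = (U @ V.T - M) * W                            # weighted residual
    U -= lr * (R @ V + 1e-3 * U)                     # gradient steps with
    V -= lr * (R.T @ U + 1e-3 * V)                   # light L2 regularization
```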
@inproceedings{jiang2018learning,
author = {Jiang, Chao and Yu, Hsiang-Fu and Hsieh, Cho-Jui and Chang, Kai-Wei},
title = {Learning Word Embeddings for Low-resource Languages by PU Learning},
booktitle = {NAACL},
vimeo_id = {277670013},
year = {2018}
}

Details
• #### Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in NAACL (short), 2018.
Full Text Poster Code Abstract BibTeX Details
In this paper, we introduce a new benchmark for co-reference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing datasets.
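The data-augmentation side of the approach can be pictured with a small gender-swapping routine; the swap list below is a tiny illustrative subset (a full pipeline must also disambiguate forms like "her", which maps to either "him" or "his"):

```python
# Minimal gender-swap augmentation sketch; SWAPS is an illustrative subset.
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "his": "her", "himself": "herself", "herself": "himself"}

def gender_swap(tokens):
    """Return an augmented copy of the sentence with gendered words swapped."""
    return [SWAPS.get(t.lower(), t) for t in tokens]

sent = "the doctor asked the nurse to help her".split()
print(" ".join(gender_swap(sent)))
# -> "the doctor asked the nurse to help him"
```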
@inproceedings{zhao2018gender,
author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
title = {Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods},
booktitle = {NAACL (short)},
press_url = {https://www.stitcher.com/podcast/matt-gardner/nlp-highlights/e/55861936},
year = {2018}
}

Details
• #### Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment

Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, and Carlo Zaniolo, in IJCAI, 2018.
Full Text Slides Code Abstract BibTeX Details
Multilingual knowledge graph (KG) embeddings provide latent semantic representations of entities and structured knowledge enabled with cross-lingual inferences that benefit various knowledge-driven cross-lingual NLP tasks. However, precisely learning such cross-lingual inferences is usually hindered by the low coverage of entity alignment in many KGs. Since many multilingual KGs also provide literal descriptions of entities, in this paper, we introduce an embedding-based approach which leverages a weakly aligned multilingual KG for semi-supervised cross-lingual learning using entity descriptions. Our approach performs co-training of two embedding models, i.e. a multilingual KG embedding model and a multilingual literal description embedding model. The models are trained on a large Wikipedia-based trilingual dataset where most entity alignment is unknown to training. Experimental results show that the performance of the proposed approach on the entity alignment task improves at each iteration of co-training, and eventually reaches a stage at which it significantly surpasses previous approaches. We also show that our approach has promising abilities for zero-shot entity alignment, and cross-lingual KG completion.
@inproceedings{chen2018multilingual,
author = {Chen, Muhao and Tian, Yingtao and Chang, Kai-Wei and Skiena, Steven and Zaniolo, Carlo},
title = {Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment},
booktitle = {IJCAI},
year = {2018}
}

Details
• #### Multi-Task Learning for Document Ranking and Query Suggestion

Wasi Ahmad, Kai-Wei Chang, and Hongning Wang, in ICLR, 2018.
Full Text Code Abstract BibTeX Details
We propose a multi-task learning framework to jointly learn document ranking and query suggestion for web search. It consists of two major components, a document ranker and a query recommender. The document ranker combines the current query and session information and compares the combined representation with document representations to rank the documents. The query recommender tracks users’ query reformulation sequence, considering all previous in-session queries, using a sequence-to-sequence approach. As both tasks are driven by the users’ underlying search intent, we perform joint learning of these two components through session recurrence, which encodes search context and intent. Extensive comparisons against state-of-the-art document ranking and query suggestion algorithms are performed on the public AOL search log, and the promising results endorse the effectiveness of the joint learning framework.
@inproceedings{ahmad2018multitask,
author = {Ahmad, Wasi and Chang, Kai-Wei and Wang, Hongning},
title = {Multi-Task Learning for Document Ranking and Query Suggestion},
booktitle = {ICLR},
year = {2018}
}

Details
• #### Intent-aware Query Obfuscation for Privacy Protection in Personalized Web Search

Wasi Ahmad, Kai-Wei Chang, and Hongning Wang, in SIGIR, 2018.
Full Text Code Abstract BibTeX Details
Modern web search engines exploit users’ search history to personalize search results, with a goal of improving their service utility on a per-user basis. But it is this very dimension that leads to the risk of privacy infringement and raises serious public concerns. In this work, we propose a client-centered intent-aware query obfuscation solution for protecting user privacy in a personalized web search scenario. In our solution, each user query is submitted with l additional cover queries and corresponding clicks, which act as decoys to mask users’ genuine search intent from a search engine. The cover queries are sequentially sampled from a set of hierarchically organized language models to ensure the coherency of fake search intents in a cover search task. Our approach emphasizes the plausibility of generated cover queries, not only to the current genuine query but also to previous queries in the same task, to increase the complexity for a search engine to identify a user’s true intent. We also develop two new metrics from an information theoretic perspective to evaluate the effectiveness of provided privacy protection. Comprehensive experiment comparisons with state-of-the-art query obfuscation techniques are performed on the public AOL search log, and the propitious results substantiate the effectiveness of our solution.
@inproceedings{ahmad2018intent,
author = {Ahmad, Wasi and Chang, Kai-Wei and Wang, Hongning},
title = {Intent-aware Query Obfuscation for Privacy Protection in Personalized Web Search},
booktitle = {SIGIR},
year = {2018}
}

Details
• #### Counterexamples for Robotic Planning Explained in Structured Language

Lu Feng, Mahsa Ghasemi, Kai-Wei Chang, and Ufuk Topcu, in ICRA, 2018.
Full Text Abstract BibTeX Details
Automated techniques such as model checking have been used to verify models of robotic mission plans based on Markov decision processes (MDPs) and generate counterexamples that may help diagnose requirement violations. However, such artifacts may be too complex for humans to understand, because existing representations of counterexamples typically include a large number of paths or a complex automaton. To help improve the interpretability of counterexamples, we define a notion of explainable counterexample, which includes a set of structured natural language sentences to describe the robotic behavior that lead to a requirement violation in an MDP model of robotic mission plan. We propose an approach based on mixed-integer linear programming for generating explainable counterexamples that are minimal, sound and complete. We demonstrate the usefulness of the proposed approach via a case study of warehouse robots planning.
@inproceedings{feng2018counterexamples,
author = {Feng, Lu and Ghasemi, Mahsa and Chang, Kai-Wei and Topcu, Ufuk},
title = {Counterexamples for Robotic Planning Explained in Structured Language},
booktitle = {ICRA},
year = {2018}
}

Details
• #### A Corpus to Learn Refer-to-as Relations for Nominals

Wasi Ahmad and Kai-Wei Chang, in LREC, 2018.
Full Text Code Abstract BibTeX Details
Continuous representations for words or phrases, trained on large unlabeled corpora, have proved very useful for many natural language processing tasks. While these vector representations capture many fine-grained syntactic and semantic regularities among words or phrases, they often lack coreferential information, which is useful for many downstream tasks like information extraction and text summarization. In this paper, we argue that good word and phrase embeddings should contain information for identifying refer-to-as relationships and construct a corpus from Wikipedia to generate coreferential neural embeddings for nominals. The term nominal refers to a word or a group of words that functions like a noun phrase. In addition, we use coreference resolution as a proxy to evaluate the learned neural embeddings for noun phrases. To simplify the evaluation procedure, we design a coreferential phrase prediction task where the learned nominal embeddings are used to predict which candidate nominals can refer to a target nominal. We further describe how to construct an evaluation dataset for such a task from the well-known OntoNotes corpus and demonstrate encouraging baseline results.
@inproceedings{AC18,
author = {Ahmad, Wasi and Chang, Kai-Wei},
title = {A Corpus to Learn Refer-to-as Relations for Nominals},
booktitle = {LREC},
year = {2018}
}

Details
• #### Word and sentence embedding tools to measure semantic similarity of Gene Ontology terms by their definitions

Dat Duong, Wasi Uddin Ahmad, Eleazar Eskin, Kai-Wei Chang, and Jingyi Jessica Li, in Journal of Computational Biology, 2018.
Full Text Code Abstract BibTeX Details
The Gene Ontology (GO) database contains GO terms that describe biological functions of genes. Previous methods for comparing GO terms have relied on the fact that GO terms are organized into a tree structure. Under this paradigm, the locations of two GO terms in the tree dictate their similarity score. In this paper, we introduce two new solutions for this problem, by focusing instead on the definitions of the GO terms. We apply neural network based techniques from the natural language processing (NLP) domain. The first method does not rely on the GO tree, whereas the second indirectly depends on the GO tree. In our first approach, we compare two GO definitions by treating them as two unordered sets of words. The word similarity is estimated by a word embedding model that maps words into an N-dimensional space. In our second approach, we account for the word-ordering within a sentence. We use a sentence encoder to embed GO definitions into vectors and estimate how likely one definition entails another. We validate our methods in two ways. In the first experiment, we test the model’s ability to differentiate a true protein-protein network from a randomly generated network. In the second experiment, we test the model in identifying orthologs from randomly-matched genes in human, mouse, and fly. In both experiments, a hybrid of NLP and GO-tree based methods achieves the best classification accuracy.
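The first (tree-free) method can be pictured as bag-of-words matching under embedding similarity; in the toy sketch below the vocabulary and embeddings are random stand-ins for a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {w: i for i, w in enumerate(
    "regulation of cell growth division signaling apoptosis".split())}
E = rng.normal(size=(len(VOCAB), 50))              # toy word embeddings
E /= np.linalg.norm(E, axis=1, keepdims=True)

def set_similarity(def_a, def_b):
    """Average best-match cosine similarity between two word sets."""
    A = E[[VOCAB[w] for w in def_a]]
    B = E[[VOCAB[w] for w in def_b]]
    sims = A @ B.T
    return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())

print(set_similarity("regulation of cell growth".split(),
                     "regulation of cell division".split()))
```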
@inproceedings{DAECL18,
author = {Duong, Dat and Ahmad, Wasi Uddin and Eskin, Eleazar and Chang, Kai-Wei and Li, Jingyi Jessica},
title = {Word and sentence embedding tools to measure semantic similarity of Gene Ontology terms by their definitions},
booktitle = {Journal of Computational Biology},
year = {2018}
}

Details
• #### A Corpus of Drug Usage Guidelines Annotated with Type of Advice

Sarah Masud Preum, Md. Rizwan Parvez, Kai-Wei Chang, and John Stankovic, in LREC, 2018.
Full Text Code Abstract BibTeX Details
Adherence to drug usage guidelines for prescription and over-the-counter drugs is critical for drug safety and effectiveness of treatment. Drug usage guideline documents contain advice on potential drug-drug interaction, drug-food interaction, and drug administration process. Current research on drug safety and public health indicates patients are often either unaware of such critical advice or overlook them. Categorizing advice statements from these documents according to their topics can enable the patients to find safety critical information. However, automatically categorizing drug usage guidelines based on their topic is an open challenge and there is no annotated dataset on drug usage guidelines. To address the latter issue, this paper presents (i) an annotation scheme for annotating safety critical advice from drug usage guidelines, (ii) an annotation tool for such data, and (iii) an annotated dataset containing drug usage guidelines from 90 drugs. This work is expected to accelerate further release of annotated drug usage guideline datasets and research on automatically filtering safety critical information from these textual documents.
@inproceedings{PPCS18,
author = {Preum, Sarah Masud and Parvez, Md. Rizwan and Chang, Kai-Wei and Stankovic, John},
title = {A Corpus of Drug Usage Guidelines Annotated with Type of Advice},
booktitle = {LREC},
year = {2018}
}

Details
• #### Quantification and Analysis of Scientific Language Variation Across Research Fields

Pei Zhou, Muhao Chen, Kai-Wei Chang, and Carlo Zaniolo, in CDEC (workshop at ICDM), 2018.
Full Text Abstract BibTeX Details
Quantifying differences in terminologies from various academic domains has been a longstanding problem yet to be solved. We propose a computational approach for analyzing linguistic variation among scientific research fields by capturing the semantic change of terms based on a neural language model. The model is trained on a large collection of literature in five computer science research fields, for which we obtain field-specific vector representations for key terms, and global vector representations for other words. Several quantitative approaches are introduced to identify the terms whose semantics have drastically changed, or remain unchanged across different research fields. We also propose a metric to quantify the overall linguistic variation of research fields. After quantitative evaluation on human annotated data and qualitative comparison with other methods, we show that our model can improve cross-disciplinary data collaboration by identifying terms that potentially induce confusion during interdisciplinary studies.
@inproceedings{ZCCZ18,
author = {Zhou, Pei and Chen, Muhao and Chang, Kai-Wei and Zaniolo, Carlo},
title = {Quantification and Analysis of Scientific Language Variation Across Research Fields},
booktitle = {CDEC (workshop at ICDM)},
year = {2018}
}

Details

## 2017

• #### Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in EMNLP, 2017.
Full Text Slides Code Abstract BibTeX Details EMNLP 2017 Best Long Paper Award
Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora.
In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, but a trained model amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for the resulting inference problems. Our method results in no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 33.3% and 44.9% for multilabel classification and visual semantic role labeling, respectively.
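The amplification measurement itself is easy to picture: compare a label's gender skew in the training data with the skew in the model's predictions. The toy counts below are made-up numbers chosen to mirror the cooking example in the abstract (a 33% training disparity amplified to 68%):

```python
# Hypothetical counts of (label, gender) co-occurrences.
train = {"cooking": {"woman": 66, "man": 33}}
preds = {"cooking": {"woman": 84, "man": 16}}

def disparity(counts):
    """Gender disparity of a label: P(woman) - P(man)."""
    total = counts["woman"] + counts["man"]
    return (counts["woman"] - counts["man"]) / total

for label in train:
    amp = disparity(preds[label]) - disparity(train[label])
    print(f"{label}: train={disparity(train[label]):+.2f}, "
          f"pred={disparity(preds[label]):+.2f}, amplification={amp:+.2f}")
```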
@inproceedings{zhao2017men,
author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
title = {Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints},
booktitle = {EMNLP},
year = {2017}
}

Details
• #### Counterfactual Language Model Adaptation for Suggesting Phrases

Kenneth Arnold, Kai-Wei Chang, and Adam T. Kalai, in IJCNLP (short), 2017.
Full Text Abstract BibTeX Details
We study the challenge of suggesting multi-word phrases to be inserted while typing on a mobile keyboard. Recent work in mobile text entry user-interfaces has shown that, unlike single-word predictions, these phrases are treated as suggestions rather than predictions, meaning that users often insert words that weren’t what they were planning on typing.
This suggests the NLP problem of offering multi-word suggestions that are likely to be accepted by a user. We propose a method for customizing an existing language model to adapt it to a specific such task, and show how to learn the parameters of that customization offline.
@inproceedings{ACK17,
author = {Arnold, Kenneth and Chang, Kai-Wei and Kalai, Adam T.},
title = {Counterfactual Language Model Adaptation for Suggesting Phrases},
booktitle = {IJCNLP (short)},
year = {2017}
}

Details
• #### Structured Prediction with Test-time Budget Constraints

Tolga Bolukbasi, Kai-Wei Chang, Joseph Wang, and Venkatesh Saligrama, in AAAI, 2017.
Full Text Slides Abstract BibTeX Details
We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features during test-time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in literature. We evaluate our proposed adaptive system on two real-world structured prediction tasks, optical character recognition (OCR) and dependency parsing. For OCR our method cuts the feature acquisition time by half coming within a 1% margin of top accuracy. For dependency parsing we realize an overall runtime gain of 20% without significant loss in performance.
@inproceedings{bolukbasi2017structured,
author = {Bolukbasi, Tolga and Chang, Kai-Wei and Wang, Joseph and Saligrama, Venkatesh},
title = {Structured Prediction with Test-time Budget Constraints},
booktitle = {AAAI},
year = {2017}
}

Details
• #### Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

Shyam Upadhyay, Kai-Wei Chang, Matt Taddy, Adam Kalai, and James Zou, in ACL RepL4NLP Workshop, 2017.
Full Text Abstract BibTeX Details Best Paper Award
Word embeddings, which represent a word as a point in a vector space, have become ubiquitous to several NLP tasks. A recent line of work uses bilingual (two languages) corpora to learn a different vector for each sense of a word, by exploiting crosslingual signals to aid sense identification. We present a multi-view Bayesian non-parametric algorithm which improves multi-sense word embeddings by (a) using multilingual (i.e., more than two languages) corpora to significantly improve sense embeddings beyond what one achieves with bilingual information, and (b) using a principled approach to learn a variable number of senses per word, in a data-driven manner. Ours is the first approach with the ability to leverage multilingual corpora efficiently for multi-sense representation learning. Experiments show that multilingual training significantly improves performance over monolingual and bilingual training, by allowing us to combine different parallel corpora to leverage multilingual context. Multilingual training yields comparable performance to a state-of-the-art monolingual model trained on five times more training data.
@inproceedings{upadhyay2017beyond,
author = {Upadhyay, Shyam and Chang, Kai-Wei and Taddy, Matt and Kalai, Adam and Zou, James},
title = {Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context},
booktitle = {ACL RepL4NLP Workshop},
year = {2017}
}

Details

## 2016

• #### Learning from Explicit and Implicit Supervision Jointly For Algebra Word Problems

Shyam Upadhyay, Ming-Wei Chang, Kai-Wei Chang, and Wen-tau Yih, in EMNLP, 2016.
Full Text Abstract BibTeX Details
Automatically solving algebra word problems has raised considerable interest recently. Existing state-of-the-art approaches mainly rely on learning from human annotated equations. In this paper, we demonstrate that it is possible to efficiently mine algebra problems and their numerical solutions with little to no manual effort. To leverage the mined dataset, we propose a novel structured-output learning algorithm that aims to learn from both explicit (e.g., equations) and implicit (e.g., solutions) supervision signals jointly. Enabled by this new algorithm, our model gains 4.6% absolute improvement in accuracy on the ALG-514 benchmark compared to the one without using implicit supervision. The final model also outperforms the current state-of-the-art approach by 3%.
Dataset
@inproceedings{BCWS16,
author = {Upadhyay, Shyam and Chang, Ming-Wei and Chang, Kai-Wei and Yih, Wen-tau},
title = {Learning from Explicit and Implicit Supervision Jointly For Algebra Word Problems},
booktitle = {EMNLP},
year = {2016}
}

Details
• #### Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai, in NeurIPS, 2016.
Full Text Code Abstract BibTeX Details reported by NPR and MIT Tech Review
The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
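The geometric step can be sketched directly: estimate a gender direction from definitional pairs and project it out of words that should be gender-neutral. The random vectors below stand in for trained embeddings, and a simple mean of pair differences replaces the paper's PCA:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["he", "she", "man", "woman", "receptionist", "engineer"]}

# Gender direction from definitional pairs (a mean works for a sketch).
g = np.mean([emb["he"] - emb["she"], emb["man"] - emb["woman"]], axis=0)
g /= np.linalg.norm(g)

def neutralize(v):
    """Remove the component of v along the gender direction."""
    return v - (v @ g) * g

for w in ["receptionist", "engineer"]:
    emb[w] = neutralize(emb[w])
    assert abs(emb[w] @ g) < 1e-10     # no gender component remains
```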
@inproceedings{bolukbasi2016man,
author = {Bolukbasi, Tolga and Chang, Kai-Wei and Zou, James and Saligrama, Venkatesh and Kalai, Adam},
title = {Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings},
booktitle = {NeurIPS},
year = {2016}
}

Details
• #### EMNLP 16 Workshop on Structured Prediction for NLP

Kai-Wei Chang, Ming-Wei Chang, Vivek Srikumar, and Alexander M. Rush, in EMNLP, 2016.
Full Text Abstract BibTeX Details
Many prediction tasks in NLP involve assigning values to mutually dependent variables. For example, when designing a model to automatically perform linguistic analysis of a sentence or a document (e.g., parsing, semantic role labeling, or discourse analysis), it is crucial to model the correlations between labels. Many other NLP tasks, such as machine translation, textual entailment, and information extraction, can also be modeled as structured prediction problems.
In order to tackle such problems, various structured prediction approaches have been proposed, and their effectiveness has been demonstrated. Studying structured prediction is interesting from both NLP and machine learning (ML) perspectives. From the NLP perspective, syntax and semantics of natural language are clearly structured and advances in this area will enable researchers to understand the linguistic structure of data. From the ML perspective, the large amount of available text data and complex linguistic structures bring challenges to the learning community. Designing expressive yet tractable models and studying efficient learning and inference algorithms become important issues.
Recently, there has been significant interest in non-standard structured prediction approaches that take advantage of non-linearity, latent components, and/or approximate inference in both the NLP and ML communities. Researchers have also been discussing the intersection between deep learning and structured prediction through the DeepStructure reading group. This workshop intends to bring together NLP and ML researchers working on diverse aspects of structured prediction and expose the participants to recent progress in this area.
Workshop Site
@inproceedings{CCSR16,
author = {Chang, Kai-Wei and Chang, Ming-Wei and Srikumar, Vivek and Rush, Alexander M.},
title = {EMNLP 16 Workshop on Structured Prediction for NLP},
booktitle = {EMNLP},
year = {2016}
}

Details
• #### A Credit Assignment Compiler for Joint Prediction

Kai-Wei Chang, He He, Hal Daume III, John Langford, and Stephane Ross, in NeurIPS, 2016.
Full Text Code Abstract BibTeX Details
Many machine learning applications involve jointly predicting multiple mutually dependent output variables. Learning to search is a family of methods where the complex decision problem is cast into a sequence of decisions via a search space. Although these methods have shown promise both in theory and in practice, implementing them has been burdensomely awkward. In this paper, we show the search space can be defined by an arbitrary imperative program, turning learning to search into a credit assignment compiler. Altogether with the algorithmic improvements for the compiler, we radically reduce the complexity of programming and the running time. We demonstrate the feasibility of our approach on multiple joint prediction tasks. In all cases, we obtain accuracies as high as alternative approaches, at drastically reduced execution and programming time.
@inproceedings{chang2016credit,
author = {Chang, Kai-Wei and He, He and III, Hal Daume and Langford, John and Ross, Stephane},
title = {A Credit Assignment Compiler for Joint Prediction},
booktitle = {NeurIPS},
year = {2016}
}

Details

## 2015

• #### A Joint Framework for Coreference Resolution and Mention Head Detection

Haoruo Peng, Kai-Wei Chang, and Dan Roth, in CoNLL, 2015.
Full Text Abstract BibTeX Details
In coreference resolution, a fair amount of research treats mention detection as a preprocessed step and focuses on developing algorithms for clustering coreferred mentions. However, there are significant gaps between the performance on gold mentions and the performance on the real problem, when mentions are predicted from raw text via an imperfect Mention Detection (MD) module. Motivated by the goal of reducing such gaps, we develop an ILP-based joint coreference resolution and mention head formulation that is shown to yield significant improvements on coreference from raw text, outperforming existing state-of-art systems on both the ACE-2004 and the CoNLL-2012 datasets. At the same time, our joint approach is shown to improve mention detection by close to 15% F1. One key insight underlying our approach is that identifying and co-referring mention heads is not only sufficient but is more robust than working with complete mentions.
@inproceedings{peng2015joint,
author = {Peng, Haoruo and Chang, Kai-Wei and Roth, Dan},
title = {A Joint Framework for Coreference Resolution and Mention Head Detection},
booktitle = {CoNLL},
year = {2015}
}

Details
• #### Structural Learning with Amortized Inference

Kai-Wei Chang, Shyam Upadhyay, Gourab Kundu, and Dan Roth, in AAAI, 2015.
Full Text Poster Abstract BibTeX Details
Training a structured prediction model involves performing several loss-augmented inference steps. Over the lifetime of the training, many of these inference problems, although different, share the same solution. We propose AI-DCD, an Amortized Inference framework for the Dual Coordinate Descent method, an approximate learning algorithm that accelerates the training process by exploiting this redundancy of solutions, without compromising the performance of the model. We show the efficacy of our method by training a structured SVM using dual coordinate descent for an entity-relation extraction task. Our method learns the same model as an exact training algorithm would, but calls the inference engine in only 10%–24% of the inference problems encountered during training. We observe similar gains on a multi-label classification task and with a Structured Perceptron model for the entity-relation task.
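The amortization idea can be caricatured in a few lines: cache inference solutions keyed by a signature of the problem, and call the solver only on cache misses. The signature below (the full score ranking, which trivially determines a toy argmax "inference") is a naive stand-in for the paper's amortization conditions:

```python
import numpy as np

rng = np.random.default_rng(0)
cache = {}

def amortized_argmax(scores):
    key = tuple(np.argsort(scores))     # ranking as a crude problem signature
    if key not in cache:                # run the "solver" only on a miss
        cache[key] = int(np.argmax(scores))
    return cache[key]

problems = [rng.normal(size=5) for _ in range(1000)]
solutions = [amortized_argmax(s) for s in problems]
print(f"solver calls: {len(cache)} / {len(problems)}")
```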
@inproceedings{chang2015structural,
author = {Chang, Kai-Wei and Upadhyay, Shyam and Kundu, Gourab and Roth, Dan},
title = {Structural Learning with Amortized Inference},
booktitle = {AAAI},
year = {2015}
}

Details
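The caching idea in the paper above can be sketched in a few lines. In this simplified version (mine, not the paper's code), a cached solution is reused only when the new inference problem is identical to the one that produced it; AI-DCD goes further, reusing a solution whenever a sufficient optimality condition holds for the new objective.

```python
# Simplified amortized inference: `solve` is the expensive engine
# (e.g., an ILP solver); `objective_key` is a hashable encoding of the
# current loss-augmented objective. Reuse here requires an identical
# objective; the paper's reuse condition is strictly weaker.
_cache = {}

def amortized_solve(example_id, objective_key, solve):
    hit = _cache.get(example_id)
    if hit is not None and hit[0] == objective_key:
        return hit[1]                      # cache hit: no inference call
    solution = solve()                     # cache miss: pay for inference
    _cache[example_id] = (objective_key, solution)
    return solution

# Toy usage: the second call with the same objective skips solve().
calls = []
solve = lambda: calls.append(1) or "y*"
print(amortized_solve("ex1", ("w", 1.0), solve), len(calls))  # y* 1
print(amortized_solve("ex1", ("w", 1.0), solve), len(calls))  # y* 1
```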
• #### Selective Algorithms for Large-Scale Classification and Structured Learning

Kai-Wei Chang, in UIUC PhD Thesis, 2015.
Full Text Abstract BibTeX Details
The desired output in many machine learning tasks is a structured object, such as a tree, a clustering, or a sequence. Learning accurate prediction models for such problems requires training on large amounts of data, making use of expressive features, and performing global inference that simultaneously assigns values to all interrelated nodes in the structure. All of these contribute to significant scalability problems. In this thesis, we describe a collection of results that address several aspects of these problems by carefully selecting and caching samples, structures, or latent items.
Our results lead to efficient learning algorithms for large-scale binary classification models, structured prediction models, and online clustering models which, in turn, support reductions in problem size, improvements in training and evaluation speed, and improved performance. We have used our algorithms to learn expressive models from large amounts of annotated data and achieve state-of-the-art performance on several natural language processing tasks.
@inproceedings{chang2015thesis,
author = {Chang, Kai-Wei},
title = {Selective Algorithms for Large-Scale Classification and Structured Learning},
booktitle = {UIUC PhD Thesis},
year = {2015}
}

Details
• #### Learning to Search Better Than Your Teacher

Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, and John Langford, in ICML, 2015.
Full Text Video Code Abstract BibTeX Details
Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor?
We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.
@inproceedings{chang2015lols,
author = {Chang, Kai-Wei and Krishnamurthy, Akshay and Agarwal, Alekh and Daume III, Hal and Langford, John},
title = {Learning to Search Better Than Your Teacher},
booktitle = {ICML},
year = {2015}
}

Details
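The roll-in/roll-out structure of LOLS is compact enough to spell out on a toy problem. The sketch below (a three-token Hamming-loss task, not the paper's implementation) collects the cost-sensitive examples LOLS trains on; the probability `beta` of rolling out with the reference rather than the learned policy is the mixture behind the local-optimality guarantee.

```python
import random

GOLD = [1, 0, 1]                  # toy task: predict this binary sequence
T = len(GOLD)

def rollout(prefix, policy):
    """Complete a partial output with `policy`; return the final loss."""
    out = list(prefix)
    while len(out) < T:
        out.append(policy(out))
    return sum(int(y != g) for y, g in zip(out, GOLD))

def lols_pass(learned, reference, beta=0.5):
    """One LOLS data-collection pass: roll in with the learned policy;
    cost each action by rolling out with the reference policy w.p. beta,
    else the learned policy. Returns cost-sensitive examples."""
    data, prefix = [], []
    for _ in range(T):
        pol = reference if random.random() < beta else learned
        costs = {a: rollout(prefix + [a], pol) for a in (0, 1)}
        data.append((tuple(prefix), costs))
        prefix.append(learned(prefix))    # roll-in follows learned policy
    return data

reference = lambda prefix: GOLD[len(prefix)]   # oracle policy
learned = lambda prefix: 0                     # untrained policy
print(lols_pass(learned, reference))
```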
• #### Learning to Search for Dependencies

Kai-Wei Chang, He He, Hal Daumé III, and John Langford, in Arxiv, 2015.
Full Text Code Abstract BibTeX Details
We demonstrate that a dependency parser can be built using a credit assignment compiler, which removes the burden of worrying about low-level machine learning details from the parser implementation. The result is a simple parser that applies robustly to many languages and provides statistical and computational performance similar to the best transition-based parsing approaches to date, while avoiding various downsides including randomization, extra feature requirements, and custom learning algorithms.
@inproceedings{chang2015learning,
author = {Chang, Kai-Wei and He, He and Daume III, Hal and Langford, John},
title = {Learning to Search for Dependencies},
booktitle = {Arxiv},
year = {2015}
}

Details
• #### IllinoisSL: A JAVA Library for Structured Prediction

Kai-Wei Chang, Shyam Upadhyay, Ming-Wei Chang, Vivek Srikumar, and Dan Roth, in Arxiv, 2015.
Full Text Abstract BibTeX Details
IllinoisSL is a Java library for learning structured prediction models. It supports structured Support Vector Machines and structured Perceptron. The library consists of a core learning module and several applications, which can be executed from command-lines. Documentation is provided to guide users. In comparison to other structured learning libraries, IllinoisSL is efficient, general, and easy to use.
@inproceedings{chang2015illinoissl,
author = {Chang, Kai-Wei and Upadhyay, Shyam and Chang, Ming-Wei and Srikumar, Vivek and Roth, Dan},
title = {IllinoisSL: A JAVA Library for Structured Prediction},
booktitle = {Arxiv},
year = {2015}
}

Details
• #### Distributed Training of Structured SVM

Ching-pei Lee, Kai-Wei Chang, Shyam Upadhyay, and Dan Roth, in OPT workshop at NeurIPS, 2015.
Full Text Abstract BibTeX Details
Training structured prediction models is time-consuming. However, most existing approaches only use a single machine, so the computing power and memory capacity of multiple machines have not been exploited. In this work, we propose an efficient algorithm for distributedly training structured support vector machines based on a distributed block-coordinate descent method. Both theoretical and experimental results indicate that our method is efficient.
@inproceedings{lee2015distributed,
author = {Lee, Ching-pei and Chang, Kai-Wei and Upadhyay, Shyam and Roth, Dan},
title = {Distributed Training of Structured SVM},
booktitle = {OPT workshop at NeurIPS},
year = {2015}
}

Details

## 2014

• #### A Discriminative Latent Variable Model for Online Clustering

Rajhans Samdani, Kai-Wei Chang, and Dan Roth, in ICML, 2014.
Full Text Slides Demo Abstract BibTeX Details
This paper presents a latent variable structured prediction model for discriminative supervised clustering of items called the Latent Left-linking Model (L3M). We present an online clustering algorithm for L3M based on a feature-based item similarity function. We provide a learning framework for estimating the similarity function and present a fast stochastic gradient-based learning technique. In our experiments on coreference resolution and document clustering, L3M outperforms several existing online as well as batch supervised clustering techniques.
@inproceedings{samdani2014discriminative,
author = {Samdani, Rajhans and Chang, Kai-Wei and Roth, Dan},
title = {A Discriminative Latent Variable Model for Online Clustering},
booktitle = {ICML},
year = {2014}
}

Details
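A hard-decision caricature of the left-linking inference may help: each arriving item either links to its best-scoring predecessor or opens a new cluster. The real L3M treats the left-link as a latent variable with a probabilistic linking model and learns the similarity function; everything below, including the toy `sim`, is a simplified stand-in.

```python
def left_link_cluster(items, sim, threshold=0.0):
    """Greedy left-linking: item i joins the cluster of its best-scoring
    predecessor, or starts a new cluster if no predecessor beats the
    threshold. Returns cluster_of, with cluster_of[i] the id of item i."""
    cluster_of = []
    for i, x in enumerate(items):
        best_j, best_s = None, threshold
        for j in range(i):                # online: only look leftward
            s = sim(items[j], x)
            if s > best_s:
                best_j, best_s = j, s
        if best_j is None:
            cluster_of.append(max(cluster_of, default=-1) + 1)
        else:
            cluster_of.append(cluster_of[best_j])
    return cluster_of

# Toy similarity: cluster strings by their first letter.
items = ["apple", "apricot", "banana", "blueberry"]
print(left_link_cluster(items, lambda a, b: 1.0 if a[0] == b[0] else -1.0))
# -> [0, 0, 1, 1]
```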
• #### Typed Tensor Decomposition of Knowledge Bases for Relation Extraction

Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Chris Meek, in EMNLP, 2014.
Full Text Video Abstract BibTeX Details
While relation extraction has traditionally been viewed as a task relying solely on textual data, recent work has shown that by taking as input existing facts in the form of entity-relation triples from both knowledge bases and textual data, the performance of relation extraction can be improved significantly. Following this new paradigm, we propose a tensor decomposition approach for knowledge base embedding that is highly scalable, and is especially suitable for relation extraction. By leveraging relational domain knowledge about entity type information, our learning algorithm is significantly faster than previous approaches and is better able to discover new relations missing from the database. In addition, when applied to a relation extraction task, our approach alone is comparable to several existing systems, and improves the weighted mean average precision of a state-of-the-art method by 10 points when used as a subcomponent.
@inproceedings{chang2014typed,
author = {Chang, Kai-Wei and Yih, Wen-tau and Yang, Bishan and Meek, Chris},
title = {Typed Tensor Decomposition of Knowledge Bases for Relation Extraction},
booktitle = {EMNLP},
year = {2014}
}

Details
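To make the role of type information concrete, here is a toy scoring-and-ranking sketch. The bilinear triple score and the type-restricted candidate set reflect the setup described above, but the random embeddings, the single relation matrix, and all names are illustrative placeholders; the paper learns the embeddings by decomposing the knowledge-base tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ent, k = 5, 4
E = rng.standard_normal((n_ent, k))     # entity embeddings (placeholders)
M_r = rng.standard_normal((k, k))       # latent matrix for one relation r
ent_type = np.array([0, 0, 1, 1, 1])    # 0 = person, 1 = organization

def score(i, j):
    """Bilinear score for the triple (entity i, relation r, entity j)."""
    return E[i] @ M_r @ E[j]

def rank_objects(subject, object_type):
    """Rank only entities whose type fits the relation's signature,
    e.g., works_for(person, organization)."""
    candidates = np.flatnonzero(ent_type == object_type)
    order = np.argsort([-score(subject, j) for j in candidates])
    return candidates[order]

print(rank_objects(0, object_type=1))
```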
• #### The Illinois-Columbia System in the CoNLL-2014 Shared Task

Alla Rozovskaya, Kai-Wei Chang, Mark Sammons, Dan Roth, and Nizar Habash, in CoNLL Shared Task, 2014.
Full Text Abstract BibTeX Details
The CoNLL-2014 shared task is an extension of last year’s shared task and focuses on correcting grammatical errors in essays written by non-native learners of English. In this paper, we describe the Illinois-Columbia system that participated in the shared task. Our system ranked second on the original annotations and first on the revised annotations.
The core of the system is based on the University of Illinois model that placed first in the CoNLL-2013 shared task. This baseline model has been improved and expanded for this year’s competition in several respects. We describe our underlying approach, which relates to our previous work, and describe the novel aspects of the system in more detail.
@inproceedings{RCSRH14,
author = {Rozovskaya, Alla and Chang, Kai-Wei and Sammons, Mark and Roth, Dan and Habash, Nizar},
title = {The Illinois-Columbia System in the CoNLL-2014 Shared Task},
booktitle = {CoNLL Shared Task},
year = {2014}
}

Details

## 2013

• #### Multi-core Structural SVM Training

Kai-Wei Chang, Vivek Srikumar, and Dan Roth, in ECML, 2013.
Full Text Poster Abstract BibTeX Details
Many problems in natural language processing and computer vision can be framed as structured prediction problems. Structural support vector machines (SVMs) are a popular approach for training structured predictors, where learning is framed as an optimization problem. Most structural SVM solvers alternate between a model update phase and an inference phase (which predicts structures for all training examples). As structures become more complex, inference becomes a bottleneck and thus slows down learning considerably. In this paper, we propose a new learning algorithm for structural SVMs called DEMI-DCD that extends the dual coordinate descent approach by decoupling the model update and inference phases into different threads. We take advantage of multi-core hardware to parallelize learning with minimal synchronization between the model update and the inference phases. We prove that our algorithm not only converges but also fully utilizes all available processors to speed up learning, and validate our approach on two real-world NLP problems: part-of-speech tagging and relation extraction. In both cases, we show that our algorithm utilizes all available processors to speed up learning and achieves competitive performance. For example, it achieves a relative duality gap of 1% on a POS tagging problem in 192 seconds using 16 threads, while a standard implementation of a multi-threaded dual coordinate descent algorithm with the same number of threads requires more than 600 seconds to reach a solution of the same quality.
@inproceedings{chang2013multicore,
author = {Chang, Kai-Wei and Srikumar, Vivek and Roth, Dan},
title = {Multi-core Structural SVM Training},
booktitle = {ECML},
year = {2013}
}

Details
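The decoupling described above is easy to mock up with two threads and a shared queue: one thread keeps re-solving inference with whatever (possibly stale) weights it sees, while the other consumes the resulting structures to update the model. This is only a schematic with toy stand-ins for the inference and update steps; the paper's contribution is proving that this minimally synchronized arrangement still converges.

```python
import queue
import threading
import time

structures = queue.Queue(maxsize=100)
stop = threading.Event()
w = [0.0]                                # shared model (toy: one weight)

def inference_loop(examples):
    """Keeps re-predicting structures using the current, maybe stale, w."""
    while not stop.is_set():
        for x, y in examples:
            y_hat = 1 if w[0] * x > 0 else -1     # toy 'inference'
            try:
                structures.put((x, y, y_hat), timeout=0.1)
            except queue.Full:
                pass

def learning_loop(seconds=0.5):
    """Consumes cached structures and applies toy mistake-driven updates."""
    end = time.time() + seconds
    while time.time() < end:
        try:
            x, y, y_hat = structures.get(timeout=0.1)
        except queue.Empty:
            continue
        if y_hat != y:
            w[0] += 0.1 * y * x
    stop.set()

examples = [(1.0, 1), (-2.0, -1), (0.5, 1)]
t_inf = threading.Thread(target=inference_loop, args=(examples,))
t_lrn = threading.Thread(target=learning_loop)
t_inf.start(); t_lrn.start(); t_lrn.join(); t_inf.join()
print("learned weight:", w[0])
```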
• #### A Constrained Latent Variable Model for Coreference Resolution

Kai-Wei Chang, Rajhans Samdani, and Dan Roth, in EMNLP, 2013.
Full Text Poster Demo Abstract BibTeX Details
Coreference resolution is a well known clustering task in Natural Language Processing. In this paper, we describe the Latent Left Linking model (L3M), a novel, principled, and linguistically motivated latent structured prediction approach to coreference resolution.
We show that L3M admits efficient inference and can be augmented with knowledge-based constraints; we also present a fast stochastic gradient based learning.
Experiments on ACE and Ontonotes data show that L3M and its constrained version, CL3M, are more accurate than several state-of-the-art approaches as well as some structured prediction models proposed in the literature.
@inproceedings{ChangSaRo13,
author = {Chang, Kai-Wei and Samdani, Rajhans and Roth, Dan},
title = {A Constrained Latent Variable Model for Coreference Resolution},
booktitle = {EMNLP},
year = {2013}
}

Details
• #### Multi-Relational Latent Semantic Analysis

Kai-Wei Chang, Wen-tau Yih, and Chris Meek, in EMNLP, 2013.
Full Text Slides Abstract BibTeX Details
We present Multi-Relational Latent Semantic Analysis (MRLSA) which generalizes Latent Semantic Analysis (LSA). MRLSA provides an elegant approach to combining multiple relations between words by constructing a 3-way tensor. Similar to LSA, a low-rank approximation of the tensor is derived using a tensor decomposition. Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. The degree of two words having a specific relation can then be measured through simple linear algebraic operations. We demonstrate that by integrating multiple relations from both homogeneous and heterogeneous information sources, MRLSA achieves state-of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a.
@inproceedings{chang2013mrlsa,
author = {Chang, Kai-Wei and Yih, Wen-tau and Meek, Chris},
title = {Multi-Relational Latent Semantic Analysis},
booktitle = {EMNLP},
year = {2013}
}

Details
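A numpy toy may clarify the "one vector per word, one square matrix per relation" geometry. This is not the paper's decomposition (MRLSA derives a low-rank approximation of the 3-way tensor); here, for brevity, the shared word space comes from an SVD of the unfolded tensor and each relation matrix is then fit by least squares, which is enough to show how the degree of a relation reduces to simple linear algebra.

```python
import numpy as np

rng = np.random.default_rng(0)
V, R, k = 6, 2, 3                         # vocab size, #relations, rank
X = rng.random((R, V, V))                 # X[r, i, j]: word i rel-r word j

# Shared latent word space from the tensor unfolded along relations.
unfolded = np.hstack([X[r] for r in range(R)])        # V x (V*R)
U, s, _ = np.linalg.svd(unfolded, full_matrices=False)
W = U[:, :k] * s[:k]                      # latent word vectors (V x k)

# One latent square matrix per relation, fit by least squares.
P = np.linalg.pinv(W)                     # k x V
M = [P @ X[r] @ P.T for r in range(R)]    # each k x k

def score(i, r, j):
    """Degree to which word i stands in relation r to word j."""
    return W[i] @ M[r] @ W[j]

print(round(score(0, 1, 2), 3))
```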
• #### Tractable Semi-Supervised Learning of Complex Structured Prediction Models

Kai-wei Chang, S. Sundararajan, and S. Sathiya Keerthi, in ECML, 2013.
Full Text Slides Poster Abstract BibTeX Details
Semi-supervised learning has been widely studied in the literature. However, most previous works assume that the output structure is simple enough to allow the direct use of tractable inference/learning algorithms (e.g., binary label or linear chain). Therefore, these methods cannot be applied to problems with complex structure. In this paper, we propose an approximate semi-supervised learning method that uses piecewise training for estimating the model weights and a dual decomposition approach for solving the inference problem of finding the labels of unlabeled data subject to domain specific constraints. This allows us to extend semi-supervised learning to general structured prediction problems. As an example, we apply this approach to the problem of multi-label classification (a fully connected pairwise Markov random field). Experimental results on benchmark data show that, in spite of using approximations, the approach is effective and yields good improvements in generalization performance over the plain supervised method. In addition, we demonstrate that our inference engine can be applied to other semi-supervised learning frameworks, and extends them to solve problems with complex structure.
@inproceedings{ChangSuKe13,
author = {Chang, Kai-wei and Sundararajan, S. and Keerthi, S. Sathiya},
title = {Tractable Semi-Supervised Learning of Complex Structured Prediction Models},
booktitle = {ECML},
year = {2013}
}

Details
• #### The University of Illinois System in the CoNLL-2013 Shared Task

Alla Rozovskaya, Kai-Wei Chang, Mark Sammons, and Dan Roth, in CoNLL Shared Task, 2013.
Full Text Poster Abstract BibTeX Details
The CoNLL-2013 shared task focuses on correcting grammatical errors in essays written by non-native learners of English. In this paper, we describe the University of Illinois system that participated in the shared task. The system consists of five components and targets five types of common grammatical mistakes made by English as Second Language writers. We describe our underlying approach, which relates to our previous work, and describe the novel aspects of the system in more detail. Out of 17 participating teams, our system is ranked first based on both the original annotation and on the revised annotation.
@inproceedings{RCSR13,
author = {Rozovskaya, Alla and Chang, Kai-Wei and Sammons, Mark and Roth, Dan},
title = {The University of Illinois System in the CoNLL-2013 Shared Task},
booktitle = {CoNLL Shared Task},
year = {2013}
}

Details

## 2012

• #### Illinois-Coref: The UI System in the CoNLL-2012 Shared Task

Kai-Wei Chang, Rajhans Samdani, Alla Rozovskaya, Mark Sammons, and Dan Roth, in CoNLL Shared Task, 2012.
Full Text Poster Abstract BibTeX Details
The CoNLL-2012 shared task is an extension of last year's coreference task. We participated in the closed track of the shared tasks in both years. In this paper, we present the improvements of the Illinois-Coref system over last year. We focus on improving mention detection and pronoun coreference resolution, and present a new learning protocol. These new strategies boost the performance of the system by 5% MUC F1, 0.8% BCUB F1, and 1.7% CEAF F1 on the OntoNotes-5.0 development set.
@inproceedings{CSRSR12,
author = {Chang, Kai-Wei and Samdani, Rajhans and Rozovskaya, Alla and Sammons, Mark and Roth, Dan},
title = {Illinois-Coref: The UI System in the CoNLL-2012 Shared Task},
booktitle = {CoNLL Shared Task},
year = {2012}
}

Details
• #### Efficient Pattern-Based Time Series Classification on GPU

Kai-Wei Chang, Biplab Deka, W.-M. W. Hwu, and Dan Roth, in ICDM, 2012.
Full Text Abstract BibTeX Details
The time series shapelet discovery algorithm finds subsequences from a set of time series for use as primitives for time series classification. This algorithm has drawn a lot of interest because of the interpretability of its results. However, computation requirements restrict the algorithm from dealing with large data sets and may limit its application in many domains. In this paper, we address this issue by redesigning the algorithm for implementation on highly parallel Graphics Processing Units (GPUs). We investigate several concepts of GPU programming and propose a dynamic programming algorithm that is suitable for implementation on GPUs. Results show that the proposed GPU implementation significantly reduces the running time of the shapelet discovery algorithm. For example, on the largest sample dataset from the original authors, the running time is reduced from half a day to two minutes.
@inproceedings{CDHR12,
author = {Chang, Kai-Wei and Deka, Biplab and Hwu, W.-M. W. and Roth, Dan},
title = {Efficient Pattern-Based Time Series Classification on GPU},
booktitle = {ICDM},
year = {2012}
}

Details
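The kernel that dominates shapelet discovery, and that the paper maps onto the GPU, is "minimum sliding-window distance of a candidate subsequence to every series". A numpy rendering conveys the data-parallel shape of that computation; on a GPU the windows and candidates are evaluated by parallel threads, and here vectorization stands in for that.

```python
import numpy as np

def shapelet_distances(series, shapelet):
    """series: (n_series, length); shapelet: (m,).
    Returns, per series, the minimum squared Euclidean distance between
    the shapelet and any length-m sliding window of that series."""
    m = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, m, axis=1)
    d2 = ((windows - shapelet) ** 2).sum(axis=2)   # (n_series, n_windows)
    return d2.min(axis=1)

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 50))
print(shapelet_distances(X, X[0, 10:15]))  # series 0 contains it: dist 0
```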
• #### Large Linear Classification When Data Cannot Fit In Memory

Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang, and Chih-Jen Lin, in TKDD, 2012.
Full Text Code Abstract BibTeX Details Best Paper Award, KDD 2010
Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework for primal and dual SVMs, respectively. As data cannot fit in memory, many design considerations are very different from those for traditional algorithms. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.
@inproceedings{yu2010large,
author = {Yu, Hsiang-Fu and Hsieh, Cho-Jui and Chang, Kai-Wei and Lin, Chih-Jen},
title = {Large Linear Classification When Data Cannot Fit In Memory},
booktitle = {TKDD},
year = {2012}
}

Details
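The outer loop of block minimization is simple enough to sketch. In the simplified rendering below (the .npy file layout, pass counts, and the inner dual coordinate descent solver are all stand-ins for the paper's implementation), each block of examples lives on disk; the weight vector and each block's dual variables persist across loads, which is what lets progress accumulate despite the memory limit.

```python
import numpy as np

def train_blockwise(block_files, dim, C=1.0, passes=5, inner_epochs=3):
    """Hypothetical layout: block f stores f + '.X.npy' and f + '.y.npy'."""
    w = np.zeros(dim)
    alpha = {f: None for f in block_files}   # duals persist per block
    for _ in range(passes):
        for f in block_files:                # load one block at a time
            X, y = np.load(f + ".X.npy"), np.load(f + ".y.npy")
            a = alpha[f] if alpha[f] is not None else np.zeros(len(y))
            for _ in range(inner_epochs):    # dual CD on the loaded block
                for i in range(len(y)):
                    g = y[i] * X[i] @ w - 1.0
                    prev = a[i]
                    a[i] = min(max(a[i] - g / (X[i] @ X[i]), 0.0), C)
                    w += (a[i] - prev) * y[i] * X[i]
            alpha[f] = a
    return w
```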

## 2011

• #### Selective Block Minimization for Faster Convergence of Limited Memory Large-scale Linear Models

Kai-Wei Chang and Dan Roth, in KDD, 2011.
Full Text Slides Poster Code Abstract BibTeX Details
As the size of data sets used to build classifiers steadily increases, training a linear model efficiently with limited memory becomes essential. Several techniques deal with this problem by loading blocks of data from disk one at a time, but usually take a considerable number of iterations to converge to a reasonable model. Even the best block minimization techniques [1] require many block loads since they treat all training examples uniformly. As disk I/O is expensive, reducing the amount of disk access can dramatically decrease the training time.
@inproceedings{ChangRo11,
author = {Chang, Kai-Wei and Roth, Dan},
title = {Selective Block Minimization for Faster Convergence of Limited Memory Large-scale Linear Models},
booktitle = {KDD},
year = {2011}
}

Details
• #### Inference Protocols for Coreference Resolution

Kai-Wei Chang, Rajhans Samdani, Alla Rozovskaya, Nick Rizzolo, Mark Sammons, and Dan Roth, in CoNLL Shared Task, 2011.
Full Text Slides Poster Abstract BibTeX Details
This paper presents Illinois-Coref, a system for coreference resolution that participated in the CoNLL-2011 shared task. We investigate two inference methods, Best-Link and All-Link, along with their corresponding, pairwise and structured, learning protocols. Within these, we provide a flexible architecture for incorporating linguistically-motivated constraints, several of which we developed and integrated. We compare and evaluate the inference approaches and the contribution of constraints, analyze the mistakes of the system, and discuss the challenges of resolving coreference for the OntoNotes-4.0 data set.
@inproceedings{CSRRSR11,
author = {Chang, Kai-Wei and Samdani, Rajhans and Rozovskaya, Alla and Rizzolo, Nick and Sammons, Mark and Roth, Dan},
title = {Inference Protocols for Coreference Resolution},
booktitle = {CoNLL Shared Task},
year = {2011}
}

Details

## 2010

• #### Iterative Scaling and Coordinate Descent Methods for Maximum Entropy Models

Fang-Lan Huang, Cho-Jui Hsieh, Kai-Wei Chang, and Chih-Jen Lin, in JMLR, 2010.
Full Text Abstract BibTeX Details
Maximum entropy (Maxent) is useful in natural language processing and many other areas. Iterative scaling (IS) methods are one of the most popular approaches to solve Maxent. With many variants of IS methods, it is difficult to understand them and see the differences. In this paper, we create a general and unified framework for iterative scaling methods. This framework also connects iterative scaling and coordinate descent methods. We prove general convergence results for IS methods and analyze their computational complexity. Based on the proposed framework, we extend a coordinate descent method for linear SVM to Maxent. Results show that it is faster than existing iterative scaling methods.
@inproceedings{HHCL10,
author = {Huang, Fang-Lan and Hsieh, Cho-Jui and Chang, Kai-Wei and Lin, Chih-Jen},
title = {Iterative Scaling and Coordinate Descent Methods for Maximum Entropy Models},
booktitle = {JMLR},
year = {2010}
}

Details
• #### Training and Testing Low-degree Polynomial Data Mappings via Linear SVM

Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard, and Chih-Jen Lin, in JMLR, 2010.
Full Text Code Abstract BibTeX Details
Kernel techniques have long been used in SVM to handle linearly inseparable problems by transforming data to a high dimensional space, but training and testing large data sets is often time consuming. In contrast, we can efficiently train and test much larger data sets using linear SVM without kernels. In this work, we apply fast linear-SVM methods to the explicit form of polynomially mapped data and investigate implementation issues. The approach enjoys fast training and testing, but may sometimes achieve accuracy close to that of using highly nonlinear kernels. Empirical experiments show that the proposed method is useful for certain large-scale data sets. We successfully apply the proposed method to a natural language processing (NLP) application by improving the testing accuracy under some training/testing speed requirements.
@inproceedings{CHCRL10,
author = {Chang, Yin-Wen and Hsieh, Cho-Jui and Chang, Kai-Wei and Ringgaard, Michael and Lin, Chih-Jen},
title = {Training and Testing Low-degree Polynomial Data Mappings via Linear SVM},
booktitle = {JMLR},
year = {2010}
}

Details
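In scikit-learn terms (not the authors' code, although LinearSVC is in fact backed by LIBLINEAR), the recipe reads: expand the features with an explicit low-degree polynomial map, then train a plain linear SVM on the expanded data instead of a kernel SVM with a polynomial kernel.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

# Explicit degree-2 mapping + linear SVM, replacing a polynomial-kernel SVM.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearSVC(C=1.0))
print(model.fit(X, y).score(X, y))
```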
• #### A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification

Guo-Xun Yuan, Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin, in JMLR, 2010.
Full Text Code Abstract BibTeX Details
Large-scale linear classification is widely used in many areas. The L1-regularized form can be applied for feature selection; however, its non-differentiability causes more difficulties in training. Although various optimization methods have been proposed in recent years, these have not yet been compared suitably. In this paper, we first broadly review existing methods. Then, we discuss state-of-the-art software packages in detail and propose two efficient implementations. Extensive comparisons indicate that carefully implemented coordinate descent methods are very suitable for training large document data.
@inproceedings{YCHL10,
author = {Yuan, Guo-Xun and Chang, Kai-Wei and Hsieh, Cho-Jui and Lin, Chih-Jen},
title = {A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification},
booktitle = {JMLR},
year = {2010}
}

Details

## 2009

• #### An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naïve Bayes

Hung-Yi Lo, Kai-Wei Chang, Shang-Tse Chen, Tsung-Hsien Chiang, ChunSung Ferng, Cho-Jui Hsieh, Yi-Kuang Ko, Tsung-Ting Kuo, Hung-Che Lai, Ken-Yi Lin, Chia-Hsuan Wang, Hsiang-Fu Yu, Chih-Jen Lin, Hsuan-Tien Lin, and Shou-de Lin, in KDD Cup, 2009.
Full Text Abstract BibTeX Details
This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using the linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranks third place in the slow track of KDD Cup 2009.
@inproceedings{LCCCFHKKLLWYLLL09,
author = {Lo, Hung-Yi and Chang, Kai-Wei and Chen, Shang-Tse and Chiang, Tsung-Hsien and Ferng, ChunSung and Hsieh, Cho-Jui and Ko, Yi-Kuang and Kuo, Tsung-Ting and Lai, Hung-Che and Lin, Ken-Yi and Wang, Chia-Hsuan and Yu, Hsiang-Fu and Lin, Chih-Jen and Lin, Hsuan-Tien and Lin, Shou-de},
title = {An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naïve Bayes},
booktitle = {KDD Cup},
year = {2009}
}

Details

## 2008

• #### A Sequential Dual Method for Large Scale Multi-Class Linear SVMs

S. Sathiya Keerthi, S. Sundararajan, Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin, in KDD, 2008.
Full Text Code Abstract BibTeX Details
Efficient training of direct multi-class formulations of linear Support Vector Machines is very useful in applications such as text classification with a huge number of examples as well as features. This paper presents a fast dual method for this training. The main idea is to sequentially traverse through the training set and optimize the dual variables associated with one example at a time. The speed of training is enhanced further by shrinking and cooling heuristics. Experiments indicate that our method is much faster than state-of-the-art solvers such as bundle, cutting-plane, and exponentiated gradient methods.
@inproceedings{KSCHL08,
author = {Keerthi, S. Sathiya and Sundararajan, S. and Chang, Kai-Wei and Hsieh, Cho-Jui and Lin, Chih-Jen},
title = {A Sequential Dual Method for Large Scale Multi-Class Linear SVMs},
booktitle = {KDD},
year = {2008}
}

Details
• #### A Dual Coordinate Descent Method for Large-Scale Linear SVM

Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and S. Sundararajan, in ICML, 2008.
Full Text Slides Code Abstract BibTeX Details
In many applications, data appear with a huge number of instances as well as features. Linear Support Vector Machines (SVM) is one of the most popular tools to deal with such large-scale sparse data. This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. The proposed method is simple and reaches an ϵ-accurate solution in O(log(1/ϵ)) iterations. Experiments indicate that our method is much faster than state-of-the-art solvers such as Pegasos, TRON, SVMperf, and a recent primal coordinate descent implementation.
@inproceedings{HCLKS08,
author = {Hsieh, Cho-Jui and Chang, Kai-Wei and Lin, Chih-Jen and Keerthi, S. Sathiya and Sundararajan, S.},
title = {A Dual Coordinate Descent Method for Large-Scale Linear SVM},
booktitle = {ICML},
year = {2008}
}

Details
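The update at the heart of this paper fits in a dozen lines of numpy. The sketch below implements the L1-loss (hinge) variant with a per-epoch shuffle but without the paper's shrinking heuristic: each dual variable gets a closed-form projected step, and w is maintained incrementally so one update costs only the nonzeros of x_i.

```python
import numpy as np

def dcd_l1_svm(X, y, C=1.0, epochs=10):
    """Dual coordinate descent for the L1-loss linear SVM."""
    n, d = X.shape
    alpha, w = np.zeros(n), np.zeros(d)
    Qii = (X ** 2).sum(axis=1)                # per-example curvature
    for _ in range(epochs):
        for i in np.random.permutation(n):
            G = y[i] * X[i] @ w - 1.0         # dual gradient coordinate
            new = min(max(alpha[i] - G / Qii[i], 0.0), C)
            w += (new - alpha[i]) * y[i] * X[i]
            alpha[i] = new
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]))
w = dcd_l1_svm(X, y)
print((np.sign(X @ w) == y).mean())           # training accuracy
```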
• #### Coordinate Descent Method for Large-scale L2-loss Linear SVM

Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin, in JMLR, 2008.
Full Text Code Abstract BibTeX Details
Linear support vector machines (SVMs) are useful for classifying large-scale sparse data. Problems with sparse features are common in applications such as document classification and natural language processing. In this paper, we propose a novel coordinate descent algorithm for training linear SVM with the L2-loss function. At each step, the proposed method minimizes a one-variable sub-problem while fixing other variables. The sub-problem is solved by Newton steps with the line search technique. The procedure globally converges at a linear rate. As each sub-problem involves only values of a corresponding feature, the proposed approach is suitable when accessing a feature is more convenient than accessing an instance. Experiments show that our method is more efficient and stable than state-of-the-art methods such as Pegasos and TRON.
@inproceedings{ChangHsLi08,
author = {Chang, Kai-Wei and Hsieh, Cho-Jui and Lin, Chih-Jen},
title = {Coordinate Descent Method for Large-scale L2-loss Linear SVM},
booktitle = {JMLR},
year = {2008}
}

Details
• #### LIBLINEAR: A Library for Large Linear Classification

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, X.-R. Wang, and Chih-Jen Lin, in JMLR, 2008.
Full Text Code Abstract BibTeX Details
LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library calls for users and developers. Comprehensive documents are available for both beginners and advanced users. Experiments demonstrate that LIBLINEAR is very efficient on large sparse data sets.
@inproceedings{FCHWL08,
author = {Fan, Rong-En and Chang, Kai-Wei and Hsieh, Cho-Jui and Wang, X.-R. and Lin, Chih-Jen},
title = {LIBLINEAR: A Library for Large Linear Classification},
booktitle = {JMLR},
year = {2008}
}

Details
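For readers who want to try LIBLINEAR without building its C command-line tools, scikit-learn ships it as a backend, so a minimal example looks like the following (the dataset and parameters are placeholders; LinearSVC and LogisticRegression with solver="liblinear" both call into the library).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)
svm = LinearSVC(C=1.0).fit(X, y)                        # linear SVM
logreg = LogisticRegression(solver="liblinear").fit(X, y)
print(svm.score(X, y), logreg.score(X, y))
```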