At UCLA-NLP, our mission is to develop fair, accountable, robust natural language processing technology to benefit everyone. We will present papers at EMNLP 2020 on the following topics.

### Fairness in Natural Language Processing

Natural Language Processing (NLP) models are widely used in our daily lives. Despite these methods achieve high performance in various applications, they run the risk of exploiting and reinforcing the societal biases (e.g. gender bias) that are present in the underlying data. At EMNLP, we present our studies on 1) how to detect bias in a local region of instances, 2) how to control bias in language generation.

• ### LOGAN: Local Group Bias Detection by Clustering

Jieyu Zhao and Kai-Wei Chang, in EMNLP (short), 2020.
QA Sessions: Gather-1I: Nov 17, 02:00-04:00 UTC / 18:00-20:00 PST -1d Paper link in the virtual conference
Full Text Code BibTeX Details
Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
@inproceedings{zhao2020logan,
author = {Zhao, Jieyu and Chang, Kai-Wei},
title = {LOGAN: Local Group Bias Detection by Clustering},
booktitle = {EMNLP (short)},
presentation_id = {https://virtual.2020.emnlp.org/paper_main.2886.html},
year = {2020}
}


• #### Societal Biases in Language Generation: Progress and Challenges

Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in ACL, 2021.
Full Text Abstract BibTeX Details
Technology for language generation has advanced rapidly, spurred by advancements in pre-training large models on massive amounts of data and the need for intelligent agents to communicate in a natural manner. While techniques can effectively generate fluent text, they can also produce undesirable societal biases that can have a disproportionately negative impact on marginalized populations. Language generation presents unique challenges for biases in terms of direct user interaction and the structure of decoding techniques. To better understand these challenges, we present a survey on societal biases in language generation, focusing on how data and techniques contribute to biases and progress towards reducing biases. Motivated by a lack of studies on biases from decoding techniques, we also conduct experiments to quantify the effects of these techniques. By further discussing general trends and open challenges, we call to attention promising directions for research and the importance of fairness and inclusivity considerations for language generation applications.
@inproceedings{sheng2021defense,
title = {Societal Biases in Language Generation: Progress and Challenges},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
booktitle = {ACL},
year = {2021}
}

Details
• #### Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification

Yada Pruksachatkun, Satyapriya Krishna, Jwala Dhamala, Rahul Gupta, and Kai-Wei Chang, in ACL-Finding, 2021.
Full Text Abstract BibTeX Details
Existing bias mitigation methods to reduce disparities in model outcomes across cohorts have focused on data augmentation, debiasing model embeddings, or adding fairness-based optimization objectives during training. Separately, certified word substitution robustness methods have been developed to decrease the impact of spurious features and synonym substitutions on model predictions. While their end goals are different, they both aim to encourage models to make the same prediction for certain changes in the input. In this paper, we investigate the utility of certified word substitution robustness methods to improve equality of odds and equality of opportunity on multiple text classification tasks. We observe that certified robustness methods improve fairness, and using both robustness and bias mitigation methods in training results in an improvement in both fronts.
@inproceedings{pruksachatkun2021robustness,
title = {Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification},
author = {Pruksachatkun, Yada and Krishna, Satyapriya and Dhamala, Jwala and Gupta, Rahul and Chang, Kai-Wei},
booktitle = {ACL-Finding},
year = {2021}
}

Details
• #### "Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses

Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Ad hominem attacks are those that target some feature of a person’s character instead of the position the person is maintaining. These attacks are harmful because they propagate implicit biases and diminish a person’s credibility. Since dialogue systems respond directly to user input, it is important to study ad hominems in dialogue responses. To this end, we propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts. We specifically compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad hominems could further amplify the skew of power away from marginalized populations. Furthermore, we propose a constrained decoding technique that uses salient n-gram similarity as a soft constraint for top-k sampling to reduce the amount of ad hominems generated. Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.
@inproceedings{sheng2021nice,
title = {"Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses},
booktitle = {NAACL},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
year = {2021}
}

Details
• #### BOLD: Dataset and metrics for measuring biases in open-ended language generation

Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta, in FAccT, 2021.
Full Text Code Abstract BibTeX Details
Recent advances in deep learning techniques have enabled machines to generate cohesive open-ended text when prompted with a sequence of words as context. While these models now empower many downstream applications from conversation bots to automatic storytelling, they have been shown to generate texts that exhibit social biases. To systematically study and benchmark social biases in open-ended language generation, we introduce the Bias in Open-Ended Language Generation Dataset (BOLD), a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion, and political ideology. We also propose new automated metrics for toxicity, psycholinguistic norms, and text gender polarity to measure social biases in open-ended text generation from multiple angles. An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text across all domains. With these results we highlight the need to benchmark biases in open-ended language generation and caution users of language generation models on downstream tasks to be cognizant of these embedded prejudices.
@inproceedings{dhamala2021bold,
author = {Dhamala, Jwala and Sun, Tony and Kumar, Varun and Krishna, Satyapriya and Pruksachatkun, Yada and Chang, Kai-Wei and Gupta, Rahul},
title = {BOLD: Dataset and metrics for measuring biases in open-ended language generation},
booktitle = {FAccT},
year = {2021}
}

Details
• #### LOGAN: Local Group Bias Detection by Clustering

Jieyu Zhao and Kai-Wei Chang, in EMNLP (short), 2020.
Full Text Code Abstract BibTeX Details
Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
@inproceedings{zhao2020logan,
author = {Zhao, Jieyu and Chang, Kai-Wei},
title = {LOGAN: Local Group Bias Detection by Clustering},
booktitle = {EMNLP (short)},
presentation_id = {https://virtual.2020.emnlp.org/paper_main.2886.html},
year = {2020}
}

Details
• #### Towards Controllable Biases in Language Generation

Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP-Finding, 2020.
Full Text Code Abstract BibTeX Details
We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups. We then analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics. The former scenario enables us to detect the types of biases present in the model. Specifically, we show the effectiveness of our approach at facilitating bias analysis by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics. The second scenario is useful for mitigating biases in downstream applications such as dialogue generation. In our experiments, the mitigation technique proves to be effective at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.
@inproceedings{sheng2020towards,
title = {Towards Controllable Biases in Language Generation},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
booktitle = {EMNLP-Finding},
year = {2020}
}

Details
• #### Towards Understanding Gender Bias in Relation Extraction

Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang, in ACL, 2020.
Full Text Abstract BibTeX Details
Recent developments in Neural Relation Extraction (NRE) have made significant strides towards automated knowledge base construction. While much attention has been dedicated towards improvements in accuracy, there have been no attempts in the literature to evaluate social biases exhibited in NRE systems. In this paper, we create WikiGenderBias, a distantly supervised dataset composed of over 45,000 sentences including a 10% human annotated test set for the purpose of analyzing gender bias in relation extraction systems. We find that when extracting spouse and hypernym (i.e., occupation) relations, an NRE system performs differently when the gender of the target entity is different. However, such disparity does not appear when extracting relations such as birth date or birth place. We also analyze two existing bias mitigation techniques, word embedding debiasing and data augmentation. Unfortunately, due to NRE models relying heavily on surface level cues, we find that existing bias mitigation approaches have a negative effect on NRE. Our analysis lays groundwork for future quantifying and mitigating bias in relation extraction.
@inproceedings{gaut2020towards,
author = {Gaut, Andrew and Sun, Tony and Tang, Shirlyn and Huang, Yuxin and Qian, Jing and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Belding, Elizabeth and Chang, Kai-Wei and Wang, William Yang},
title = {Towards Understanding Gender Bias in Relation Extraction},
booktitle = {ACL},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.265.html}
}

Details
• #### Mitigating Gender Bias Amplification in Distribution by Posterior Regularization

Shengyu Jia, Tao Meng, Jieyu Zhao, and Kai-Wei Chang, in ACL (short), 2020.
Full Text Slides Video Code Abstract BibTeX Details
Advanced machine  learning  techniques  have boosted  the  performance  of  natural  language processing.  Nevertheless, recent studies, e.g., Zhao et al. (2017) show that these techniques inadvertently capture the societal bias hiddenin the corpus and further amplify it.  However,their analysis is conducted only on models’ top predictions.   In this paper,  we investigate thegender  bias  amplification  issue  from  the  distribution perspective and demonstrate that thebias is amplified in the view of predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization.   With little performance loss,  our method can almost remove the bias amplification  in  the  distribution. Our study sheds the light on understanding the bias amplification.
@inproceedings{jia2020mitigating,
author = {Jia, Shengyu and Meng, Tao and Zhao, Jieyu and Chang, Kai-Wei},
title = {Mitigating Gender Bias Amplification in Distribution by Posterior Regularization},
booktitle = {ACL (short)},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.264.html}
}

Details
• #### The Woman Worked as a Babysitter: On Biases in Language Generation

Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP (short), 2019.
Full Text Slides Video Code Abstract BibTeX Details
We present a systematic study of biases in natural language generation (NLG) by analyzing text generated from prompts that contain mentions of different demographic groups. In this work, we introduce the notion of the regard towards a demographic, use the varying levels of regard towards different demographics as a defining metric for bias in NLG, and analyze the extent to which sentiment scores are a relevant proxy metric for regard. To this end, we collect strategically-generated text from language models and manually annotate the text with both sentiment and regard scores. Additionally, we build an automatic regard classifier through transfer learning, so that we can analyze biases in unseen text. Together, these methods reveal the extent of the biased nature of language model generations. Our analysis provides a study of biases in NLG, bias metrics and correlated human judgments, and empirical evidence on the usefulness of our annotated dataset.
@inproceedings{sheng2019woman,
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
title = {The Woman Worked as a Babysitter: On Biases in Language Generation},
booktitle = {EMNLP (short)},
vimeo_id = {426366363},
year = {2019}
}

Details
• #### Mitigating Gender in Natural Language Processing: Literature Review

Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Kai-Wei Chang, and William Yang Wang, in ACL, 2019.
Full Text Slides Video Abstract BibTeX Details
As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modeling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. We discuss gender bias based on four forms of representation bias and analyze methods recognizing gender bias. Furthermore, we discuss the advantages and drawbacks of existing gender debiasing methods. Finally, we discuss future studies for recognizing and mitigating gender bias in NLP.
@inproceedings{sun2019mitigating,
author = {Sun, Tony and Gaut, Andrew and Tang, Shirlyn and Huang, Yuxin and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Chang, Kai-Wei and Wang, William Yang},
title = {Mitigating Gender in Natural Language Processing: Literature Review},
booktitle = {ACL},
vimeo_id = {384482151},
year = {2019}
}

Details
• #### Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in NAACL (short), 2018.
Full Text Poster Code Abstract BibTeX Details
In this paper, we introduce a new benchmark for co-reference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing datasets.
@inproceedings{zhao2018gender,
author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
title = {Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods},
booktitle = {NAACL (short)},
press_url = {https://www.stitcher.com/podcast/matt-gardner/nlp-highlights/e/55861936},
year = {2018}
}

Details
• #### Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in EMNLP, 2017.
Full Text Slides Code Abstract BibTeX Details EMNLP 2017 Best Long Paper Award
Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occuring labels and visual input but risk inadvertently encoding social biases found in web corpora.
In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, but a trained model amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for the resulting inference problems. Our method results in no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 33.3% and 44.9% for multilabel classification and visual semantic role labeling, respectively.
@inproceedings{zhao2017men,
author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
title = {Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints},
booktitle = {EMNLP},
year = {2017}
}

Details

Details
• ### Towards Controllable Biases in Language Generation

Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP-Finding, 2020.
Full Text Code BibTeX Details
We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups. We then analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics. The former scenario enables us to detect the types of biases present in the model. Specifically, we show the effectiveness of our approach at facilitating bias analysis by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics. The second scenario is useful for mitigating biases in downstream applications such as dialogue generation. In our experiments, the mitigation technique proves to be effective at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.
@inproceedings{sheng2020towards,
title = {Towards Controllable Biases in Language Generation},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
booktitle = {EMNLP-Finding},
year = {2020}
}


• #### Societal Biases in Language Generation: Progress and Challenges

Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in ACL, 2021.
Full Text Abstract BibTeX Details
Technology for language generation has advanced rapidly, spurred by advancements in pre-training large models on massive amounts of data and the need for intelligent agents to communicate in a natural manner. While techniques can effectively generate fluent text, they can also produce undesirable societal biases that can have a disproportionately negative impact on marginalized populations. Language generation presents unique challenges for biases in terms of direct user interaction and the structure of decoding techniques. To better understand these challenges, we present a survey on societal biases in language generation, focusing on how data and techniques contribute to biases and progress towards reducing biases. Motivated by a lack of studies on biases from decoding techniques, we also conduct experiments to quantify the effects of these techniques. By further discussing general trends and open challenges, we call to attention promising directions for research and the importance of fairness and inclusivity considerations for language generation applications.
@inproceedings{sheng2021defense,
title = {Societal Biases in Language Generation: Progress and Challenges},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
booktitle = {ACL},
year = {2021}
}

Details
• #### Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification

Yada Pruksachatkun, Satyapriya Krishna, Jwala Dhamala, Rahul Gupta, and Kai-Wei Chang, in ACL-Finding, 2021.
Full Text Abstract BibTeX Details
Existing bias mitigation methods to reduce disparities in model outcomes across cohorts have focused on data augmentation, debiasing model embeddings, or adding fairness-based optimization objectives during training. Separately, certified word substitution robustness methods have been developed to decrease the impact of spurious features and synonym substitutions on model predictions. While their end goals are different, they both aim to encourage models to make the same prediction for certain changes in the input. In this paper, we investigate the utility of certified word substitution robustness methods to improve equality of odds and equality of opportunity on multiple text classification tasks. We observe that certified robustness methods improve fairness, and using both robustness and bias mitigation methods in training results in an improvement in both fronts.
@inproceedings{pruksachatkun2021robustness,
title = {Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification},
author = {Pruksachatkun, Yada and Krishna, Satyapriya and Dhamala, Jwala and Gupta, Rahul and Chang, Kai-Wei},
booktitle = {ACL-Finding},
year = {2021}
}

Details
• #### "Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses

Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Ad hominem attacks are those that target some feature of a person’s character instead of the position the person is maintaining. These attacks are harmful because they propagate implicit biases and diminish a person’s credibility. Since dialogue systems respond directly to user input, it is important to study ad hominems in dialogue responses. To this end, we propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts. We specifically compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad hominems could further amplify the skew of power away from marginalized populations. Furthermore, we propose a constrained decoding technique that uses salient n-gram similarity as a soft constraint for top-k sampling to reduce the amount of ad hominems generated. Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.
@inproceedings{sheng2021nice,
title = {"Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses},
booktitle = {NAACL},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
year = {2021}
}

Details
• #### BOLD: Dataset and metrics for measuring biases in open-ended language generation

Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta, in FAccT, 2021.
Full Text Code Abstract BibTeX Details
Recent advances in deep learning techniques have enabled machines to generate cohesive open-ended text when prompted with a sequence of words as context. While these models now empower many downstream applications from conversation bots to automatic storytelling, they have been shown to generate texts that exhibit social biases. To systematically study and benchmark social biases in open-ended language generation, we introduce the Bias in Open-Ended Language Generation Dataset (BOLD), a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion, and political ideology. We also propose new automated metrics for toxicity, psycholinguistic norms, and text gender polarity to measure social biases in open-ended text generation from multiple angles. An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text across all domains. With these results we highlight the need to benchmark biases in open-ended language generation and caution users of language generation models on downstream tasks to be cognizant of these embedded prejudices.
@inproceedings{dhamala2021bold,
author = {Dhamala, Jwala and Sun, Tony and Kumar, Varun and Krishna, Satyapriya and Pruksachatkun, Yada and Chang, Kai-Wei and Gupta, Rahul},
title = {BOLD: Dataset and metrics for measuring biases in open-ended language generation},
booktitle = {FAccT},
year = {2021}
}

Details
• #### LOGAN: Local Group Bias Detection by Clustering

Jieyu Zhao and Kai-Wei Chang, in EMNLP (short), 2020.
Full Text Code Abstract BibTeX Details
Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
@inproceedings{zhao2020logan,
author = {Zhao, Jieyu and Chang, Kai-Wei},
title = {LOGAN: Local Group Bias Detection by Clustering},
booktitle = {EMNLP (short)},
presentation_id = {https://virtual.2020.emnlp.org/paper_main.2886.html},
year = {2020}
}

Details
• #### Towards Controllable Biases in Language Generation

Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP-Finding, 2020.
Full Text Code Abstract BibTeX Details
We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups. We then analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics. The former scenario enables us to detect the types of biases present in the model. Specifically, we show the effectiveness of our approach at facilitating bias analysis by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics. The second scenario is useful for mitigating biases in downstream applications such as dialogue generation. In our experiments, the mitigation technique proves to be effective at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.
@inproceedings{sheng2020towards,
title = {Towards Controllable Biases in Language Generation},
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
booktitle = {EMNLP-Finding},
year = {2020}
}

Details
• #### Towards Understanding Gender Bias in Relation Extraction

Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang, in ACL, 2020.
Full Text Abstract BibTeX Details
Recent developments in Neural Relation Extraction (NRE) have made significant strides towards automated knowledge base construction. While much attention has been dedicated towards improvements in accuracy, there have been no attempts in the literature to evaluate social biases exhibited in NRE systems. In this paper, we create WikiGenderBias, a distantly supervised dataset composed of over 45,000 sentences including a 10% human annotated test set for the purpose of analyzing gender bias in relation extraction systems. We find that when extracting spouse and hypernym (i.e., occupation) relations, an NRE system performs differently when the gender of the target entity is different. However, such disparity does not appear when extracting relations such as birth date or birth place. We also analyze two existing bias mitigation techniques, word embedding debiasing and data augmentation. Unfortunately, due to NRE models relying heavily on surface level cues, we find that existing bias mitigation approaches have a negative effect on NRE. Our analysis lays groundwork for future quantifying and mitigating bias in relation extraction.
@inproceedings{gaut2020towards,
author = {Gaut, Andrew and Sun, Tony and Tang, Shirlyn and Huang, Yuxin and Qian, Jing and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Belding, Elizabeth and Chang, Kai-Wei and Wang, William Yang},
title = {Towards Understanding Gender Bias in Relation Extraction},
booktitle = {ACL},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.265.html}
}

Details
• #### Mitigating Gender Bias Amplification in Distribution by Posterior Regularization

Shengyu Jia, Tao Meng, Jieyu Zhao, and Kai-Wei Chang, in ACL (short), 2020.
Full Text Slides Video Code Abstract BibTeX Details
Advanced machine  learning  techniques  have boosted  the  performance  of  natural  language processing.  Nevertheless, recent studies, e.g., Zhao et al. (2017) show that these techniques inadvertently capture the societal bias hiddenin the corpus and further amplify it.  However,their analysis is conducted only on models’ top predictions.   In this paper,  we investigate thegender  bias  amplification  issue  from  the  distribution perspective and demonstrate that thebias is amplified in the view of predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization.   With little performance loss,  our method can almost remove the bias amplification  in  the  distribution. Our study sheds the light on understanding the bias amplification.
@inproceedings{jia2020mitigating,
author = {Jia, Shengyu and Meng, Tao and Zhao, Jieyu and Chang, Kai-Wei},
title = {Mitigating Gender Bias Amplification in Distribution by Posterior Regularization},
booktitle = {ACL (short)},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.264.html}
}

Details
• #### The Woman Worked as a Babysitter: On Biases in Language Generation

Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP (short), 2019.
Full Text Slides Video Code Abstract BibTeX Details
We present a systematic study of biases in natural language generation (NLG) by analyzing text generated from prompts that contain mentions of different demographic groups. In this work, we introduce the notion of the regard towards a demographic, use the varying levels of regard towards different demographics as a defining metric for bias in NLG, and analyze the extent to which sentiment scores are a relevant proxy metric for regard. To this end, we collect strategically-generated text from language models and manually annotate the text with both sentiment and regard scores. Additionally, we build an automatic regard classifier through transfer learning, so that we can analyze biases in unseen text. Together, these methods reveal the extent of the biased nature of language model generations. Our analysis provides a study of biases in NLG, bias metrics and correlated human judgments, and empirical evidence on the usefulness of our annotated dataset.
@inproceedings{sheng2019woman,
author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
title = {The Woman Worked as a Babysitter: On Biases in Language Generation},
booktitle = {EMNLP (short)},
vimeo_id = {426366363},
year = {2019}
}

Details
• #### Mitigating Gender in Natural Language Processing: Literature Review

Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Kai-Wei Chang, and William Yang Wang, in ACL, 2019.
Full Text Slides Video Abstract BibTeX Details
As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modeling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. We discuss gender bias based on four forms of representation bias and analyze methods recognizing gender bias. Furthermore, we discuss the advantages and drawbacks of existing gender debiasing methods. Finally, we discuss future studies for recognizing and mitigating gender bias in NLP.
@inproceedings{sun2019mitigating,
author = {Sun, Tony and Gaut, Andrew and Tang, Shirlyn and Huang, Yuxin and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Chang, Kai-Wei and Wang, William Yang},
title = {Mitigating Gender in Natural Language Processing: Literature Review},
booktitle = {ACL},
vimeo_id = {384482151},
year = {2019}
}

Details
• #### Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in NAACL (short), 2018.
Full Text Poster Code Abstract BibTeX Details
In this paper, we introduce a new benchmark for co-reference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing datasets.
@inproceedings{zhao2018gender,
author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
title = {Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods},
booktitle = {NAACL (short)},
press_url = {https://www.stitcher.com/podcast/matt-gardner/nlp-highlights/e/55861936},
year = {2018}
}

Details
• #### Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in EMNLP, 2017.
Full Text Slides Code Abstract BibTeX Details EMNLP 2017 Best Long Paper Award
Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occuring labels and visual input but risk inadvertently encoding social biases found in web corpora.
In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, but a trained model amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for the resulting inference problems. Our method results in no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 33.3% and 44.9% for multilabel classification and visual semantic role labeling, respectively.
@inproceedings{zhao2017men,
author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
title = {Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints},
booktitle = {EMNLP},
year = {2017}
}

Details

Details

• ### Cross-Lingual Dependency Parsing by POS-Guided Word Reordering

Lu Liu, Yi Zhou, Jianhan Xu, Xiaoqing Zheng, Kai-Wei Chang, and Xuanjing Huang, in EMNLP-Finding, 2020.
Full Text BibTeX Details
We propose a novel approach to cross-lingual dependency parsing based on word reordering. The words in each sentence of a source language corpus are rearranged to meet the word order in a target language under the guidance of a part-of-speech based language model (LM). To obtain the highest reordering score under the LM, a population-based optimization algorithm and its genetic operators are designed to deal with the combinatorial nature of such word reordering. A parser trained on the reordered corpus then can be used to parse sentences in the target language. We demonstrate through extensive experimentation that our approach achieves better or comparable results across 25 target languages (1.73% increase in average), and outperforms a baseline by a significant margin on the languages that are greatly different from the source one. For example, when transferring the English parser to Hindi and Latin, our approach outperforms the baseline by 15.3% and 6.7% respectively.
@inproceedings{liu2020cross-lingual,
author = {Liu, Lu and Zhou, Yi and Xu, Jianhan and Zheng, Xiaoqing and Chang, Kai-Wei and Huang, Xuanjing},
title = {Cross-Lingual Dependency Parsing by POS-Guided Word Reordering},
booktitle = {EMNLP-Finding},
year = {2020}
}


• #### Syntax-augmented Multilingual BERT for Cross-lingual Transfer

Wasi Ahmad, Haoran Li, Kai-Wei Chang, and Yashar Mehdad, in ACL, 2021.
Full Text Abstract BibTeX Details
In recent years, we have seen a colossal effort
in pre-training multilingual text encoders using large-scale corpora in many languages to
facilitate cross-lingual transfer learning. However, due to typological differences across languages, the cross-lingual transfer is challenging. Nevertheless, language syntax, e.g., syntactic dependencies, can bridge the typological gap. Previous works have shown that pretrained multilingual encoders, such as mBERT
(Devlin et al., 2019), capture language syntax, helping cross-lingual transfer. This work
shows that explicitly providing language syntax and training mBERT using an auxiliary
objective to encode the universal dependency
tree structure helps cross-lingual transfer. We
perform rigorous experiments on four NLP
tasks, including text classification, question answering, named entity recognition, and taskoriented semantic parsing. The experiment results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks, such as PAWS-X and MLQA, by 1.4
and 1.6 points on average across all languages.
In the generalized transfer setting, the performance boosted significantly, with 3.9 and 3.1
points on average in PAWS-X and MLQA.
@inproceedings{ahmad2021syntax,
title = {Syntax-augmented Multilingual BERT for Cross-lingual Transfer},
author = {Ahmad, Wasi and Li, Haoran and Chang, Kai-Wei and Mehdad, Yashar},
booktitle = {ACL},
year = {2021}
}

Details
• #### Evaluating the Values of Sources in Transfer Learning

Md Rizwan Parvez and Kai-Wei Chang, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Transfer learning that adapts a model trained on data-rich sources to low-resource targets has been widely applied in natural language processing (NLP). However, when training a transfer model over multiple sources, not every source is equally useful for the target. To better transfer a model, it is essential to understand the values of the sources. In this paper, we develop SEAL-Shap, an efficient source valuation framework for quantifying the usefulness of the sources (e.g., domains/languages) in transfer learning based on the Shapley value method. Experiments and comprehensive analyses on both cross-domain and cross-lingual transfers demonstrate that our framework is not only effective in choosing useful transfer sources but also the source values match the intuitive source-target similarity.
@inproceedings{parvez2021evaluating,
title = {Evaluating the Values of Sources in Transfer Learning},
author = {Parvez, Md Rizwan and Chang, Kai-Wei},
booktitle = {NAACL},
presentation_id = {https://underline.io/events/122/sessions/4261/lecture/19707-evaluating-the-values-of-sources-in-transfer-learning},
year = {2021}
}

Details
• #### GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction

Wasi Ahmad, Nanyun Peng, and Kai-Wei Chang, in AAAI, 2021.
Full Text Code Abstract BibTeX Details
Prevalent approaches in cross-lingual relation and event extraction use graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic representations such that models trained on one language can be applied to other languages. However, GCNs lack in modeling long-range dependencies or disconnected words in the dependency tree. To address this challenge, we propose to utilize the self-attention mechanism where we explicitly fuse structural information to learn the dependencies between words at different syntactic distances. We introduce GATE, a \bf Graph \bf Attention \bf Transformer \bf Encoder, and test its cross-lingual transferability on relation and event extraction tasks. We perform rigorous experiments on the widely used ACE05 dataset that includes three typologically different languages: English, Chinese, and Arabic. The evaluation results show that GATE outperforms three recently proposed methods by a large margin. Our detailed analysis reveals that due to the reliance on syntactic dependencies, GATE produces robust representations that facilitate transfer across languages.
@inproceedings{ahmad2021gate,
author = {Ahmad, Wasi and Peng, Nanyun and Chang, Kai-Wei},
title = {GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction},
booktitle = {AAAI},
year = {2021}
}

Details
• #### Cross-Lingual Dependency Parsing by POS-Guided Word Reordering

Lu Liu, Yi Zhou, Jianhan Xu, Xiaoqing Zheng, Kai-Wei Chang, and Xuanjing Huang, in EMNLP-Finding, 2020.
Full Text Abstract BibTeX Details
We propose a novel approach to cross-lingual dependency parsing based on word reordering. The words in each sentence of a source language corpus are rearranged to meet the word order in a target language under the guidance of a part-of-speech based language model (LM). To obtain the highest reordering score under the LM, a population-based optimization algorithm and its genetic operators are designed to deal with the combinatorial nature of such word reordering. A parser trained on the reordered corpus then can be used to parse sentences in the target language. We demonstrate through extensive experimentation that our approach achieves better or comparable results across 25 target languages (1.73% increase in average), and outperforms a baseline by a significant margin on the languages that are greatly different from the source one. For example, when transferring the English parser to Hindi and Latin, our approach outperforms the baseline by 15.3% and 6.7% respectively.
@inproceedings{liu2020cross-lingual,
author = {Liu, Lu and Zhou, Yi and Xu, Jianhan and Zheng, Xiaoqing and Chang, Kai-Wei and Huang, Xuanjing},
title = {Cross-Lingual Dependency Parsing by POS-Guided Word Reordering},
booktitle = {EMNLP-Finding},
year = {2020}
}

Details
• #### Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages

Wasi Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, and Nanyun Peng, in CoNLL, 2019.
Full Text Poster Code Abstract BibTeX Details
Cross-lingual transfer learning has become an important weapon to battle the unavailability of annotated resources for low-resource languages.  One of the fundamental techniques to transfer across languages is learning language-agnostic representations, in the form of word embeddings or contextual encodings. In this work, we propose to leverage unannotated sentences from auxiliary languages to help learning language-agnostic representations  Specifically, we explore adversarial training for learning contextual encoders that produce invariant representations across languages to facilitate cross-lingual transfer. We conduct experiments on cross-lingual dependency parsing where we train a dependency parser on a source language and transfer it to a wide range of target languages.  Experiments on 28 target languages demonstrate that adversarial training significantly improves the overall transfer performances under several different settings.  We conduct a careful analysis to evaluate the language-agnostic representations resulted from adversarial training.
@inproceedings{ahmad2019crosslingual,
author = {Ahmad, Wasi and Zhang, Zhisong and Ma, Xuezhe and Chang, Kai-Wei and Peng, Nanyun},
title = {  Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages},
booktitle = {CoNLL},
year = {2019}
}

Details
• #### Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing

Tao Meng, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2019.
Full Text Poster Code Abstract BibTeX Details
Prior work on cross-lingual dependency parsing often focuses on capturing the commonalities between source and target languages and overlooks the potential of leveraging linguistic properties of the languages to facilitate the transfer. In this paper, we show that weak supervisions of linguistic knowledge for the target languages can improve a cross-lingual graph-based dependency parser substantially. Specifically, we explore several types of corpus linguistic statistics and compile them into corpus-wise constraints to guide the inference process during the test time. We adapt two techniques, Lagrangian relaxation and posterior regularization, to conduct inference with corpus-statistics constraints. Experiments show that the Lagrangian relaxation and posterior regularization inference improve the performances on 15 and 17 out of 19 target languages, respectively. The improvements are especially significant for target languages that have different word order features from the source language.
@inproceedings{meng2019target,
author = {Meng, Tao and Peng, Nanyun and Chang, Kai-Wei},
title = {Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing},
booktitle = {EMNLP},
year = {2019}
}

Details
• #### On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing

Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, and Nanyun Peng, in NAACL, 2019.
Full Text Video Code Abstract BibTeX Details
Different languages might have different wordorders. In this paper, we investigate cross-lingual transfer and posit that an order-agnostic model will perform better when trans-ferring to distant foreign languages. To test ourhypothesis, we train dependency parsers on anEnglish corpus and evaluate their transfer per-formance on 30 other languages. Specifically,we compare encoders and decoders based onRecurrent Neural Networks (RNNs) and mod-ified self-attentive architectures. The formerrelies on sequential information while the lat-ter is more flexible at modeling word order.Rigorous experiments and detailed analysisshows that RNN-based architectures transferwell to languages that are close to English,while self-attentive models have better overallcross-lingual transferability and perform espe-cially well on distant languages.
@inproceedings{ahmad2019difficulties,
author = {Ahmad, Wasi Uddin and Zhang, Zhisong and Ma, Xuezhe and Hovy, Eduard and Chang, Kai-Wei and Peng, Nanyun},
title = {On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing},
booktitle = {NAACL},
year = {2019}
}

Details

Details

• ### PolicyQA: A Reading Comprehension Dataset for Privacy Policies

Wasi Ahmad, Jianfeng Chi, Yuan Tian, and Kai-Wei Chang, in EMNLP-Finding (short), 2020.
Full Text Code BibTeX Details
Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. On the contrary, we argue that providing users with a short text span from policy documents reduces the burden of searching the target information from a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
@inproceedings{ahmad2020policyqa,
author = {Ahmad, Wasi and Chi, Jianfeng and Tian, Yuan and Chang, Kai-Wei},
title = {PolicyQA: A Reading Comprehension Dataset for Privacy Policies},
booktitle = {EMNLP-Finding (short)},
year = {2020}
}


• #### An Integer Linear Programming Framework for Mining Constraints from Data

Tao Meng and Kai-Wei Chang, in ICML, 2021.
Full Text Code Abstract BibTeX Details
Various structured output prediction problems (e.g., sequential tagging) involve constraints over the output space. By identifying these constraints, we can filter out infeasible solutions and build an accountable model.
To this end, we present a general integer linear programming (ILP) framework for mining constraints from data. We model the inference of structured output prediction as an ILP problem. Then, given the coefficients of the objective function and the corresponding solution, we mine the underlying constraints by estimating the outer and inner polytopes of the feasible set. We verify the proposed constraint mining algorithm in various synthetic and real-world applications and demonstrate that the proposed approach successfully identifies the feasible set at scale.
In particular, we show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules. We also demonstrate results on hierarchical multi-label classification and conduct a theoretical analysis on how close the mined constraints are from the ground truth.
@inproceedings{meng2020integer,
author = {Meng, Tao and Chang, Kai-Wei},
title = {An Integer Linear Programming Framework for Mining Constraints from Data},
booktitle = {ICML},
year = {2021}
}

Details
• #### Intent Classification and Slot Filling for Privacy Policies

Wasi Ahmad, Jianfeng Chi, Tu Le, Thomas Norton, Yuan Tian, and Kai-Wei Chang, in ACL, 2021.
Full Text Abstract BibTeX Details
Understanding privacy policies is crucial for users as it empowers them to learn about the information that matters to them. Sentences written in a privacy policy document explain privacy practices, and the constituent text spans convey further specific information about that practice. We refer to predicting the privacy practice explained in a sentence as intent classification and identifying the text spans sharing specific information as slot filling. In this work, we propose PolicyIE, a corpus consisting of 5,250 intent and 11,788 slot annotations spanning 31 privacy policies of websites and mobile applications. PolicyIE corpus is a challenging benchmark with limited labeled examples reflecting the cost of collecting large-scale annotations. We present two alternative neural approaches as baselines: (1) formulating intent classification and slot filling as a joint sequence tagging and (2) modeling them as a sequence-to-sequence (Seq2Seq) learning task. Experiment results show that both approaches perform comparably in intent classification, while the Seq2Seq method outperforms the sequence tagging approach in slot filling by a large margin. Error analysis reveals the deficiency of the baseline approaches, suggesting room for improvement in future works. We hope the PolicyIE corpus will stimulate future research in this domain.
@inproceedings{ahmad2021intent,
title = {Intent Classification and Slot Filling for Privacy Policies},
author = {Ahmad, Wasi and Chi, Jianfeng and Le, Tu and Norton, Thomas and Tian, Yuan and Chang, Kai-Wei},
booktitle = {ACL},
year = {2021}
}

Details
• #### Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs

Kuan-Hao Huang and Kai-Wei Chang, in EACL, 2021.
Full Text Code Abstract BibTeX Details
Paraphrase generation plays an essential role in natural language process (NLP), and it has many downstream applications. However, training supervised paraphrase models requires many annotated paraphrase pairs, which are usually costly to obtain. On the other hand, the paraphrases generated by existing unsupervised approaches are usually syntactically similar to the source sentences and are limited in diversity. In this paper, we demonstrate that it is possible to generate syntactically various paraphrases without the need for annotated paraphrase pairs. We propose Syntactically controlled Paraphrase Generator (SynPG), an encoder-decoder based model that learns to disentangle the semantics and the syntax of a sentence from a collection of unannotated texts. The disentanglement enables SynPG to control the syntax of output paraphrases by manipulating the embedding in the syntactic space. Extensive experiments using automatic metrics and human evaluation show that SynPG performs better syntactic control than unsupervised baselines, while the quality of the generated paraphrases is competitive. We also demonstrate that the performance of SynPG is competitive or even better than supervised models when the unannotated data is large. Finally, we show that the syntactically controlled paraphrases generated by SynPG can be utilized for data augmentation to improve the robustness of NLP models.
@inproceedings{huang2021generating,
author = {Huang, Kuan-Hao and Chang, Kai-Wei},
title = {Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs},
booktitle = {EACL},
year = {2021}
}

Details
• #### Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference

Yichao Zhou, Yu Yan, Rujun Han, J. Harry Caufield, Kai-Wei Chang, Yizhou Sun, Peipei Ping, and Wei Wang, in AAAI, 2021.
Full Text Code Abstract BibTeX Details
There  has  been  a  steady  need  in  the  medical  community to  precisely  extract  the  temporal  relations  between  clinical events. In particular, temporal information can facilitate a variety of downstream applications such as case report retrieval and medical question answering. However, existing methods either require expensive feature engineering or are incapable of  modeling  the  global  relational  dependencies  among  theevents. In this paper, we propose Clinical Temporal Relation Exaction  with  Probabilistic  Soft  Logic  Regularization  and Global Inference (CTRL-PG), a novel method to tackle the problem at the document level. Extensive experiments on two benchmark datasets, I2B2-2012 and TB-Dense, demonstrate that CTRL-PG significantly  outperforms  baseline  methodsfor temporal relation extraction.
@inproceedings{zhou2021clinical,
author = {Zhou, Yichao and Yan, Yu and Han, Rujun and Caufield, J. Harry and Chang, Kai-Wei and Sun, Yizhou and Ping, Peipei and Wang, Wei},
title = {Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference},
booktitle = {AAAI},
year = {2021}
}

Details
• #### PolicyQA: A Reading Comprehension Dataset for Privacy Policies

Wasi Ahmad, Jianfeng Chi, Yuan Tian, and Kai-Wei Chang, in EMNLP-Finding (short), 2020.
Full Text Code Abstract BibTeX Details
Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. On the contrary, we argue that providing users with a short text span from policy documents reduces the burden of searching the target information from a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
@inproceedings{ahmad2020policyqa,
author = {Ahmad, Wasi and Chi, Jianfeng and Tian, Yuan and Chang, Kai-Wei},
title = {PolicyQA: A Reading Comprehension Dataset for Privacy Policies},
booktitle = {EMNLP-Finding (short)},
year = {2020}
}

Details
• #### GPT-GNN: Generative Pre-Training of Graph Neural Networks

Ziniu Hu, Yuxiao Dong, Kuansan Wang, Kai-Wei Chang, and Yizhou Sun, in KDD, 2020.
Full Text Video Code Abstract BibTeX Details
Graph neural networks (GNNs) have been demonstrated to besuccessful in modeling graph-structured data. However, training GNNs requires abundant task-specific labeled data, which is often arduously expensive to obtain. One effective way to reduce labeling effort is to pre-train an expressive GNN model on unlabelled data with self-supervision and then transfer the learned knowledge to downstream models. In this paper, we present the GPT-GNN’s framework to initialize GNNs by generative pre-training. GPT-GNN introduces a self-supervised attributed graph generation task to pre-train a GNN,which allows the GNN to capture the intrinsic structural and semantic properties of the graph. We factorize the likelihood of graph generation into two components: 1) attribute generation, and 2) edgegeneration. By modeling both components, GPT-GNN captures the inherent dependency between node attributes and graph structure during the generative process. Comprehensive experiments on thebillion-scale academic graph and Amazon recommendation data demonstrate that GPT-GNN significantly outperforms state-of-the-art base GNN models without pre-training by up to 9.1% across different downstream tasks.
@inproceedings{hu2020gptgnn,
author = {Hu, Ziniu and Dong, Yuxiao and Wang, Kuansan and Chang, Kai-Wei and Sun, Yizhou},
title = {GPT-GNN: Generative Pre-Training of Graph Neural Networks},
booktitle = {KDD},
slide_url = {https://acbull.github.io/pdf/gpt.pptx},
year = {2020}
}

Details
• #### SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics

Da Yin, Tao Meng, and Kai-Wei Chang, in ACL, 2020.
Full Text Slides Video Code Abstract BibTeX Details
We propose SentiBERT, a variant of BERT that effectively captures compositional sentiment semantics. The model incorporates contextualized representation with binary constituency parse tree to capture semantic composition. Comprehensive experiments demonstrate that SentiBERT achieves competitive performance on phrase-level sentiment classification. We further demonstrate that the sentiment composition learned from the phrase-level annotations on SST can be transferred to other sentiment analysis tasks as well as related tasks, such as emotion classification tasks. Moreover, we conduct ablation studies and design visualization methods to understand SentiBERT. We show that SentiBERT is better than baseline approaches in capturing negation and the contrastive relation and model the compositional sentiment semantics.
@inproceedings{yin2020sentibert,
author = {Yin, Da and Meng, Tao and Chang, Kai-Wei},
title = {SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics},
booktitle = {ACL},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.341.html}
}

Details
• #### Building Language Models for Text with Named Entities

Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in ACL, 2018.
Full Text Poster Code Abstract BibTeX Details
Text in many domains involves a significant amount of named entities. Predicting the entity names is often challenging for a language model as they appear less frequent on the training corpus. In this paper, we propose a novel and effective approach to building a language model which can learn the entity names by leveraging their entity type information. We also introduce two benchmark datasets based on recipes and Java programming codes, on which we evaluate the proposed model. Experimental results show that our model achieves 52.2% better perplexity in recipe generation and 40.3% on code generation than state-of-the-art language models.
@inproceedings{parvez2018building,
author = {Parvez, Md Rizwan and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
title = {Building Language Models for Text with Named Entities},
booktitle = {ACL},
year = {2018}
}

Details
• #### Learning from Explicit and Implicit Supervision Jointly For Algebra Word Problems

Shyam Upadhyay, Ming-Wei Chang, Kai-Wei Chang, and Wen-tau Yih, in EMNLP, 2016.
Full Text Abstract BibTeX Details
Automatically solving algebra word problems has raised considerable interest recently. Existing state-of-the-art approaches mainly rely on learning from human annotated equations. In this paper, we demonstrate that it is possible to efficiently mine algebra problems and their numerical solutions with little to no manual effort. To leverage the mined dataset, we propose a novel structured-output learning algorithm that aims to learn from both explicit (e.g., equations) and implicit (e.g., solutions) supervision signals jointly. Enabled by this new algorithm, our model gains 4.6% absolute improvement in accuracy on the ALG-514 benchmark compared to the one without using implicit supervision. The final model also outperforms the current state-of-the-art approach by 3%.
Dataset
@inproceedings{BCWS16,
author = {Upadhyay, Shyam and Chang, Ming-Wei and Chang, Kai-Wei and Yih, Wen-tau},
title = {Learning from Explicit and Implicit Supervision Jointly For Algebra Word Problems},
booktitle = {EMNLP},
year = {2016}
}

Details

Details