At UCLA-NLP, our mission is to develop fair, accountable, and robust natural language processing technology that benefits everyone. We will present papers on the following topics at ACL 2020.


Fairness in Natural Language Processing

Natural Language Processing (NLP) models are widely used in our daily lives. Although these methods achieve high performance in various applications, they run the risk of exploiting and reinforcing the societal biases (e.g., gender bias) present in the underlying data. At ACL, we present our studies on 1) how gender bias propagates in cross-lingual transfer, 2) how bias is amplified in the distribution of model predictions, and 3) gender bias in relation extraction.

  • Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

    Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, and Ahmed Hassan Awadallah, in ACL, 2020.
    QA sessions: 6A Ethics, 10B Ethics. Paper page in the virtual conference: https://virtual.acl2020.org/paper_main.260.html
    Full Text, Slides
    Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of the language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another language. While cross-lingual transfer techniques are powerful, they carry gender bias from the source language to the target language. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways to quantify bias in multilingual representations from both intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces, and that the alignment direction can also influence the bias in transfer learning. We further provide recommendations for using multilingual word representations in downstream tasks.
    @inproceedings{zhao2020gender,
      author = {Zhao, Jieyu and Mukherjee, Subhabrata and Hosseini, Saghar and Chang, Kai-Wei and Awadallah, Ahmed Hassan},
      title = {Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer},
      booktitle = {ACL},
      year = {2020},
      presentation_id = {https://virtual.acl2020.org/paper_main.260.html}
    }
    

    Related Publications

    • Examining Gender Bias in Languages with Grammatical Gender

      Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, and Kai-Wei Chang, in EMNLP, 2019.
      Full Text, Poster, Code
      Recent studies have shown that word embeddings exhibit gender bias inherited from the training corpora. However, most studies to date have focused on quantifying and mitigating such bias only in English. These analyses cannot be directly extended to languages that exhibit morphological agreement on gender, such as Spanish and French. In this paper, we propose new metrics for evaluating gender bias in word embeddings of these languages and further demonstrate evidence of gender bias in bilingual embeddings that align these languages with English. Finally, we extend an existing approach to mitigate gender bias in word embeddings under both monolingual and bilingual settings. Experiments on a modified Word Embedding Association Test, word similarity, word translation, and word pair translation tasks show that the proposed approaches effectively reduce gender bias while preserving the utility of the embeddings.
      @inproceedings{zhou2019examining,
        author = {Zhou, Pei and Shi, Weijia and Zhao, Jieyu and Huang, Kuan-Hao and Chen, Muhao and Cotterell, Ryan and Chang, Kai-Wei},
        title = {Examining Gender Bias in Languages with Grammatical Gender},
        booktitle = {EMNLP},
        year = {2019}
      }
      
    • Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations

      Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, and Vicente Ordonez, in ICCV, 2019.
      Full Text, Code, Demo
      In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables (such as gender) in visual recognition tasks. We show that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased datasets. Surprisingly, we show that even when datasets are balanced such that each label co-occurs equally with each gender, learned models amplify the association between labels and gender as much as if the data had not been balanced! To mitigate this, we adopt an adversarial approach to remove unwanted features corresponding to protected variables from intermediate representations in a deep neural network, and we provide a detailed analysis of its effectiveness. Experiments on two datasets, COCO (objects) and imSitu (actions), show reductions in gender bias amplification while maintaining most of the accuracy of the original models.
      @inproceedings{wang2019balanced,
        author = {Wang, Tianlu and Zhao, Jieyu and Yatskar, Mark and Chang, Kai-Wei and Ordonez, Vicente},
        title = {Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations},
        booktitle = {ICCV},
        year = {2019}
      }
      
    • Gender Bias in Contextualized Word Embeddings

      Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang, in NAACL (short), 2019.
      Full Text, Slides, Video
      Despite the great success of contextualized word embeddings on downstream applications, these representations potentially embed the societal biases exhibited in their training corpus. In this paper, we quantify, analyze, and mitigate the gender bias exhibited in ELMo contextualized word vectors. We first demonstrate that the vectors encode and propagate information about genders unequally and then conduct a principal component analysis to visualize the geometry of the gender information in the embeddings. Then we show that ELMo works unequally well for men and women in downstream tasks. Finally, we explore a variety of methods to remove such gender bias and demonstrate that it can be reduced through data augmentation.
      @inproceedings{zhao2019gender,
        author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Cotterell, Ryan and Ordonez, Vicente and Chang, Kai-Wei},
        title = {Gender Bias in Contextualized Word Embeddings},
        booktitle = {NAACL (short)},
        year = {2019}
      }
      
    • Learning Gender-Neutral Word Embeddings

      Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang, in EMNLP (short), 2018.
      Full Text, Code
      Word embeddings have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, these word embeddings trained on human-generated corpora inherit strong gender stereotypes that reflect social constructs. In this paper, we propose a novel word embedding model, De-GloVe, that preserves gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Quantitative and qualitative experiments demonstrate that De-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.
      @inproceedings{zhao2018learning,
        author = {Zhao, Jieyu and Zhou, Yichao and Li, Zeyu and Wang, Wei and Chang, Kai-Wei},
        title = {Learning Gender-Neutral Word Embeddings},
        booktitle = {EMNLP (short)},
        year = {2018}
      }
      
    • Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

      Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai, in NeurIPS, 2016.
      Full Text, Code. Reported by NPR and MIT Tech Review.
      The blind application of machine learning runs the risk of amplifying biases present in data. We face such a danger with word embeddings, a popular framework for representing text data as vectors that has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender-neutral words are shown to be linearly separable from gender-definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations, such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties, such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
      @inproceedings{bolukbasi2016man,
        author = {Bolukbasi, Tolga and Chang, Kai-Wei and Zou, James and Saligrama, Venkatesh and Kalai, Adam},
        title = {Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings},
        booktitle = {NeurIPS},
        year = {2016}
      }
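      The abstract's geometric observation, that gender bias is captured by a direction in the embedding, can be illustrated with a minimal sketch of a "neutralize" step: remove a gender-neutral word's component along that direction. This is a simplification under stated assumptions: the direction comes from a single hypothetical he/she pair rather than being estimated from several definitional pairs as in the paper, the vectors are toy values, the paper's equalize step is not shown, and gender_cosine is a per-word quantity rather than the paper's aggregate direct-bias metric.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def gender_direction(he, she):
    # Simplification: a single definitional pair instead of several.
    return normalize([a - b for a, b in zip(he, she)])


def gender_cosine(word_vec, g):
    """Cosine between a word vector and the unit gender direction g."""
    return dot(normalize(word_vec), g)


def neutralize(word_vec, g):
    """Remove the component along g, then re-normalize to unit length."""
    proj = dot(word_vec, g)
    return normalize([x - proj * gi for x, gi in zip(word_vec, g)])


# Hypothetical toy embeddings.
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
receptionist = [-0.4, 0.5, 0.7]

g = gender_direction(he, she)
debiased = neutralize(receptionist, g)
print(gender_cosine(receptionist, g))  # negative: leans toward "she"
print(gender_cosine(debiased, g))      # 0.0: no remaining gender component
```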
      
  • Towards Understanding Gender Bias in Relation Extraction

    Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang, in ACL, 2020.
    QA sessions: 6A Ethics, 7B Ethics. Paper page in the virtual conference: https://virtual.acl2020.org/paper_main.265.html
    Full Text
    Recent developments in Neural Relation Extraction (NRE) have made significant strides towards automated knowledge base construction. While much attention has been dedicated towards improvements in accuracy, there have been no attempts in the literature to evaluate social biases exhibited in NRE systems. In this paper, we create WikiGenderBias, a distantly supervised dataset composed of over 45,000 sentences, including a 10% human-annotated test set, for the purpose of analyzing gender bias in relation extraction systems. We find that when extracting spouse and hypernym (i.e., occupation) relations, an NRE system performs differently when the gender of the target entity is different. However, such disparity does not appear when extracting relations such as birth date or birth place. We also analyze two existing bias mitigation techniques, word embedding debiasing and data augmentation. Unfortunately, because NRE models rely heavily on surface-level cues, we find that existing bias mitigation approaches have a negative effect on NRE. Our analysis lays the groundwork for future work on quantifying and mitigating bias in relation extraction.
    @inproceedings{gaut2020towards,
      author = {Gaut, Andrew and Sun, Tony and Tang, Shirlyn and Huang, Yuxin and Qian, Jing and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Belding, Elizabeth and Chang, Kai-Wei and Wang, William Yang},
      title = {Towards Understanding Gender Bias in Relation Extraction},
      booktitle = {ACL},
      year = {2020},
      presentation_id = {https://virtual.acl2020.org/paper_main.265.html}
    }
    

    Related Publications

    • Societal Biases in Language Generation: Progress and Challenges

      Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in ACL, 2021.
      Full Text
      Technology for language generation has advanced rapidly, spurred by advancements in pre-training large models on massive amounts of data and the need for intelligent agents to communicate in a natural manner. While techniques can effectively generate fluent text, they can also produce undesirable societal biases that can have a disproportionately negative impact on marginalized populations. Language generation presents unique challenges for biases in terms of direct user interaction and the structure of decoding techniques. To better understand these challenges, we present a survey on societal biases in language generation, focusing on how data and techniques contribute to biases and progress towards reducing biases. Motivated by a lack of studies on biases from decoding techniques, we also conduct experiments to quantify the effects of these techniques. By further discussing general trends and open challenges, we call to attention promising directions for research and the importance of fairness and inclusivity considerations for language generation applications.
      @inproceedings{sheng2021defense,
        title = {Societal Biases in Language Generation: Progress and Challenges},
        author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
        booktitle = {ACL},
        year = {2021}
      }
      
    • Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification

      Yada Pruksachatkun, Satyapriya Krishna, Jwala Dhamala, Rahul Gupta, and Kai-Wei Chang, in ACL-Finding, 2021.
      Full Text
      Existing bias mitigation methods to reduce disparities in model outcomes across cohorts have focused on data augmentation, debiasing model embeddings, or adding fairness-based optimization objectives during training. Separately, certified word substitution robustness methods have been developed to decrease the impact of spurious features and synonym substitutions on model predictions. While their end goals are different, they both aim to encourage models to make the same prediction for certain changes in the input. In this paper, we investigate the utility of certified word substitution robustness methods to improve equality of odds and equality of opportunity on multiple text classification tasks. We observe that certified robustness methods improve fairness, and that using both robustness and bias mitigation methods in training results in improvements on both fronts.
      @inproceedings{pruksachatkun2021robustness,
        title = {Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification},
        author = {Pruksachatkun, Yada and Krishna, Satyapriya and Dhamala, Jwala and Gupta, Rahul and Chang, Kai-Wei},
        booktitle = {ACL-Finding},
        year = {2021}
      }
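      Equality of opportunity and equality of odds, the fairness criteria named in the abstract above, compare error rates across groups. A minimal sketch for binary classification, assuming hypothetical labels, predictions, and group tags: the equal-opportunity gap is the difference in true positive rates, and the equalized-odds gap here takes the larger of the true- and false-positive-rate differences. That aggregation is one common choice and an assumption; the paper may report these quantities differently.

```python
def rates(labels, preds):
    """True positive rate and false positive rate for binary labels/predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    # Assumes each group has at least one positive and one negative example.
    return tp / (tp + fn), fp / (fp + tn)


def fairness_gaps(labels, preds, groups, g1, g2):
    """Equal-opportunity and equalized-odds gaps between groups g1 and g2."""
    def subset(g):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        return [labels[i] for i in idx], [preds[i] for i in idx]

    tpr1, fpr1 = rates(*subset(g1))
    tpr2, fpr2 = rates(*subset(g2))
    return {
        "equal_opportunity_gap": abs(tpr1 - tpr2),
        # One common aggregation: the larger of the TPR and FPR differences.
        "equalized_odds_gap": max(abs(tpr1 - tpr2), abs(fpr1 - fpr2)),
    }


# Hypothetical binary labels (1 = positive class) for two groups, "a" and "b".
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 1, 1, 1, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4
print(fairness_gaps(labels, preds, groups, "a", "b"))
# {'equal_opportunity_gap': 0.5, 'equalized_odds_gap': 1.0}
```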
      
    • "Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses

      Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in NAACL, 2021.
      Full Text, Video, Code
      Ad hominem attacks are those that target some feature of a person’s character instead of the position the person is maintaining. These attacks are harmful because they propagate implicit biases and diminish a person’s credibility. Since dialogue systems respond directly to user input, it is important to study ad hominems in dialogue responses. To this end, we propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts. We specifically compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad hominems could further amplify the skew of power away from marginalized populations. Furthermore, we propose a constrained decoding technique that uses salient n-gram similarity as a soft constraint for top-k sampling to reduce the amount of ad hominems generated. Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.
      @inproceedings{sheng2021nice,
        title = {"Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses},
        booktitle = {NAACL},
        author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
        presentation_id = {https://underline.io/events/122/sessions/4137/lecture/19854-%27nice-try,-kiddo%27-investigating-ad-hominems-in-dialogue-responses},
        year = {2021}
      }
      
    • BOLD: Dataset and metrics for measuring biases in open-ended language generation

      Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta, in FAccT, 2021.
      Full Text, Code
      Recent advances in deep learning techniques have enabled machines to generate cohesive open-ended text when prompted with a sequence of words as context. While these models now empower many downstream applications from conversation bots to automatic storytelling, they have been shown to generate texts that exhibit social biases. To systematically study and benchmark social biases in open-ended language generation, we introduce the Bias in Open-Ended Language Generation Dataset (BOLD), a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion, and political ideology. We also propose new automated metrics for toxicity, psycholinguistic norms, and text gender polarity to measure social biases in open-ended text generation from multiple angles. An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text across all domains. With these results we highlight the need to benchmark biases in open-ended language generation and caution users of language generation models on downstream tasks to be cognizant of these embedded prejudices.
      @inproceedings{dhamala2021bold,
        author = {Dhamala, Jwala and Sun, Tony and Kumar, Varun and Krishna, Satyapriya and Pruksachatkun, Yada and Chang, Kai-Wei and Gupta, Rahul},
        title = {BOLD: Dataset and metrics for measuring biases in open-ended language generation},
        booktitle = {FAccT},
        year = {2021}
      }
      
    • LOGAN: Local Group Bias Detection by Clustering

      Jieyu Zhao and Kai-Wei Chang, in EMNLP (short), 2020.
      Full Text, Code
      Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
      @inproceedings{zhao2020logan,
        author = {Zhao, Jieyu and Chang, Kai-Wei},
        title = {LOGAN: Local Group Bias Detection by Clustering},
        booktitle = {EMNLP (short)},
        presentation_id = {https://virtual.2020.emnlp.org/paper_main.2886.html},
        year = {2020}
      }
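      The contrast between corpus-level and local bias drawn in the LOGAN abstract can be shown with a small worked example. This sketch does not reproduce LOGAN's own clustering procedure; it assumes cluster assignments are already given (hypothetical ids from any clustering method) and hypothetical group labels "A" and "B", and only compares the global accuracy gap between the two groups with the per-cluster gaps.

```python
from collections import defaultdict


def accuracy_gap(records):
    """records: (group, correct) pairs with groups "A" and "B"; returns acc(A) - acc(B)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    # Assumes both groups are present; otherwise this divides by zero.
    return hits["A"] / totals["A"] - hits["B"] / totals["B"]


def local_gaps(examples):
    """examples: (cluster_id, group, correct) triples; returns {cluster_id: gap}."""
    by_cluster = defaultdict(list)
    for cluster, group, correct in examples:
        by_cluster[cluster].append((group, correct))
    return {c: accuracy_gap(recs) for c, recs in by_cluster.items()}


# Hypothetical predictions, already assigned to two clusters.
examples = [
    (0, "A", True), (0, "A", True), (0, "B", False), (0, "B", False),
    (1, "A", False), (1, "A", False), (1, "B", True), (1, "B", True),
]
global_gap = accuracy_gap((group, correct) for _, group, correct in examples)
print(global_gap)            # 0.0: no bias visible at the corpus level
print(local_gaps(examples))  # {0: 1.0, 1: -1.0}: opposite biases in each cluster
```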
      
    • Towards Controllable Biases in Language Generation

      Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP-Finding, 2020.
      Full Text, Code
      We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups. We then analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics. The former scenario enables us to detect the types of biases present in the model. Specifically, we show the effectiveness of our approach at facilitating bias analysis by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics. The second scenario is useful for mitigating biases in downstream applications such as dialogue generation. In our experiments, the mitigation technique proves to be effective at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.
      @inproceedings{sheng2020towards,
        title = {Towards Controllable Biases in Language Generation},
        author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
        booktitle = {EMNLP-Finding},
        year = {2020}
      }
      
    • Mitigating Gender Bias Amplification in Distribution by Posterior Regularization

      Shengyu Jia, Tao Meng, Jieyu Zhao, and Kai-Wei Chang, in ACL (short), 2020.
      Full Text, Slides, Video, Code
      Advanced machine learning techniques have boosted the performance of natural language processing. Nevertheless, recent studies, e.g., Zhao et al. (2017), show that these techniques inadvertently capture the societal bias hidden in the corpus and further amplify it. However, their analysis is conducted only on models’ top predictions. In this paper, we investigate the gender bias amplification issue from the distribution perspective and demonstrate that the bias is amplified in the view of predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization. With little performance loss, our method can almost remove the bias amplification in the distribution. Our study sheds light on understanding bias amplification.
      @inproceedings{jia2020mitigating,
        author = {Jia, Shengyu and Meng, Tao and Zhao, Jieyu and Chang, Kai-Wei},
        title = {Mitigating Gender Bias Amplification in Distribution by Posterior Regularization},
        booktitle = {ACL (short)},
        year = {2020},
        presentation_id = {https://virtual.acl2020.org/paper_main.264.html}
      }
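      The distinction the abstract draws between bias in top predictions and bias in the predicted distribution can be illustrated with a toy calculation. The sketch assumes hypothetical per-instance probabilities that the predicted agent of one activity is a woman, plus a hypothetical training-set share, and compares the share of top-1 "woman" predictions with the expected share under the full distribution. It is not the paper's exact metric, and the posterior-regularization mitigation is not shown.

```python
def top1_share(probs_woman):
    """Fraction of instances whose top prediction is 'woman' (binary case)."""
    return sum(1 for p in probs_woman if p > 0.5) / len(probs_woman)


def expected_share(probs_woman):
    """Expected fraction of 'woman' predictions under the predicted distribution."""
    return sum(probs_woman) / len(probs_woman)


# Hypothetical model probabilities P(agent = woman) for four test instances
# of one activity, and a hypothetical training-set share for that activity.
probs = [0.9, 0.9, 0.4, 0.4]
training_share = 0.5

print(top1_share(probs) - training_share)      # 0.0: no amplification in top predictions
print(expected_share(probs) - training_share)  # about 0.15: amplified in the distribution
```

      In this toy case the top-1 view shows no amplification while the distribution view does, which is the kind of gap the distribution perspective is meant to expose.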
      
    • The Woman Worked as a Babysitter: On Biases in Language Generation

      Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP (short), 2019.
      Full Text, Slides, Video, Code
      We present a systematic study of biases in natural language generation (NLG) by analyzing text generated from prompts that contain mentions of different demographic groups. In this work, we introduce the notion of the regard towards a demographic, use the varying levels of regard towards different demographics as a defining metric for bias in NLG, and analyze the extent to which sentiment scores are a relevant proxy metric for regard. To this end, we collect strategically-generated text from language models and manually annotate the text with both sentiment and regard scores. Additionally, we build an automatic regard classifier through transfer learning, so that we can analyze biases in unseen text. Together, these methods reveal the extent of the biased nature of language model generations. Our analysis provides a study of biases in NLG, bias metrics and correlated human judgments, and empirical evidence on the usefulness of our annotated dataset.
      @inproceedings{sheng2019woman,
        author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
        title = {The Woman Worked as a Babysitter: On Biases in Language Generation},
        booktitle = {EMNLP (short)},
        vimeo_id = {426366363},
        year = {2019}
      }
      
    • Mitigating Gender Bias in Natural Language Processing: Literature Review

      Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Kai-Wei Chang, and William Yang Wang, in ACL, 2019.
      Full Text, Slides, Video
      As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modeling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. We discuss gender bias based on four forms of representation bias and analyze methods recognizing gender bias. Furthermore, we discuss the advantages and drawbacks of existing gender debiasing methods. Finally, we discuss future studies for recognizing and mitigating gender bias in NLP.
      @inproceedings{sun2019mitigating,
        author = {Sun, Tony and Gaut, Andrew and Tang, Shirlyn and Huang, Yuxin and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Chang, Kai-Wei and Wang, William Yang},
        title = {Mitigating Gender Bias in Natural Language Processing: Literature Review},
        booktitle = {ACL},
        vimeo_id = {384482151},
        year = {2019}
      }
      
    • Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

      Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in NAACL (short), 2018.
      Full Text, Poster, Code
      In this paper, we introduce a new benchmark for coreference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred to by their occupation (e.g., the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than to anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing datasets.
      @inproceedings{zhao2018gender,
        author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
        title = {Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods},
        booktitle = {NAACL (short)},
        press_url = {https://www.stitcher.com/podcast/matt-gardner/nlp-highlights/e/55861936},
        year = {2018}
      }
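      The data-augmentation approach mentioned in the abstract can be sketched as gender swapping: each training sentence is paired with a copy in which gendered words are exchanged. The swap list below is a tiny hypothetical example; the paper's actual word list and any additional preprocessing are not reproduced, and ambiguous forms such as "her" (which can map to "him" or "his") are deliberately left out.

```python
import re

# Hypothetical, deliberately small swap list (a real list is much longer).
SWAP = {
    "he": "she", "she": "he",
    "himself": "herself", "herself": "himself",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
}


def gender_swap(sentence):
    """Swap gendered words, preserving an initial capital letter."""
    def replace(match):
        word = match.group(0)
        swapped = SWAP.get(word.lower())
        if swapped is None:
            return word
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", replace, sentence)


def augment(corpus):
    """Return the original sentences followed by their gender-swapped copies."""
    return corpus + [gender_swap(s) for s in corpus]


print(augment(["The doctor said he would call the nurse."]))
```

      Training on the union of original and swapped sentences means each occupation appears with both pronouns equally often, which is the intuition behind this style of augmentation.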
      
    • Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

      Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in EMNLP, 2017.
      Full Text Slides Code Abstract BibTeX Details EMNLP 2017 Best Long Paper Award
      Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora.
      In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, but a trained model amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for the resulting inference problems. Our method results in no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 33.3% and 44.9% for multilabel classification and visual semantic role labeling, respectively.
      @inproceedings{zhao2017men,
        author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
        title = {Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints},
        booktitle = {EMNLP},
        year = {2017}
      }
      

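The bias score and bias amplification measured in "Men Also Like Shopping" can be sketched in a few lines of Python. This is a minimal illustration under our reading of the paper's definitions, not the authors' code: the bias score of an activity o toward gender g is b(o, g) = c(o, g) / sum over g' of c(o, g'), and mean amplification averages, over activities, the change in bias score from training data to predictions, counted only for the gender each activity already favors in training. The function names and the toy (activity, gender) data are hypothetical.

```python
from collections import Counter


def bias_scores(pairs):
    """Bias score b(o, g) = c(o, g) / sum_g' c(o, g') from (activity, gender) pairs."""
    counts = Counter(pairs)
    totals = Counter(activity for activity, _ in pairs)
    return {(o, g): c / totals[o] for (o, g), c in counts.items()}


def mean_bias_amplification(train_pairs, pred_pairs, genders=("man", "woman")):
    """Average change in bias score from training data to predictions.

    Only the gender that each activity already favors in training
    (b_train > 1 / number of genders) contributes. Assumes every training
    activity also occurs in the predictions; a missing (activity, gender)
    pair in the predictions counts as a score of 0.
    """
    b_train = bias_scores(train_pairs)
    b_pred = bias_scores(pred_pairs)
    activities = {o for o, _ in train_pairs}
    total = 0.0
    for o in activities:
        for g in genders:
            if b_train.get((o, g), 0.0) > 1.0 / len(genders):
                total += b_pred.get((o, g), 0.0) - b_train[(o, g)]
    return total / len(activities)


# Toy (activity, gender) pairs, invented for illustration only.
train = [("cooking", "woman")] * 2 + [("cooking", "man")] + [("driving", "man")] * 3 + [("driving", "woman")]
pred = [("cooking", "woman")] * 3 + [("driving", "man")] * 3 + [("driving", "woman")]
print(round(mean_bias_amplification(train, pred), 4))  # 0.1667
```

Here "cooking" is two-thirds "woman" in the toy training pairs but entirely "woman" in the predictions, so it contributes 1 - 2/3 = 1/3; "driving" keeps the same skew and contributes 0, giving a mean of 1/6.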
  • Mitigating Gender Bias Amplification in Distribution by Posterior Regularization

    Shengyu Jia, Tao Meng, Jieyu Zhao, and Kai-Wei Chang, in ACL (short), 2020.
    QA Sessions: 6A Ethics, 10B Ethics Paper link in the virtual conference
    Full Text Slides Code BibTeX Details
    Advanced machine learning techniques have boosted the performance of natural language processing. Nevertheless, recent studies (e.g., Zhao et al., 2017) show that these techniques inadvertently capture the societal bias hidden in the corpus and further amplify it. However, their analysis is conducted only on models’ top predictions. In this paper, we investigate the gender bias amplification issue from the distribution perspective and demonstrate that the bias is amplified in the predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization. With little performance loss, our method can almost entirely remove the bias amplification in the distribution. Our study sheds light on understanding bias amplification.
    @inproceedings{jia2020mitigating,
      author = {Jia, Shengyu and Meng, Tao and Zhao, Jieyu and Chang, Kai-Wei},
      title = {Mitigating Gender Bias Amplification in Distribution by Posterior Regularization},
      booktitle = {ACL (short)},
      year = {2020},
      presentation_id = {https://virtual.acl2020.org/paper_main.264.html}
    }
    

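The posterior-regularization idea above can be illustrated with a simplified projection step. This sketch is an assumption, not the paper's exact objective: each instance has a predicted probability of a "female" label, and we look for the distribution closest in KL divergence to the model's predictions whose average female probability equals a target ratio, such as one measured on training data. With a single linear expectation constraint, the KL projection takes an exponential-tilting form with one shared multiplier, which we find by bisection. The names `tilt` and `project_to_ratio` are hypothetical.

```python
import math


def tilt(p_female, lam):
    """Reweight one binary prediction: q(female) is proportional to p(female) * exp(lam)."""
    num = p_female * math.exp(lam)
    return num / (num + (1.0 - p_female))


def project_to_ratio(p_females, target, lo=-50.0, hi=50.0, iters=100):
    """KL-project per-instance predictions so their mean female probability equals target.

    Minimizing KL(q || p) under one linear expectation constraint gives
    q_i(female) proportional to p_i(female) * exp(lam) with a single shared
    lam. The resulting mean increases monotonically with lam, so bisection
    finds it. Assumes 0 < target < 1 and every p_female lies strictly in (0, 1).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        mean = sum(tilt(p, mid) for p in p_females) / len(p_females)
        if mean < target:
            lo = mid  # too little mass on "female": raise lam
        else:
            hi = mid  # too much mass on "female": lower lam
    lam = (lo + hi) / 2
    return [tilt(p, lam) for p in p_females]


# Hypothetical predictions that over-predict "female" relative to a 0.6 ratio.
q = project_to_ratio([0.9, 0.8, 0.7, 0.6], target=0.6)
print(round(sum(q) / len(q), 6))  # 0.6
```

Because every instance is tilted by the same multiplier, the projection keeps the model's ranking of instances while shifting the aggregate distribution toward the target ratio.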
    Related Publications

    • Societal Biases in Language Generation: Progress and Challenges

      Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in ACL, 2021.
      Full Text Abstract BibTeX Details
      Technology for language generation has advanced rapidly, spurred by advancements in pre-training large models on massive amounts of data and the need for intelligent agents to communicate in a natural manner. While techniques can effectively generate fluent text, they can also produce undesirable societal biases that can have a disproportionately negative impact on marginalized populations. Language generation presents unique challenges for biases in terms of direct user interaction and the structure of decoding techniques. To better understand these challenges, we present a survey on societal biases in language generation, focusing on how data and techniques contribute to biases and progress towards reducing biases. Motivated by a lack of studies on biases from decoding techniques, we also conduct experiments to quantify the effects of these techniques. By further discussing general trends and open challenges, we call to attention promising directions for research and the importance of fairness and inclusivity considerations for language generation applications.
      @inproceedings{sheng2021defense,
        title = {Societal Biases in Language Generation: Progress and Challenges},
        author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
        booktitle = {ACL},
        year = {2021}
      }
      
    • Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification

      Yada Pruksachatkun, Satyapriya Krishna, Jwala Dhamala, Rahul Gupta, and Kai-Wei Chang, in ACL-Finding, 2021.
      Full Text Abstract BibTeX Details
      Existing bias mitigation methods to reduce disparities in model outcomes across cohorts have focused on data augmentation, debiasing model embeddings, or adding fairness-based optimization objectives during training. Separately, certified word substitution robustness methods have been developed to decrease the impact of spurious features and synonym substitutions on model predictions. While their end goals are different, they both aim to encourage models to make the same prediction for certain changes in the input. In this paper, we investigate the utility of certified word substitution robustness methods to improve equality of odds and equality of opportunity on multiple text classification tasks. We observe that certified robustness methods improve fairness, and using both robustness and bias mitigation methods in training results in an improvement on both fronts.
      @inproceedings{pruksachatkun2021robustness,
        title = {Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification},
        author = {Pruksachatkun, Yada and Krishna, Satyapriya and Dhamala, Jwala and Gupta, Rahul and Chang, Kai-Wei},
        booktitle = {ACL-Finding},
        year = {2021}
      }
      
    • "Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses

      Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng, in NAACL, 2021.
      Full Text Video Code Abstract BibTeX Details
      Ad hominem attacks are those that target some feature of a person’s character instead of the position the person is maintaining. These attacks are harmful because they propagate implicit biases and diminish a person’s credibility. Since dialogue systems respond directly to user input, it is important to study ad hominems in dialogue responses. To this end, we propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts. We specifically compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad hominems could further amplify the skew of power away from marginalized populations. Furthermore, we propose a constrained decoding technique that uses salient n-gram similarity as a soft constraint for top-k sampling to reduce the amount of ad hominems generated. Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.
      @inproceedings{sheng2021nice,
        title = {"Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses},
        booktitle = {NAACL},
        author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Prem and Peng, Nanyun},
        presentation_id = {https://underline.io/events/122/sessions/4137/lecture/19854-%27nice-try,-kiddo%27-investigating-ad-hominems-in-dialogue-responses},
        year = {2021}
      }
      
    • BOLD: Dataset and metrics for measuring biases in open-ended language generation

      Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta, in FAccT, 2021.
      Full Text Code Abstract BibTeX Details
      Recent advances in deep learning techniques have enabled machines to generate cohesive open-ended text when prompted with a sequence of words as context. While these models now empower many downstream applications from conversation bots to automatic storytelling, they have been shown to generate texts that exhibit social biases. To systematically study and benchmark social biases in open-ended language generation, we introduce the Bias in Open-Ended Language Generation Dataset (BOLD), a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion, and political ideology. We also propose new automated metrics for toxicity, psycholinguistic norms, and text gender polarity to measure social biases in open-ended text generation from multiple angles. An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text across all domains. With these results we highlight the need to benchmark biases in open-ended language generation and caution users of language generation models on downstream tasks to be cognizant of these embedded prejudices.
      @inproceedings{dhamala2021bold,
        author = {Dhamala, Jwala and Sun, Tony and Kumar, Varun and Krishna, Satyapriya and Pruksachatkun, Yada and Chang, Kai-Wei and Gupta, Rahul},
        title = {BOLD: Dataset and metrics for measuring biases in open-ended language generation},
        booktitle = {FAccT},
        year = {2021}
      }
      
    • LOGAN: Local Group Bias Detection by Clustering

      Jieyu Zhao and Kai-Wei Chang, in EMNLP (short), 2020.
      Full Text Code Abstract BibTeX Details
      Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
      @inproceedings{zhao2020logan,
        author = {Zhao, Jieyu and Chang, Kai-Wei},
        title = {LOGAN: Local Group Bias Detection by Clustering},
        booktitle = {EMNLP (short)},
        presentation_id = {https://virtual.2020.emnlp.org/paper_main.2886.html},
        year = {2020}
      }
      
    • Towards Controllable Biases in Language Generation

      Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP-Finding, 2020.
      Full Text Code Abstract BibTeX Details
      We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups. We then analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics. The former scenario enables us to detect the types of biases present in the model. Specifically, we show the effectiveness of our approach at facilitating bias analysis by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics. The second scenario is useful for mitigating biases in downstream applications such as dialogue generation. In our experiments, the mitigation technique proves to be effective at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.
      @inproceedings{sheng2020towards,
        title = {Towards Controllable Biases in Language Generation},
        author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
        booktitle = {EMNLP-Finding},
        year = {2020}
      }
      
    • Towards Understanding Gender Bias in Relation Extraction

      Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang, in ACL, 2020.
      Full Text Abstract BibTeX Details
      Recent developments in Neural Relation Extraction (NRE) have made significant strides towards automated knowledge base construction. While much attention has been dedicated towards improvements in accuracy, there have been no attempts in the literature to evaluate social biases exhibited in NRE systems. In this paper, we create WikiGenderBias, a distantly supervised dataset composed of over 45,000 sentences including a 10% human annotated test set for the purpose of analyzing gender bias in relation extraction systems. We find that when extracting spouse and hypernym (i.e., occupation) relations, an NRE system performs differently when the gender of the target entity is different. However, such disparity does not appear when extracting relations such as birth date or birth place. We also analyze two existing bias mitigation techniques, word embedding debiasing and data augmentation. Unfortunately, because NRE models rely heavily on surface-level cues, we find that existing bias mitigation approaches have a negative effect on NRE. Our analysis lays the groundwork for future work on quantifying and mitigating bias in relation extraction.
      @inproceedings{gaut2020towards,
        author = {Gaut, Andrew and Sun, Tony and Tang, Shirlyn and Huang, Yuxin and Qian, Jing and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Belding, Elizabeth and Chang, Kai-Wei and Wang, William Yang},
        title = {Towards Understanding Gender Bias in Relation Extraction},
        booktitle = {ACL},
        year = {2020},
        presentation_id = {https://virtual.acl2020.org/paper_main.265.html}
      }
      
    • Mitigating Gender Bias Amplification in Distribution by Posterior Regularization

      Shengyu Jia, Tao Meng, Jieyu Zhao, and Kai-Wei Chang, in ACL (short), 2020.
      Full Text Slides Video Code Abstract BibTeX Details
      Advanced machine learning techniques have boosted the performance of natural language processing. Nevertheless, recent studies (e.g., Zhao et al., 2017) show that these techniques inadvertently capture the societal bias hidden in the corpus and further amplify it. However, their analysis is conducted only on models’ top predictions. In this paper, we investigate the gender bias amplification issue from the distribution perspective and demonstrate that the bias is amplified in the predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization. With little performance loss, our method can almost entirely remove the bias amplification in the distribution. Our study sheds light on understanding bias amplification.
      @inproceedings{jia2020mitigating,
        author = {Jia, Shengyu and Meng, Tao and Zhao, Jieyu and Chang, Kai-Wei},
        title = {Mitigating Gender Bias Amplification in Distribution by Posterior Regularization},
        booktitle = {ACL (short)},
        year = {2020},
        presentation_id = {https://virtual.acl2020.org/paper_main.264.html}
      }
      
    • The Woman Worked as a Babysitter: On Biases in Language Generation

      Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng, in EMNLP (short), 2019.
      Full Text Slides Video Code Abstract BibTeX Details
      We present a systematic study of biases in natural language generation (NLG) by analyzing text generated from prompts that contain mentions of different demographic groups. In this work, we introduce the notion of the regard towards a demographic, use the varying levels of regard towards different demographics as a defining metric for bias in NLG, and analyze the extent to which sentiment scores are a relevant proxy metric for regard. To this end, we collect strategically-generated text from language models and manually annotate the text with both sentiment and regard scores. Additionally, we build an automatic regard classifier through transfer learning, so that we can analyze biases in unseen text. Together, these methods reveal the extent of the biased nature of language model generations. Our analysis provides a study of biases in NLG, bias metrics and correlated human judgments, and empirical evidence on the usefulness of our annotated dataset.
      @inproceedings{sheng2019woman,
        author = {Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun},
        title = {The Woman Worked as a Babysitter: On Biases in Language Generation},
        booktitle = {EMNLP (short)},
        vimeo_id = {426366363},
        year = {2019}
      }
      
    • Mitigating Gender Bias in Natural Language Processing: Literature Review

      Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Kai-Wei Chang, and William Yang Wang, in ACL, 2019.
      Full Text Slides Video Abstract BibTeX Details
      As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modeling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. We discuss gender bias based on four forms of representation bias and analyze methods recognizing gender bias. Furthermore, we discuss the advantages and drawbacks of existing gender debiasing methods. Finally, we discuss future studies for recognizing and mitigating gender bias in NLP.
      @inproceedings{sun2019mitigating,
        author = {Sun, Tony and Gaut, Andrew and Tang, Shirlyn and Huang, Yuxin and ElSherief, Mai and Zhao, Jieyu and Mirza, Diba and Chang, Kai-Wei and Wang, William Yang},
        title = {Mitigating Gender Bias in Natural Language Processing: Literature Review},
        booktitle = {ACL},
        vimeo_id = {384482151},
        year = {2019}
      }
      
    • Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

      Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in NAACL (short), 2018.
      Full Text Poster Code Abstract BibTeX Details
      In this paper, we introduce a new benchmark for coreference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred to by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing datasets.
      @inproceedings{zhao2018gender,
        author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
        title = {Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods},
        booktitle = {NAACL (short)},
        press_url = {https://www.stitcher.com/podcast/matt-gardner/nlp-highlights/e/55861936},
        year = {2018}
      }
      
    • Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

      Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang, in EMNLP, 2017.
      Full Text Slides Code Abstract BibTeX Details EMNLP 2017 Best Long Paper Award
      Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora.
      In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, but a trained model amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for the resulting inference problems. Our method results in no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 33.3% and 44.9% for multilabel classification and visual semantic role labeling, respectively.
      @inproceedings{zhao2017men,
        author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei},
        title = {Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints},
        booktitle = {EMNLP},
        year = {2017}
      }
      

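Equality of opportunity and equality of odds, the criteria evaluated by Pruksachatkun et al. (2021), compare error rates across groups. Under the standard definitions, equality of opportunity asks for equal true positive rates across groups, and equality of odds additionally asks for equal false positive rates. The sketch below reports both as absolute gaps between two groups; summarizing the odds gap as the larger of the two rate gaps is our own choice, and the function names and toy labels are hypothetical.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def fairness_gaps(y_true, y_pred, groups, a, b):
    """Equality-of-opportunity gap |TPR_a - TPR_b| and equality-of-odds gap (max of TPR and FPR gaps)."""
    def subset(g):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    tpr_a, fpr_a = rates(*subset(a))
    tpr_b, fpr_b = rates(*subset(b))
    opportunity_gap = abs(tpr_a - tpr_b)
    odds_gap = max(opportunity_gap, abs(fpr_a - fpr_b))
    return opportunity_gap, odds_gap


# Toy labels: group "a" gets every example labeled positive, group "b" does not.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4
print(fairness_gaps(y_true, y_pred, groups, "a", "b"))  # (0.5, 1.0)
```

The example shows why the two criteria differ: the true positive rates differ by 0.5, but the false positive rates differ by 1.0, so a model can look moderately unfair under equality of opportunity and maximally unfair under equality of odds.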

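The motivation behind LOGAN, that similar corpus-level performance across groups can hide large gaps in local regions, can be seen in a small worked example. The sketch does not implement LOGAN's clustering objective; it assumes cluster assignments are already given (for instance, from k-means over model representations) and compares the global accuracy gap between two hypothetical groups "a" and "b" with the gap inside each cluster.

```python
from collections import defaultdict


def accuracy_gap(records):
    """|acc(a) - acc(b)| over (group, correct) records; None if either group is absent."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    if totals["a"] == 0 or totals["b"] == 0:
        return None
    return abs(hits["a"] / totals["a"] - hits["b"] / totals["b"])


def local_gaps(records, clusters):
    """Accuracy gap inside each cluster; `clusters` gives one cluster id per record."""
    by_cluster = defaultdict(list)
    for rec, cid in zip(records, clusters):
        by_cluster[cid].append(rec)
    return {cid: accuracy_gap(recs) for cid, recs in sorted(by_cluster.items())}


# Toy records: (group, prediction was correct), invented for illustration.
records = [("a", True), ("a", True), ("b", False), ("b", False),
           ("a", False), ("a", False), ("b", True), ("b", True)]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
print(accuracy_gap(records))          # 0.0: both groups are 50% accurate overall
print(local_gaps(records, clusters))  # {0: 1.0, 1: 1.0}
```

Both groups reach the same overall accuracy, yet each cluster favors one group completely, in opposite directions, so a corpus-level metric reports no bias at all.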
Analyze and Understand NLP Models

It is essential to analyze and understand the capabilities of NLP technology. At ACL, we present the following papers on 1) analyzing the robustness of contextualized language encoders against grammatical errors, 2) understanding what is captured by pre-trained visually grounded language models like VisualBERT, and 3) benchmarking transformer-based approaches for source code summarization.

  • On the Robustness of Language Encoders against Grammatical Errors

    Fan Yin, Quanyu Long, Tao Meng, and Kai-Wei Chang, in ACL, 2020.
    QA Sessions: 6B Interpretability, 8A Interpretability Paper link in the virtual conference
    Full Text Slides Code BibTeX Details
    We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the position of errors. We find that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors.
    @inproceedings{yin2020robustness,
      author = {Yin, Fan and Long, Quanyu and Meng, Tao and Chang, Kai-Wei},
      title = {On the Robustness of Language Encoders against Grammatical Errors},
      booktitle = {ACL},
      presentation_id = {https://virtual.acl2020.org/paper_main.310.html},
      year = {2020}
    }
    

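The error-simulation step described above can be pictured as a search over small edits to clean text. The sketch below is an illustration under loudly stated assumptions, not the paper's attack: the confusion sets are hypothetical stand-ins for error patterns collected from non-native writing, `score` is a hypothetical stand-in for a model's confidence in the gold label, and the search tries every single-token edit and keeps the one that lowers the score the most.

```python
# Hypothetical confusion sets (article and preposition errors); "" means deletion.
CONFUSIONS = {
    "a": ["an", "the", ""],
    "an": ["a", "the", ""],
    "the": ["a", ""],
    "in": ["on", "at"],
    "on": ["in", "at"],
    "at": ["in", "on"],
}


def candidates(tokens):
    """Yield (position, edited tokens) for every single confusion-set edit."""
    for i, tok in enumerate(tokens):
        for alt in CONFUSIONS.get(tok.lower(), []):
            edited = tokens[:i] + ([alt] if alt else []) + tokens[i + 1:]
            yield i, edited


def greedy_attack(tokens, score):
    """Return (score, position, tokens) for the edit that lowers `score` most.

    If no edit lowers the score, the original sentence is returned with position None.
    """
    best = (score(tokens), None, tokens)
    for i, edited in candidates(tokens):
        s = score(edited)
        if s < best[0]:
            best = (s, i, edited)
    return best


def toy_score(tokens):
    """Hypothetical model confidence: drops when the sentence loses the word "the"."""
    return 1.0 if "the" in tokens else 0.5


print(greedy_attack("I sat on the mat".split(), toy_score))
# (0.5, 3, ['I', 'sat', 'on', 'a', 'mat'])
```

A real attack would query the encoder under study instead of `toy_score` and would draw edits from observed learner errors rather than a hand-written table.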
    Related Publications

    • Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble

      Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, and Xuanjing Huang, in ACL, 2021.
      Full Text Abstract BibTeX Details
      Although deep neural networks have achieved prominent performance on many NLP tasks, they are vulnerable to adversarial examples. We propose Dirichlet Neighborhood Ensemble (DNE), a randomized method for training a robust model to defend against synonym substitution-based attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data. In such a way, the model is robust to adversarial attacks while maintaining the performance on the original clean data. DNE is agnostic to the network architectures and scales to large models (e.g., BERT) for NLP applications. Through extensive experimentation, we demonstrate that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
      @inproceedings{zhou2021defense,
        title = {Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble},
        author = {Zhou, Yi and Zheng, Xiaoqing and Hsieh, Cho-Jui and Chang, Kai-Wei and Huang, Xuanjing},
        booktitle = {ACL},
        year = {2021}
      }
      
    • Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation

      Chong Zhang, Jieyu Zhao, Huan Zhang, Kai-Wei Chang, and Cho-Jui Hsieh, in NAACL, 2021.
      Full Text Video Code Abstract BibTeX Details
      Robustness and counterfactual bias are usually evaluated on a test dataset. However, are these evaluations robust? If the test dataset is perturbed slightly, will the evaluation results remain the same? In this paper, we propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset. The framework first perturbs the test dataset to construct abundant natural sentences similar to the test data, and then diagnoses the prediction change regarding a single-word substitution. We apply this framework to study two perturbation-based approaches that are used to analyze models’ robustness and counterfactual bias in English. (1) For robustness, we focus on synonym substitutions and identify vulnerable examples where prediction can be altered. Our proposed attack attains high success rates (96.0%-99.8%) in finding vulnerable examples on both original and robustly trained CNNs and Transformers. (2) For counterfactual bias, we focus on substituting demographic tokens (e.g., gender, race) and measure the shift of the expected prediction among constructed sentences. Our method is able to reveal the hidden model biases not directly shown in the test dataset.
      @inproceedings{zhang2021double,
        title = {Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation},
        booktitle = {NAACL},
        author = {Zhang, Chong and Zhao, Jieyu and Zhang, Huan and Chang, Kai-Wei and Hsieh, Cho-Jui},
        year = {2021},
        presentation_id = {https://underline.io/events/122/sessions/4229/lecture/19609-double-perturbation-on-the-robustness-of-robustness-and-counterfactual-bias-evaluation}
      }
      
    • Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs

      Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh, in NeurIPS, 2020.
      Full Text Code Abstract BibTeX Details
      Linear relaxation based perturbation analysis (LiRPA) for neural networks, which computes provable linear bounds of output neurons given a certain amount of input perturbation, has become a core component in robustness verification and certified defense. The majority of LiRPA-based methods only consider simple feed-forward networks and need particular manual derivations and implementations when extended to other architectures. In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structures, by generalizing existing LiRPA algorithms such as CROWN to operate on general computational graphs. The flexibility, differentiability and ease of use of our framework allow us to obtain state-of-the-art results on LiRPA based certified defense on fairly complicated networks like DenseNet, ResNeXt and Transformer that are not supported by prior work. Our framework also enables loss fusion, a technique that significantly reduces the computational complexity of LiRPA for certified defense. For the first time, we demonstrate LiRPA based certified defense on Tiny ImageNet and Downscaled ImageNet, to which previous approaches cannot scale due to the relatively large number of classes. Our work also yields an open-source library for the community to apply LiRPA to areas beyond certified defense without much LiRPA expertise, e.g., we create a neural network with a provably flat optimization landscape. Our open source library is available at https://github.com/KaidiXu/auto_LiRPA
      @inproceedings{xu2020provable,
        author = {Xu, Kaidi and Shi, Zhouxing and Zhang, Huan and Wang, Yihan and Chang, Kai-Wei and Huang, Minlie and Kailkhura, Bhavya and Lin, Xue and Hsieh, Cho-Jui},
        title = {Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs},
        booktitle = {NeurIPS},
        year = {2020}
      }
      
    • On the Robustness of Language Encoders against Grammatical Errors

      Fan Yin, Quanyu Long, Tao Meng, and Kai-Wei Chang, in ACL, 2020.
      Full Text Slides Video Code Abstract BibTeX Details
      We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the position of errors. We find that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors.
      @inproceedings{yin2020robustness,
        author = {Yin, Fan and Long, Quanyu and Meng, Tao and Chang, Kai-Wei},
        title = {On the Robustness of Language Encoders against Grammatical Errors},
        booktitle = {ACL},
        presentation_id = {https://virtual.acl2020.org/paper_main.310.html},
        year = {2020}
      }
      
    • Robustness Verification for Transformers

      Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, and Cho-Jui Hsieh, in ICLR, 2020.
      Full Text Video Code Abstract BibTeX Details
      Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding the behavior of a given model and for obtaining safety guarantees. However, previous methods are usually limited to relatively simple neural networks. In this paper, we consider the robustness verification problem for Transformers. Transformers have complex self-attention layers that pose many challenges for verification, including cross-nonlinearity and cross-position dependency, which have not been discussed in previous work. We resolve these challenges and develop the first verification algorithm for Transformers. The certified robustness bounds computed by our method are significantly tighter than those computed by naive Interval Bound Propagation. These bounds also shed light on interpreting Transformers, as they consistently reflect the importance of words in sentiment analysis.
      @inproceedings{shi2020robustness,
        author = {Shi, Zhouxing and Zhang, Huan and Chang, Kai-Wei and Huang, Minlie and Hsieh, Cho-Jui},
        title = {Robustness Verification for Transformers},
        booktitle = {ICLR},
        year = {2020}
      }
      
    • Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification

      Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, and Wei Wang, in EMNLP, 2019.
      Full Text Code Abstract BibTeX Details
      Adversarial attacks against machine learning models have threatened various real-world applications such as spam filtering and sentiment analysis. In this paper, we propose a novel framework, learning to DIScriminate Perturbations (DISP), to identify and adjust malicious perturbations, thereby blocking adversarial attacks for text classification models. To identify adversarial attacks, a perturbation discriminator estimates how likely each token in the text is to have been perturbed and provides a set of potential perturbations. For each potential perturbation, an embedding estimator learns to restore the embedding of the original word based on the context, and a replacement token is chosen based on approximate kNN search. DISP can block adversarial attacks for any NLP model without modifying the model structure or training procedure. Extensive experiments on two benchmark datasets demonstrate that DISP significantly outperforms baseline methods in blocking adversarial attacks for text classification. In addition, in-depth analysis shows the robustness of DISP across different situations.
      @inproceedings{zhou2019learning,
        author = {Zhou, Yichao and Jiang, Jyun-Yu and Chang, Kai-Wei and Wang, Wei},
        title = {Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification},
        booktitle = {EMNLP},
        year = {2019}
      }
      
    • Retrofitting Contextualized Word Embeddings with Paraphrases

      Weijia Shi, Muhao Chen, Pei Zhou, and Kai-Wei Chang, in EMNLP (short), 2019.
      Full Text Slides Video Code Abstract BibTeX Details
      Contextualized word embedding models, such as ELMo, generate meaningful representations of words and their context. These models have been shown to have a great impact on downstream applications. However, in many cases, the contextualized embedding of a word changes drastically when the context is paraphrased. As a result, the downstream model is not robust to paraphrasing and other linguistic variations. To enhance the stability of contextualized word embedding models, we propose an approach to retrofitting contextualized embedding models with paraphrase contexts. Our method learns an orthogonal transformation on the input space, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the retrofitted model significantly outperforms the original ELMo on various sentence classification and language inference tasks.
      @inproceedings{shi2019retrofitting,
        author = {Shi, Weijia and Chen, Muhao and Zhou, Pei and Chang, Kai-Wei},
        title = {Retrofitting Contextualized Word Embeddings with Paraphrases},
        booktitle = {EMNLP (short)},
        vimeo_id = {430797636},
        year = {2019}
      }
      
    • Generating Natural Language Adversarial Examples

      Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang, in EMNLP (short), 2018.
      Full Text Code Abstract BibTeX Details
      Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations of correctly classified examples that can cause the network to misclassify. In the image domain, these perturbations can often be made virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a population-based optimization algorithm to generate semantically and syntactically similar adversarial examples. We demonstrate via a human study that 94.3% of the generated examples are assigned the original label by human evaluators, and that the examples are perceptibly quite similar. We hope our findings encourage researchers to pursue improving the robustness of DNNs in the natural language domain.
      @inproceedings{alzanto2018generating,
        author = {Alzantot, Moustafa and Sharma, Yash and Elgohary, Ahmed and Ho, Bo-Jhang and Srivastava, Mani and Chang, Kai-Wei},
        title = {Generating Natural Language Adversarial Examples},
        booktitle = {EMNLP (short)},
        year = {2018}
      }
      

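Several entries above measure certified bounds against naive Interval Bound Propagation (IBP). As a hedged illustration of that baseline only (not CROWN or the LiRPA bounds these papers compute), here is a minimal sketch that pushes an L-infinity box around an input through affine layers and ReLUs; the network and its weights are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weight rows, bias) of y = W x + b


def affine_bounds(layer: Layer, lo: Sequence[float], hi: Sequence[float]) -> Tuple[Vector, Vector]:
    """Propagate the box [lo, hi] through y = W x + b using interval arithmetic."""
    W, b = layer
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        low = high = bias
        for w, x_lo, x_hi in zip(row, lo, hi):
            # A non-negative weight keeps the bound order; a negative weight swaps it.
            if w >= 0:
                low += w * x_lo
                high += w * x_hi
            else:
                low += w * x_hi
                high += w * x_lo
        out_lo.append(low)
        out_hi.append(high)
    return out_lo, out_hi


def relu_bounds(lo: Sequence[float], hi: Sequence[float]) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it maps each bound independently."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def ibp(layers: List[Layer], x: Sequence[float], eps: float) -> Tuple[Vector, Vector]:
    """Output bounds valid for every input within L-infinity distance eps of x."""
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    for i, layer in enumerate(layers):
        lo, hi = affine_bounds(layer, lo, hi)
        if i < len(layers) - 1:  # no ReLU after the final (logit) layer
            lo, hi = relu_bounds(lo, hi)
    return lo, hi


if __name__ == "__main__":
    # Hypothetical two-layer network: 2 inputs -> 2 hidden ReLUs -> 1 output.
    net = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
    print(ibp(net, [1.0, 0.0], 0.1))  # approximately ([1.2], [1.8])
```

For this toy network the exact output range over the box is [1.3, 1.7], while IBP returns [1.2, 1.8]: the bounds are sound but loose, because each neuron's interval is computed without tracking correlations between inputs. Linear relaxations such as CROWN are designed to recover some of that slack.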
  • What Does BERT with Vision Look At?

    Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in ACL (short), 2020.
    QA Sessions: 9A THEME-1, 10A THEME-2 Paper link in the virtual conference
    Full Text Slides Code BibTeX Details
    Pre-trained visually grounded language models such as ViLBERT, LXMERT, and UNITER have achieved significant performance improvements on vision-and-language tasks, but what they learn during pre-training remains unclear. In this work, we demonstrate that certain attention heads of a visually grounded language model actively ground elements of language to image regions. Specifically, some heads can map entities to image regions, performing the task known as entity grounding. Some heads can even detect the syntactic relations between non-entity words and image regions, tracking, for example, associations between verbs and regions corresponding to their arguments. We denote this ability as syntactic grounding. We verify grounding both quantitatively and qualitatively, using Flickr30K Entities as a testbed.
    @inproceedings{li2020what,
      author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
      title = {What Does BERT with Vision Look At?},
      booktitle = {ACL (short)},
      presentation_id = {https://virtual.acl2020.org/paper_main.469.html},
      year = {2020}
    }
    
    See the full version of this paper.
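The abstract says entity grounding is verified quantitatively on Flickr30K Entities but does not spell out the metric. One plausible way to score a single attention head, offered as an assumption rather than the paper's exact protocol, is to check how often an entity token's most-attended image region is its annotated region; the array layout below is hypothetical.

```python
from typing import Sequence


def grounding_accuracy(attention: Sequence[Sequence[float]], gold_regions: Sequence[int]) -> float:
    """Share of entity tokens whose highest-weighted image region is the gold region.

    attention[i][r]: one head's weight from entity token i to image region r
    (hypothetical layout). gold_regions[i]: annotated region index for token i.
    """
    if len(attention) != len(gold_regions):
        raise ValueError("need exactly one gold region per entity token")
    if not gold_regions:
        return 0.0
    hits = 0
    for weights, gold in zip(attention, gold_regions):
        # The head's "prediction" is simply its argmax over image regions.
        predicted = max(range(len(weights)), key=lambda r: weights[r])
        hits += int(predicted == gold)
    return hits / len(gold_regions)


if __name__ == "__main__":
    attn = [[0.1, 0.7, 0.2],   # token 0 attends mostly to region 1
            [0.6, 0.3, 0.1],   # token 1 attends mostly to region 0
            [0.2, 0.2, 0.6]]   # token 2 attends mostly to region 2
    print(grounding_accuracy(attn, [1, 0, 1]))  # 2 of 3 correct
```

Scoring every head in every layer this way, and comparing against a simple baseline, would indicate which heads (if any) ground entities better than chance.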

    Related Publications

    • Unified Pre-training for Program Understanding and Generation

      Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in NAACL, 2021.
      Full Text Video Code Abstract BibTeX Details
      Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation facilitates the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART’s effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming convention), and logical flow (e.g., an if block inside an else block is equivalent to an else if block) that are crucial to program semantics, and thus excels even with limited annotations.
      @inproceedings{ahmad2021unified,
        title = {Unified Pre-training for Program Understanding and Generation},
        author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
        booktitle = {NAACL},
        presentation_id = {https://underline.io/events/122/sessions/4197/lecture/20024-unified-pre-training-for-program-understanding-and-generation},
        year = {2021}
      }
      
    • Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions

      Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, and Kai-Wei Chang, in NAACL, 2021.
      Full Text Video Abstract BibTeX Details
      Pre-trained contextual vision-and-language (V&L) models have brought impressive performance improvement on various benchmarks. However, the paired text-image data required for pre-training are hard to collect and scale up. We investigate whether a strong V&L representation model can be learned without text-image pairs. We propose Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora. Additionally, we introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. Evaluation on four V&L benchmarks shows that Weakly-supervised VisualBERT achieves similar performance to a model pre-trained with paired data. In addition, pre-training on more image-only data further improves a model that already has access to aligned data, suggesting the possibility of utilizing billions of raw images available to enhance V&L models.
      @inproceedings{li2021unsupervised,
        author = {Li, Liunian Harold and You, Haoxuan and Wang, Zhecan and Zareian, Alireza and Chang, Shih-Fu and Chang, Kai-Wei},
        title = {Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions},
        booktitle = {NAACL},
        presentation_id = {https://underline.io/events/122/sessions/4269/lecture/19725-unsupervised-vision-and-language-pre-training-without-parallel-images-and-captions},
        year = {2021}
      }
      
    • What Does BERT with Vision Look At?

      Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in ACL (short), 2020.
      Full Text Slides Video Code Abstract BibTeX Details
      Pre-trained visually grounded language models such as ViLBERT, LXMERT, and UNITER have achieved significant performance improvements on vision-and-language tasks, but what they learn during pre-training remains unclear. In this work, we demonstrate that certain attention heads of a visually grounded language model actively ground elements of language to image regions. Specifically, some heads can map entities to image regions, performing the task known as entity grounding. Some heads can even detect the syntactic relations between non-entity words and image regions, tracking, for example, associations between verbs and regions corresponding to their arguments. We denote this ability as syntactic grounding. We verify grounding both quantitatively and qualitatively, using Flickr30K Entities as a testbed.
      @inproceedings{li2020what,
        author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
        title = {What Does BERT with Vision Look At?},
        booktitle = {ACL (short)},
        presentation_id = {https://virtual.acl2020.org/paper_main.469.html},
        year = {2020}
      }
      
    • VisualBERT: A Simple and Performant Baseline for Vision and Language

      Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in arXiv, 2019.
      Full Text Code Abstract BibTeX Details
      We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.
      @inproceedings{li2019visualbert,
        author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
        title = {VisualBERT: A Simple and Performant Baseline for Vision and Language},
        booktitle = {Arxiv},
        year = {2019}
      }
      

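The Weakly-supervised VisualBERT entry above pre-trains with "mask-and-predict" on language-only and image-only corpora. For the text side, a simplified BERT-style corruption step might look like the sketch below. The 15% rate, the `[MASK]` symbol, and always substituting the mask (BERT itself sometimes keeps or randomizes the chosen token) are assumptions, and image regions would be masked analogously (not shown).

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # assumed mask symbol


def mask_for_prediction(tokens: List[str], rate: float = 0.15,
                        seed: int = 0) -> Tuple[List[str], List[Optional[str]]]:
    """Hide a random subset of tokens that the model must then predict.

    Returns the corrupted sequence and per-position targets, where None marks
    positions that contribute no loss.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    corrupted: List[str] = []
    targets: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < rate:
            corrupted.append(MASK)
            targets.append(tok)
        else:
            corrupted.append(tok)
            targets.append(None)
    return corrupted, targets


if __name__ == "__main__":
    sentence = "a man rides a horse on the beach".split()
    masked, labels = mask_for_prediction(sentence, rate=0.3, seed=1)
    print(masked)
    print(labels)
```

Only masked positions carry a target, so a language-only corpus supervises itself and no paired image is required for this objective.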
  • A Transformer-based Approach for Source Code Summarization

    Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in ACL (short), 2020.
    QA Sessions: 9A Summarization, 10B Summarization Paper link in the virtual conference
    Full Text Slides Code BibTeX Details
    Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their long-range dependencies is crucial. To learn code representation for summarization, we explore the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies. In this work, we show that despite its simplicity, the approach outperforms the state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., absolute encoding of source code token positions hinders summarization performance, while relative encoding significantly improves it. We have made our code publicly available to facilitate future research.
    @inproceedings{ahmad2020transformer,
      author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
      title = {A Transformer-based Approach for Source Code Summarization},
      booktitle = {ACL (short)},
      year = {2020},
      presentation_id = {https://virtual.acl2020.org/paper_main.449.html}
    }
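The abstract reports that relative position encoding helps while absolute encoding hurts. Below is a minimal sketch of the clipped relative-position indices used by relative-attention Transformers in the style of Shaw et al. (2018); treating this as the paper's exact scheme, and the clipping distance `max_distance`, are assumptions.

```python
from typing import List


def relative_position_ids(seq_len: int, max_distance: int) -> List[List[int]]:
    """Index of the learned embedding for the offset of token j as seen from token i.

    Offsets j - i are clipped to [-max_distance, max_distance] and shifted to
    [0, 2 * max_distance], so the embedding table needs only 2 * max_distance + 1
    rows regardless of sequence length.
    """
    ids = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            offset = max(-max_distance, min(max_distance, j - i))
            row.append(offset + max_distance)
        ids.append(row)
    return ids


if __name__ == "__main__":
    for row in relative_position_ids(seq_len=4, max_distance=2):
        print(row)
    # [2, 3, 4, 4]
    # [1, 2, 3, 4]
    # [0, 1, 2, 3]
    # [0, 0, 1, 2]
```

Because every entry depends only on j - i, the same pair of tokens gets the same index wherever it appears in a function, which is one intuition for why relative encoding may suit source code better than absolute positions.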
    

    Related Publications

    • A Transformer-based Approach for Source Code Summarization

      Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in ACL (short), 2020.
      Full Text Slides Video Code Abstract BibTeX Details
      Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their long-range dependencies is crucial. To learn code representation for summarization, we explore the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies. In this work, we show that despite its simplicity, the approach outperforms the state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., absolute encoding of source code token positions hinders summarization performance, while relative encoding significantly improves it. We have made our code publicly available to facilitate future research.
      @inproceedings{ahmad2020transformer,
        author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
        title = {A Transformer-based Approach for Source Code Summarization},
        booktitle = {ACL (short)},
        year = {2020},
        presentation_id = {https://virtual.acl2020.org/paper_main.449.html}
      }
      
    • Context Attentive Document Ranking and Query Suggestion

      Wasi Ahmad, Kai-Wei Chang, and Hongning Wang, in SIGIR, 2019.
      Full Text Slides Code Abstract BibTeX Details
      We present a context-aware neural ranking model to exploit users’ on-task search activities and enhance retrieval performance. In particular, a two-level hierarchical recurrent neural network is introduced to learn search context representation of individual queries, search tasks, and corresponding dependency structure by jointly optimizing two companion retrieval tasks: document ranking and query suggestion. To identify the variable dependency structure between search context and users’ ongoing search activities, attention is introduced at both levels of recurrent states. Extensive experimental comparisons against a rich set of baseline methods and an in-depth ablation analysis confirm the value of our proposed approach for modeling search context buried in search tasks.
      @inproceedings{ahmad2019context,
        author = {Ahmad, Wasi and Chang, Kai-Wei and Wang, Hongning},
        title = {Context Attentive Document Ranking and Query Suggestion},
        booktitle = {SIGIR},
        year = {2019}
      }
      
    • Multifaceted Protein-Protein Interaction Prediction Based on Siamese Residual RCNN

      Muhao Chen, Chelsea J.-T. Ju, Guangyu Zhou, Xuelu Chen, Tianran Zhang, Kai-Wei Chang, Carlo Zaniolo, and Wei Wang, in ISMB, 2019.
      Full Text Code Abstract BibTeX Details
      Sequence-based protein-protein interaction (PPI) prediction represents a fundamental computational biology problem. To address this problem, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical algorithms are learned to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage on the PPI information. Hence, we present an end-to-end framework, Lasagna, for PPI predictions using only the primary sequences of a protein pair. Lasagna incorporates a deep residual recurrent convolutional neural network in the Siamese learning architecture, which leverages both robust local features and contextualized information that are significant for capturing the mutual influence of protein sequences. Our framework relieves the data pre-processing efforts that are required by other systems, and generalizes well to different application scenarios. Experimental evaluations show that Lasagna outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows a promising performance on more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.
      @inproceedings{chen2019multifaceted,
        author = {Chen, Muhao and Ju, Chelsea J.-T. and Zhou, Guangyu and Chen, Xuelu and Zhang, Tianran and Chang, Kai-Wei and Zaniolo, Carlo and Wang, Wei},
        title = {Multifaceted Protein-Protein Interaction Prediction Based on Siamese Residual RCNN},
        booktitle = {ISMB},
        year = {2019}
      }
      
    • Multi-Task Learning for Document Ranking and Query Suggestion

      Wasi Ahmad, Kai-Wei Chang, and Hongning Wang, in ICLR, 2018.
      Full Text Code Abstract BibTeX Details
      We propose a multi-task learning framework to jointly learn document ranking and query suggestion for web search. It consists of two major components, a document ranker and a query recommender. The document ranker combines current query and session information and compares the combined representation with the document representation to rank the documents. The query recommender tracks users’ query reformulation sequence, considering all previous in-session queries, using a sequence-to-sequence approach. As both tasks are driven by the users’ underlying search intent, we perform joint learning of these two components through session recurrence, which encodes search context and intent. Extensive comparisons against state-of-the-art document ranking and query suggestion algorithms are performed on the public AOL search log, and the promising results endorse the effectiveness of the joint learning framework.
      @inproceedings{ahmad2018multitask,
        author = {Ahmad, Wasi and Chang, Kai-Wei and Wang, Hongning},
        title = {Multi-Task Learning for Document Ranking and Query Suggestion},
        booktitle = {ICLR},
        year = {2018}
      }
      
    • Intent-aware Query Obfuscation for Privacy Protection in Personalized Web Search

      Wasi Ahmad, Kai-Wei Chang, and Hongning Wang, in SIGIR, 2018.
      Full Text Code Abstract BibTeX Details
      Modern web search engines exploit users’ search history to personalize search results, with a goal of improving their service utility on a per-user basis. But it is this very dimension that leads to the risk of privacy infringement and raises serious public concerns. In this work, we propose a client-centered intent-aware query obfuscation solution for protecting user privacy in a personalized web search scenario. In our solution, each user query is submitted with l additional cover queries and corresponding clicks, which act as decoys to mask users’ genuine search intent from a search engine. The cover queries are sequentially sampled from a set of hierarchically organized language models to ensure the coherency of fake search intents in a cover search task. Our approach emphasizes the plausibility of generated cover queries, not only to the current genuine query but also to previous queries in the same task, to increase the complexity for a search engine to identify a user’s true intent. We also develop two new metrics from an information theoretic perspective to evaluate the effectiveness of provided privacy protection. Comprehensive experiment comparisons with state-of-the-art query obfuscation techniques are performed on the public AOL search log, and the propitious results substantiate the effectiveness of our solution.
      @inproceedings{ahmad2018intent,
        author = {Ahmad, Wasi and Chang, Kai-Wei and Wang, Hongning},
        title = {Intent-aware Query Obfuscation for Privacy Protection in Personalized Web Search},
        booktitle = {SIGIR},
        year = {2018}
      }
      
    • Counterexamples for Robotic Planning Explained in Structured Language

      Lu Feng, Mahsa Ghasemi, Kai-Wei Chang, and Ufuk Topcu, in ICRA, 2018.
      Full Text Abstract BibTeX Details
      Automated techniques such as model checking have been used to verify models of robotic mission plans based on Markov decision processes (MDPs) and generate counterexamples that may help diagnose requirement violations. However, such artifacts may be too complex for humans to understand, because existing representations of counterexamples typically include a large number of paths or a complex automaton. To help improve the interpretability of counterexamples, we define a notion of explainable counterexample, which includes a set of structured natural language sentences to describe the robotic behavior that leads to a requirement violation in an MDP model of a robotic mission plan. We propose an approach based on mixed-integer linear programming for generating explainable counterexamples that are minimal, sound and complete. We demonstrate the usefulness of the proposed approach via a case study of warehouse robot planning.
      @inproceedings{feng2018conterexamples,
        author = {Feng, Lu and Ghasemi, Mahsa and Chang, Kai-Wei and Topcu, Ufuk},
        title = {Counterexamples for Robotic Planning Explained in Structured Language},
        booktitle = {ICRA},
        year = {2018}
      }
      
    • Word and sentence embedding tools to measure semantic similarity of Gene Ontology terms by their definitions

      Dat Duong, Wasi Uddin Ahmad, Eleazar Eskin, Kai-Wei Chang, and Jingyi Jessica Li, in Journal of Computational Biology, 2018.
      Full Text Code Abstract BibTeX Details
      The Gene Ontology (GO) database contains GO terms that describe biological functions of genes. Previous methods for comparing GO terms have relied on the fact that GO terms are organized into a tree structure. Under this paradigm, the locations of two GO terms in the tree dictate their similarity score. In this paper, we introduce two new solutions for this problem, by focusing instead on the definitions of the GO terms. We apply neural network based techniques from the natural language processing (NLP) domain. The first method does not rely on the GO tree, whereas the second indirectly depends on the GO tree. In our first approach, we compare two GO definitions by treating them as two unordered sets of words. The word similarity is estimated by a word embedding model that maps words into an N-dimensional space. In our second approach, we account for the word-ordering within a sentence. We use a sentence encoder to embed GO definitions into vectors and estimate how likely one definition entails another. We validate our methods in two ways. In the first experiment, we test the model’s ability to differentiate a true protein-protein network from a randomly generated network. In the second experiment, we test the model in identifying orthologs from randomly-matched genes in human, mouse, and fly. In both experiments, a hybrid of NLP and GO-tree based method achieves the best classification accuracy.
      @inproceedings{DAECL18,
        author = {Duong, Dat and Ahmad, Wasi Uddin and Eskin, Eleazar and Chang, Kai-Wei and Li, Jingyi Jessica},
        title = {Word and sentence embedding tools to measure semantic similarity of Gene Ontology terms by their definitions},
        booktitle = {Journal of Computational Biology},
        year = {2018}
      }
      


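The Gene Ontology entry above compares two definitions as unordered sets of words whose similarity comes from a word embedding model. The abstract does not state how word-level scores are aggregated, so the symmetric best-match average below is an assumption, as are the toy two-dimensional embeddings in the example.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def definition_similarity(words_a: Sequence[str], words_b: Sequence[str],
                          emb: Dict[str, Sequence[float]]) -> float:
    """Symmetric best-match similarity between two bags of words (order ignored).

    Every word is matched to its most similar word in the other definition; the two
    directional averages are then averaged. Out-of-vocabulary words are skipped.
    """
    a = [w for w in words_a if w in emb]
    b = [w for w in words_b if w in emb]
    if not a or not b:
        return 0.0
    a_to_b = sum(max(cosine(emb[x], emb[y]) for y in b) for x in a) / len(a)
    b_to_a = sum(max(cosine(emb[y], emb[x]) for x in a) for y in b) / len(b)
    return (a_to_b + b_to_a) / 2


if __name__ == "__main__":
    toy = {"binding": [1.0, 0.1], "attach": [0.9, 0.2], "transport": [0.1, 1.0]}
    print(definition_similarity(["binding"], ["attach"], toy))     # close to 1
    print(definition_similarity(["binding"], ["transport"], toy))  # much lower
```

Averaging both directions keeps the score symmetric, so a short definition fully contained in a longer one is not scored as a perfect match.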
Energy Efficient Pre-Training

Contextual representation models greatly improve various NLP tasks. However, they are difficult to train due to their large parameter size and high computational complexity. We present a paper that drastically reduces the number of trainable parameters and the training time.

[1]
  • Efficient Contextual Representation Learning With Continuous Outputs

    Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, and Kai-Wei Chang, in TACL, 2019.
    QA Sessions: 4B Machine Learning, 5B Machine Learning Paper link in the virtual conference
    Full Text Slides BibTeX Details
    Contextual representation models have achieved great success in improving various downstream natural language processing tasks. However, these language-model-based encoders are difficult to train due to their large parameter size and high computational complexity. By carefully examining the training procedure, we observe that the softmax layer, which predicts a distribution of the target word, often induces significant overhead, especially when the vocabulary size is large. Therefore, we revisit the design of the output layer and consider directly predicting the pre-trained embedding of the target word for a given context. When applied to ELMo, the proposed approach achieves a 4 times speedup and eliminates 80% of the trainable parameters while achieving competitive performance on downstream tasks. Further analysis shows that the approach maintains the speed advantage under various settings, even when the sentence encoder is scaled up.
    @inproceedings{li2019efficient,
      author = {Li, Liunian Harold and Chen, Patrick H. and Hsieh, Cho-Jui and Chang, Kai-Wei},
      title = {Efficient Contextual Representation Learning With Continuous Outputs},
      booktitle = {TACL},
      year = {2019}
    }
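The idea in this entry is to replace the softmax over the vocabulary with a regression onto the frozen pre-trained embedding of the target word. A minimal sketch follows, assuming a plain cosine-distance loss (the paper's exact loss may differ) and hypothetical layer sizes; it also counts output-layer parameters to show why the softmax dominates when the vocabulary is large.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def continuous_output_loss(predicted: Sequence[float], target_emb: Sequence[float]) -> float:
    """Cosine distance to the frozen target embedding: O(d) per token, independent of |V|."""
    return 1.0 - cosine(predicted, target_emb)


def nearest_word(predicted: Sequence[float], emb: Dict[str, Sequence[float]]) -> str:
    """Map a predicted vector back to a word (only needed for inspection, not training)."""
    return max(emb, key=lambda w: cosine(predicted, emb[w]))


def output_layer_params(hidden: int, vocab: int, emb_dim: int, continuous: bool) -> int:
    """Trainable weights plus biases in the output layer under each design."""
    if continuous:
        return hidden * emb_dim + emb_dim  # projection into the frozen embedding space
    return hidden * vocab + vocab  # one softmax row per vocabulary word


if __name__ == "__main__":
    # Hypothetical sizes, not the ELMo configuration from the paper.
    print(output_layer_params(512, 100_000, 300, continuous=False))  # 51300000
    print(output_layer_params(512, 100_000, 300, continuous=True))   # 153900
```

The 80% figure in the abstract refers to the paper's full ELMo setup; the toy sizes here only illustrate how the softmax layer's cost grows with vocabulary size while the continuous output layer's does not.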
    

    Related Publications

    • Distributed Block-diagonal Approximation Methods for Regularized Empirical Risk Minimization

      Ching-pei Lee and Kai-Wei Chang, in Machine Learning Journal, 2019.
      Full Text Code Abstract BibTeX Details
      Designing distributed algorithms for empirical risk minimization (ERM) has become an active research topic in recent years because of the practical need to deal with the huge volume of data. In this paper, we propose a general framework for training an ERM model via solving its dual problem in parallel over multiple machines. Our method provides a versatile approach for many large-scale machine learning problems, including linear binary/multi-class classification, regression, and structured prediction. Compared with existing approaches, we show that our method has faster convergence under weaker conditions both theoretically and empirically.
      @inproceedings{LD17,
        author = {Lee, Ching-pei and Chang, Kai-Wei},
        title = {Distributed Block-diagonal Approximation Methods for Regularized Empirical Risk Minimization},
        booktitle = {Machine Learning Journal},
        year = {2019}
      }
      
    • Robust Text Classifier on Test-Time Budgets

      Md Rizwan Parvez, Tolga Bolukbasi, Kai-Wei Chang, and Venkatesh Saligrama, in EMNLP (short), 2019.
      Full Text Slides Code Abstract BibTeX Details
      We propose a generic and interpretable learning framework for building robust text classification models that achieve accuracy comparable to full models under test-time budget constraints. Our approach learns a selector to identify words that are relevant to the prediction tasks and passes them to the classifier for processing. The selector is trained jointly with the classifier and directly learns to work together with it. We further propose a data aggregation scheme to improve the robustness of the classifier. Our learning framework is general and can be incorporated with any type of text classification model. On real-world data, we show that the proposed approach improves the performance of a given classifier and speeds up the model with only a minor loss in accuracy.
      @inproceedings{parvez2019robust,
        author = {Parvez, Md Rizwan and Bolukbasi, Tolga and Chang, Kai-Wei and Saligrama, Venkatesh},
        title = {Robust Text Classifier on Test-Time Budgets},
        booktitle = {EMNLP (short)},
        year = {2019}
      }
      
    • Efficient Contextual Representation Learning With Continuous Outputs

      Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, and Kai-Wei Chang, in TACL, 2019.
      Full Text Slides Video Abstract BibTeX Details
      Contextual representation models have achieved great success in improving various downstream natural language processing tasks. However, these language-model-based encoders are difficult to train due to their large parameter size and high computational complexity. By carefully examining the training procedure, we observe that the softmax layer, which predicts a distribution of the target word, often induces significant overhead, especially when the vocabulary size is large. Therefore, we revisit the design of the output layer and consider directly predicting the pre-trained embedding of the target word for a given context. When applied to ELMo, the proposed approach achieves a 4 times speedup and eliminates 80% of the trainable parameters while achieving competitive performance on downstream tasks. Further analysis shows that the approach maintains the speed advantage under various settings, even when the sentence encoder is scaled up.
      @inproceedings{li2019efficient,
        author = {Li, Liunian Harold and Chen, Patrick H. and Hsieh, Cho-Jui and Chang, Kai-Wei},
        title = {Efficient Contextual Representation Learning With Continuous Outputs},
        booktitle = {TACL},
        year = {2019}
      }
      
    • Structured Prediction with Test-time Budget Constraints

      Tolga Bolukbasi, Kai-Wei Chang, Joseph Wang, and Venkatesh Saligrama, in AAAI, 2017.
      We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features at test time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two real-world structured prediction tasks, optical character recognition (OCR) and dependency parsing. For OCR, our method cuts the feature acquisition time in half while staying within a 1% margin of top accuracy. For dependency parsing, we realize an overall runtime gain of 20% without significant loss in performance.
      @inproceedings{bolukbasi2017structured,
        author = {Bolukbasi, Tolga and Chang, Kai-Wei and Wang, Joseph and Saligrama, Venkatesh},
        title = {Structured Prediction with Test-time Budget Constraints},
        booktitle = {AAAI},
        year = {2017}
      }
      
    • A Credit Assignment Compiler for Joint Prediction

      Kai-Wei Chang, He He, Hal Daume III, John Langford, and Stephane Ross, in NeurIPS, 2016.
      Many machine learning applications involve jointly predicting multiple mutually dependent output variables. Learning to search is a family of methods where the complex decision problem is cast into a sequence of decisions via a search space. Although these methods have shown promise both in theory and in practice, implementing them has been burdensomely awkward. In this paper, we show that the search space can be defined by an arbitrary imperative program, turning learning to search into a credit assignment compiler. Together with algorithmic improvements to the compiler, we radically reduce both the complexity of programming and the running time. We demonstrate the feasibility of our approach on multiple joint prediction tasks. In all cases, we obtain accuracies as high as those of alternative approaches, at drastically reduced execution and programming time.
      @inproceedings{chang2016credit,
        author = {Chang, Kai-Wei and He, He and Daume III, Hal and Langford, John and Ross, Stephane},
        title = {A Credit Assignment Compiler for Joint Prediction},
        booktitle = {NeurIPS},
        year = {2016}
      }
      
    • Learning to Search Better Than Your Teacher

      Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daume III, and John Langford, in ICML, 2015.
      Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor?
      We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.
      @inproceedings{chang2015learninh,
        author = {Chang, Kai-Wei and Krishnamurthy, Akshay and Agarwal, Alekh and Daume III, Hal and Langford, John},
        title = {Learning to Search Better Than Your Teacher},
        booktitle = {ICML},
        year = {2015}
      }
      
    • Structural Learning with Amortized Inference

      Kai-Wei Chang, Shyam Upadhyay, Gourab Kundu, and Dan Roth, in AAAI, 2015.
      Training a structured prediction model involves performing several loss-augmented inference steps. Over the lifetime of the training, many of these inference problems, although different, share the same solution. We propose AI-DCD, an Amortized Inference framework for the Dual Coordinate Descent method (an approximate learning algorithm) that accelerates the training process by exploiting this redundancy of solutions, without compromising the performance of the model. We show the efficacy of our method by training a structured SVM using dual coordinate descent for an entity-relation extraction task. Our method learns the same model as an exact training algorithm would, but calls the inference engine for only 10% to 24% of the inference problems encountered during training. We observe similar gains on a multi-label classification task and with a Structured Perceptron model for the entity-relation task.
      @inproceedings{chang2015structural,
        author = {Chang, Kai-Wei and Upadhyay, Shyam and Kundu, Gourab and Roth, Dan},
        title = {Structural Learning with Amortized Inference},
        booktitle = {AAAI},
        year = {2015}
      }
      
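The Li et al. (TACL 2019) abstract above replaces the softmax output layer with direct prediction of the target word's pre-trained embedding. The sketch below is a minimal illustration of why that saves work, assuming a plain cosine-distance loss as a stand-in (the paper's actual loss function may differ) and toy hand-written vectors in place of real ELMo outputs. The function names are hypothetical. The softmax loss must sum over every vocabulary entry, while the continuous-output loss touches only the target embedding.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax_loss(logits: Vector, target_index: int) -> float:
    """Cross-entropy over the full vocabulary: O(|V|) work per token."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]


def continuous_output_loss(predicted: Vector, target_embedding: Vector) -> float:
    """Cosine distance to the frozen pre-trained target embedding.

    Stand-in loss for illustration only: O(d) work per token, independent of |V|.
    """
    return 1.0 - cosine(predicted, target_embedding)


def nearest_word(predicted: Vector, embeddings: Dict[str, Vector]) -> str:
    """Decode a predicted vector back to a word by nearest cosine neighbour."""
    return max(embeddings, key=lambda w: cosine(predicted, embeddings[w]))


if __name__ == "__main__":
    toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}  # hypothetical 2-d vectors
    prediction = [0.9, 0.1]
    print(continuous_output_loss(prediction, toy_embeddings["cat"]))
    print(nearest_word(prediction, toy_embeddings))
```

In this sketch only `nearest_word` scans the whole vocabulary, and it is needed only when decoding words; computing the training loss never does.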

Enhancing Contextualized Encoders

We present the following two papers, which enhance contextualized encoders by 1) incorporating tree structure to capture compositional sentiment semantics for sentiment analysis, and 2) injecting pronunciation embeddings for pun recognition.

[1], [2]
  • SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics

    Da Yin, Tao Meng, and Kai-Wei Chang, in ACL, 2020.
    QA Sessions: 6B Sentiment Analysis, 8A Sentiment Analysis
    We propose SentiBERT, a variant of BERT that effectively captures compositional sentiment semantics. The model combines contextualized representations with binary constituency parse trees to capture semantic composition. Comprehensive experiments demonstrate that SentiBERT achieves competitive performance on phrase-level sentiment classification. We further demonstrate that the sentiment composition learned from the phrase-level annotations on SST can be transferred to other sentiment analysis tasks as well as related tasks, such as emotion classification. Moreover, we conduct ablation studies and design visualization methods to understand SentiBERT. We show that SentiBERT is better than baseline approaches at capturing negation and contrastive relations and at modeling compositional sentiment semantics.
    @inproceedings{yin2020sentibert,
      author = {Yin, Da and Meng, Tao and Chang, Kai-Wei},
      title = {SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics},
      booktitle = {ACL},
      year = {2020},
      presentation_id = {https://virtual.acl2020.org/paper_main.341.html}
    }
    

    Related Publications

    • An Integer Linear Programming Framework for Mining Constraints from Data

      Tao Meng and Kai-Wei Chang, in ICML, 2021.
      Various structured output prediction problems (e.g., sequential tagging) involve constraints over the output space. By identifying these constraints, we can filter out infeasible solutions and build an accountable model.
      To this end, we present a general integer linear programming (ILP) framework for mining constraints from data. We model the inference of structured output prediction as an ILP problem. Then, given the coefficients of the objective function and the corresponding solution, we mine the underlying constraints by estimating the outer and inner polytopes of the feasible set. We verify the proposed constraint mining algorithm in various synthetic and real-world applications and demonstrate that the proposed approach successfully identifies the feasible set at scale.
      In particular, we show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules. We also demonstrate results on hierarchical multi-label classification and conduct a theoretical analysis of how close the mined constraints are to the ground truth.
      @inproceedings{meng2020integer,
        author = {Meng, Tao and Chang, Kai-Wei},
        title = {An Integer Linear Programming Framework for Mining Constraints from Data},
        booktitle = {ICML},
        year = {2021}
      }
      
    • Intent Classification and Slot Filling for Privacy Policies

      Wasi Ahmad, Jianfeng Chi, Tu Le, Thomas Norton, Yuan Tian, and Kai-Wei Chang, in ACL, 2021.
      Understanding privacy policies is crucial for users as it empowers them to learn about the information that matters to them. Sentences written in a privacy policy document explain privacy practices, and the constituent text spans convey further specific information about that practice. We refer to predicting the privacy practice explained in a sentence as intent classification and identifying the text spans sharing specific information as slot filling. In this work, we propose PolicyIE, a corpus consisting of 5,250 intent and 11,788 slot annotations spanning 31 privacy policies of websites and mobile applications. The PolicyIE corpus is a challenging benchmark with limited labeled examples, reflecting the cost of collecting large-scale annotations. We present two alternative neural approaches as baselines: (1) formulating intent classification and slot filling as joint sequence tagging and (2) modeling them as a sequence-to-sequence (Seq2Seq) learning task. Experimental results show that both approaches perform comparably in intent classification, while the Seq2Seq method outperforms the sequence tagging approach in slot filling by a large margin. Error analysis reveals the deficiencies of the baseline approaches, suggesting room for improvement in future work. We hope the PolicyIE corpus will stimulate future research in this domain.
      @inproceedings{ahmad2021intent,
        title = {Intent Classification and Slot Filling for Privacy Policies},
        author = {Ahmad, Wasi and Chi, Jianfeng and Le, Tu and Norton, Thomas and Tian, Yuan and Chang, Kai-Wei},
        booktitle = {ACL},
        year = {2021}
      }
      
    • Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs

      Kuan-Hao Huang and Kai-Wei Chang, in EACL, 2021.
      Paraphrase generation plays an essential role in natural language processing (NLP), and it has many downstream applications. However, training supervised paraphrase models requires many annotated paraphrase pairs, which are usually costly to obtain. On the other hand, the paraphrases generated by existing unsupervised approaches are usually syntactically similar to the source sentences and are limited in diversity. In this paper, we demonstrate that it is possible to generate syntactically diverse paraphrases without the need for annotated paraphrase pairs. We propose the Syntactically controlled Paraphrase Generator (SynPG), an encoder-decoder based model that learns to disentangle the semantics and the syntax of a sentence from a collection of unannotated texts. The disentanglement enables SynPG to control the syntax of output paraphrases by manipulating the embedding in the syntactic space. Extensive experiments using automatic metrics and human evaluation show that SynPG achieves better syntactic control than unsupervised baselines, while the quality of the generated paraphrases is competitive. We also demonstrate that the performance of SynPG is competitive with or even better than that of supervised models when the unannotated data is large. Finally, we show that the syntactically controlled paraphrases generated by SynPG can be utilized for data augmentation to improve the robustness of NLP models.
      @inproceedings{huang2021generating,
        author = {Huang, Kuan-Hao and Chang, Kai-Wei},
        title = {Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs},
        booktitle = {EACL},
        year = {2021}
      }
      
    • Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference

      Yichao Zhou, Yu Yan, Rujun Han, J. Harry Caufield, Kai-Wei Chang, Yizhou Sun, Peipei Ping, and Wei Wang, in AAAI, 2021.
      There has been a steady need in the medical community to precisely extract the temporal relations between clinical events. In particular, temporal information can facilitate a variety of downstream applications such as case report retrieval and medical question answering. However, existing methods either require expensive feature engineering or are incapable of modeling the global relational dependencies among the events. In this paper, we propose Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference (CTRL-PG), a novel method to tackle the problem at the document level. Extensive experiments on two benchmark datasets, I2B2-2012 and TB-Dense, demonstrate that CTRL-PG significantly outperforms baseline methods for temporal relation extraction.
      @inproceedings{zhou2021clinical,
        author = {Zhou, Yichao and Yan, Yu and Han, Rujun and Caufield, J. Harry and Chang, Kai-Wei and Sun, Yizhou and Ping, Peipei and Wang, Wei},
        title = {Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference},
        booktitle = {AAAI},
        year = {2021}
      }
      
    • PolicyQA: A Reading Comprehension Dataset for Privacy Policies

      Wasi Ahmad, Jianfeng Chi, Yuan Tian, and Kai-Wei Chang, in EMNLP-Findings (short), 2020.
      Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. In contrast, we argue that providing users with a short text span from policy documents reduces the burden of searching for the target information in a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
      @inproceedings{ahmad2020policyqa,
        author = {Ahmad, Wasi and Chi, Jianfeng and Tian, Yuan and Chang, Kai-Wei},
        title = {PolicyQA: A Reading Comprehension Dataset for Privacy Policies},
        booktitle = {EMNLP-Findings (short)},
        year = {2020}
      }
      
    • GPT-GNN: Generative Pre-Training of Graph Neural Networks

      Ziniu Hu, Yuxiao Dong, Kuansan Wang, Kai-Wei Chang, and Yizhou Sun, in KDD, 2020.
      Graph neural networks (GNNs) have been demonstrated to be successful in modeling graph-structured data. However, training GNNs requires abundant task-specific labeled data, which is often arduously expensive to obtain. One effective way to reduce labeling effort is to pre-train an expressive GNN model on unlabelled data with self-supervision and then transfer the learned knowledge to downstream models. In this paper, we present the GPT-GNN framework to initialize GNNs by generative pre-training. GPT-GNN introduces a self-supervised attributed graph generation task to pre-train a GNN, which allows the GNN to capture the intrinsic structural and semantic properties of the graph. We factorize the likelihood of graph generation into two components: 1) attribute generation and 2) edge generation. By modeling both components, GPT-GNN captures the inherent dependency between node attributes and graph structure during the generative process. Comprehensive experiments on the billion-scale academic graph and Amazon recommendation data demonstrate that GPT-GNN significantly outperforms state-of-the-art base GNN models without pre-training by up to 9.1% across different downstream tasks.
      @inproceedings{hu2020gptgnn,
        author = {Hu, Ziniu and Dong, Yuxiao and Wang, Kuansan and Chang, Kai-Wei and Sun, Yizhou},
        title = {GPT-GNN: Generative Pre-Training of Graph Neural Networks},
        booktitle = {KDD},
        slide_url = {https://acbull.github.io/pdf/gpt.pptx},
        year = {2020}
      }
      
    • Building Language Models for Text with Named Entities

      Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in ACL, 2018.
      Text in many domains involves a significant amount of named entities. Predicting the entity names is often challenging for a language model as they appear less frequently in the training corpus. In this paper, we propose a novel and effective approach to building a language model that can learn the entity names by leveraging their entity type information. We also introduce two benchmark datasets based on recipes and Java programming code, on which we evaluate the proposed model. Experimental results show that our model achieves 52.2% better perplexity in recipe generation and 40.3% better perplexity in code generation than state-of-the-art language models.
      @inproceedings{parvez2018building,
        author = {Parvez, Md Rizwan and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
        title = {Building Language Models for Text with Named Entities},
        booktitle = {ACL},
        year = {2018}
      }
      
    • Learning from Explicit and Implicit Supervision Jointly For Algebra Word Problems

      Shyam Upadhyay, Ming-Wei Chang, Kai-Wei Chang, and Wen-tau Yih, in EMNLP, 2016.
      Automatically solving algebra word problems has raised considerable interest recently. Existing state-of-the-art approaches mainly rely on learning from human-annotated equations. In this paper, we demonstrate that it is possible to efficiently mine algebra problems and their numerical solutions with little to no manual effort. To leverage the mined dataset, we propose a novel structured-output learning algorithm that aims to learn from both explicit (e.g., equations) and implicit (e.g., solutions) supervision signals jointly. Enabled by this new algorithm, our model achieves a 4.6% absolute improvement in accuracy on the ALG-514 benchmark compared to the model trained without implicit supervision. The final model also outperforms the current state-of-the-art approach by 3%.
      @inproceedings{BCWS16,
        author = {Upadhyay, Shyam and Chang, Ming-Wei and Chang, Kai-Wei and Yih, Wen-tau},
        title = {Learning from Explicit and Implicit Supervision Jointly For Algebra Word Problems},
        booktitle = {EMNLP},
        year = {2016}
      }
      
  • "The Boating Store Had Its Best Sail Ever": Pronunciation-attentive Contextualized Pun Recognition

    Yichao Zhou, Jyun-Yu Jiang, Jieyu Zhao, Kai-Wei Chang, and Wei Wang, in ACL, 2020.
    QA Sessions: 1B Application, 5B Application
    Humor plays an important role in human languages, and it is essential to model humor when building intelligent systems. Among different forms of humor, puns perform wordplay for humorous effects by employing words with double entendre and high phonetic similarity. However, identifying and modeling puns is challenging, as puns usually involve implicit semantic or phonological tricks. In this paper, we propose Pronunciation-attentive Contextualized Pun Recognition (PCPR) to perceive human humor, detect whether a sentence contains puns, and locate them in the sentence. PCPR derives a contextualized representation for each word in a sentence by capturing the association between the surrounding context and its corresponding phonetic symbols. Extensive experiments are conducted on two benchmark datasets. Results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods in pun detection and location tasks. In-depth analyses verify the effectiveness and robustness of PCPR.
    @inproceedings{zhou2020boating,
      author = {Zhou, Yichao and Jiang, Jyun-Yu and Zhao, Jieyu and Chang, Kai-Wei and Wang, Wei},
      title = {"The Boating Store Had Its Best Sail Ever": Pronunciation-attentive Contextualized Pun Recognition},
      booktitle = {ACL},
      presentation_id = {https://virtual.acl2020.org/paper_main.75.html},
      year = {2020}
    }
    

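The PCPR abstract above describes deriving, for each word, a representation that associates its surrounding context with its phonetic symbols. The sketch below illustrates one way such a pronunciation-attentive representation could be formed, under stated assumptions: a word's phoneme embeddings are attended to with a single dot-product query vector, and the resulting pronunciation vector is concatenated with the word's contextualized vector. The query, the toy vectors, and the function names are hypothetical; in PCPR these components are learned, and its exact attention formulation may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_phonemes(phoneme_embs: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Dot-product attention over one word's phoneme embeddings (hypothetical query).

    Returns the attention-weighted pronunciation vector and the attention weights.
    """
    scores = [sum(q * p for q, p in zip(query, emb)) for emb in phoneme_embs]
    weights = softmax(scores)
    dim = len(phoneme_embs[0])
    pron = [sum(w * emb[i] for w, emb in zip(weights, phoneme_embs)) for i in range(dim)]
    return pron, weights


def pronunciation_attentive_repr(contextual: Vector, phoneme_embs: List[Vector], query: Vector) -> Vector:
    """Concatenate a word's contextual vector with its pronunciation vector."""
    pron, _ = attend_over_phonemes(phoneme_embs, query)
    return contextual + pron


if __name__ == "__main__":
    # Hypothetical toy inputs: a 3-d contextual vector (e.g. from BERT) and
    # 2-d embeddings for the ARPAbet phonemes of "sail" (S, EY, L).
    contextual = [0.2, -0.1, 0.5]
    phonemes = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    query = [0.3, 0.7]
    print(pronunciation_attentive_repr(contextual, phonemes, query))
```

Concatenation is a design choice of this sketch: it keeps the contextual and phonetic signals side by side so that a downstream classifier or tagger for pun detection and location can weigh them separately.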