At UCLA-NLP, our mission is to develop reliable, fair, accountable, robust natural language understanding and generation technology to benefit everyone.

Please see our recent papers at

In the following, we will highlight our research papers at EMNLP 2021 on the following topics:

• AESOP: Paraphrase Generation with Adaptive Syntactic Control

Jiao Sun, Xuezhe Ma, and Nanyun Peng, in EMNLP, 2021.
QA Sessions: VIRTUAL POSTER SESSION II: GENERATION Paper link in the virtual conference
BibTeX Details
@inproceedings{sun2021aesop,
title = {AESOP: Paraphrase Generation with Adaptive Syntactic Control},
author = {Sun, Jiao and Ma, Xuezhe and Peng, Nanyun},
booktitle = {EMNLP},
year = {2021}
}


• AESOP: Paraphrase Generation with Adaptive Syntactic Control

Jiao Sun, Xuezhe Ma, and Nanyun Peng, in EMNLP, 2021.
BibTeX Details
@inproceedings{sun2021aesop,
title = {AESOP: Paraphrase Generation with Adaptive Syntactic Control},
author = {Sun, Jiao and Ma, Xuezhe and Peng, Nanyun},
booktitle = {EMNLP},
year = {2021}
}

Details

Details
• HypoGen: Hyperbole Generation with Commonsense and Counterfactual Knowledge

Yufei Tian, Arvind krishna Sridhar, and Nanyun Peng, in EMNLP-Finding, 2021.
QA Sessions: FINDINGS PAPERS - GENERATION Paper link in the virtual conference
Full Text BibTeX Details
 A hyperbole is an intentional and creative exaggeration not to be taken literally. Despite its ubiquity in daily life, the computational explorations of hyperboles are scarce. In this paper, we tackle the under-explored and challenging task: sentence-level hyperbole generation. We start with a representative syntactic pattern for intensification and systematically study the semantic (commonsense and counterfactual) relationships between each component in such hyperboles. We then leverage commonsense and counterfactual inference to generate hyperbole candidates based on our findings from the pattern, and train neural classifiers to rank and select high-quality hyperboles. Automatic and human evaluations show that our generation method is able to generate hyperboles creatively with high success rate and intensity.
@inproceedings{tian2021hypogen,
title = {HypoGen: Hyperbole Generation with Commonsense and Counterfactual Knowledge},
author = {Tian, Yufei and Sridhar, Arvind krishna and Peng, Nanyun},
booktitle = {EMNLP-Finding},
year = {2021},
presentation_id = {https://underline.io/events/192/sessions/7923/lecture/38307-hypogen-hyperbole-generation-with-commonsense-and-counterfactual-knowledge}
}


Details

• On the Transferability of Adversarial Attacks against Neural Text Classifier

Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, and Kai-Wei Chang, in EMNLP, 2021.
QA Sessions: VIRTUAL POSTER SESSION I: INTERPRETABILITY AND ANALYSIS OF MODELS FOR NLP Paper link in the virtual conference
Full Text BibTeX Details
Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we present the first study to systematically investigate the transferability of adversarial examples for text classification models and explore how various factors, including network architecture, tokenization scheme, word embedding, and model capacity, affect the transferability of adversarial examples. Based on these studies, we propose a genetic algorithm to find an ensemble of models that can be used to induce adversarial examples to fool almost all existing models. Such adversarial examples reflect the defects of the learning process and the data bias in the training set. Finally, we derive word replacement rules that can be used for model diagnostics from these adversarial examples.
@inproceedings{yuan2021on,
title = {On the Transferability of Adversarial Attacks against Neural Text Classifier},
author = {Yuan, Liping and Zheng, Xiaoqing and Zhou, Yi and Hsieh, Cho-Jui and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2021}
}


• On the Transferability of Adversarial Attacks against Neural Text Classifier

Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Abstract BibTeX Details
Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we present the first study to systematically investigate the transferability of adversarial examples for text classification models and explore how various factors, including network architecture, tokenization scheme, word embedding, and model capacity, affect the transferability of adversarial examples. Based on these studies, we propose a genetic algorithm to find an ensemble of models that can be used to induce adversarial examples to fool almost all existing models. Such adversarial examples reflect the defects of the learning process and the data bias in the training set. Finally, we derive word replacement rules that can be used for model diagnostics from these adversarial examples.
@inproceedings{yuan2021on,
title = {On the Transferability of Adversarial Attacks against Neural Text Classifier},
author = {Yuan, Liping and Zheng, Xiaoqing and Zhou, Yi and Hsieh, Cho-Jui and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2021}
}

Details
• Searching for an Effiective Defender: Benchmarking Defense against Adversarial Word Substitution

Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi Zhang, Kai-Wei Chang, and Cho-Jui Hsieh, in EMNLP, 2021.
Full Text Abstract BibTeX Details
Recent studies have shown that deep neural networks are vulnerable to intentionally crafted adversarial examples, and various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models. However, there is a lack of systematic study on comparing different defense approaches under the same attacking setting. In this paper, we seek to fill the gap of systematic studies through comprehensive researches on understanding the behavior of neural text classifiers trained by various defense methods under representative adversarial attacks. In addition, we propose an effective method to further improve the robustness of neural text classifiers against such attacks and achieved the highest accuracy on both clean and adversarial examples on AGNEWS and IMDB datasets by a significant margin.
@inproceedings{li2021searching,
title = {Searching for an Effiective Defender: Benchmarking Defense against Adversarial Word Substitution},
author = {Li, Zongyi and Xu, Jianhan and Zeng, Jiehang and Li, Linyang and Zheng, Xiaoqing and Zhang, Qi and Chang, Kai-Wei and Hsieh, Cho-Jui},
booktitle = {EMNLP},
year = {2021}
}

Details
• Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble

Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, and Xuanjing Huang, in ACL, 2021.
Full Text Code Abstract BibTeX Details
Although deep neural networks have achieved prominent performance on many NLP tasks, they are vulnerable to adversarial examples. We propose Dirichlet Neighborhood Ensemble (DNE), a randomized method for training a robust model to defense synonym substitutionbased attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data. In such a way, the model is robust to adversarial attacks while maintaining the performance on the original clean data. DNE is agnostic to the network architectures and scales to large models (e.g., BERT) for NLP applications. Through extensive experimentation, we demonstrate that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
@inproceedings{zhou2021defense,
title = {Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble},
author = {Zhou, Yi and Zheng, Xiaoqing and Hsieh, Cho-Jui and Chang, Kai-Wei and Huang, Xuanjing},
booktitle = {ACL},
year = {2021}
}

Details
• Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation

Chong Zhang, Jieyu Zhao, Huan Zhang, Kai-Wei Chang, and Cho-Jui Hsieh, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Robustness and counterfactual bias are usually evaluated on a test dataset. However, are these evaluations robust? If the test dataset is perturbed slightly, will the evaluation results keep the same? In this paper, we propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset. The framework first perturbs the test dataset to construct abundant natural sentences similar to the test data, and then diagnoses the prediction change regarding a single-word substitution. We apply this framework to study two perturbation-based approaches that are used to analyze models’ robustness and counterfactual bias in English. (1) For robustness, we focus on synonym substitutions and identify vulnerable examples where prediction can be altered. Our proposed attack attains high success rates (96.0%-99.8%) in finding vulnerable examples on both original and robustly trained CNNs and Transformers. (2) For counterfactual bias, we focus on substituting demographic tokens (e.g., gender, race) and measure the shift of the expected prediction among constructed sentences. Our method is able to reveal the hidden model biases not directly shown in the test dataset.
@inproceedings{zhang2021double,
title = {	Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation},
booktitle = {NAACL},
author = {Zhang, Chong and Zhao, Jieyu and Zhang, Huan and Chang, Kai-Wei and Hsieh, Cho-Jui},
year = {2021},
presentation_id = {https://underline.io/events/122/sessions/4229/lecture/19609-double-perturbation-on-the-robustness-of-robustness-and-counterfactual-bias-evaluation}
}

Details
• Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs

Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh, in NeurIPS, 2020.
Full Text Code Abstract BibTeX Details
Linear relaxation based perturbation analysis (LiRPA) for neural networks, which computes provable linear bounds of output neurons given a certain amount of input perturbation, has become a core component in robustness verification and certified defense. The majority of LiRPA-based methods only consider simple feed-forward networks and it needs particular manual derivations and implementations when extended to other architectures. In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structures, by generalizing exiting LiRPA algorithms such as CROWN to operate on general computational graphs. The flexibility, differentiability and ease of use of our framework allow us to obtain state-of-the-art results on LiRPA based certified defense on fairly complicated networks like DenseNet, ResNeXt and Transformer that are not supported by prior work. Our framework also enables loss fusion, a technique that significantly reduces the computational complexity of LiRPA for certified defense. For the first time, we demonstrate LiRPA based certified defense on Tiny ImageNet and Downscaled ImageNet where previous approaches cannot scale to due to the relatively large number of classes. Our work also yields an open-source library for the community to apply LiRPA to areas beyond certified defense without much LiRPA expertise, e.g., we create a neural network with a provably flat optimization landscape. Our open source library is available at https://github.com/KaidiXu/auto_LiRPA
@inproceedings{xu2020provable,
author = {Xu, Kaidi and Shi, Zhouxing and Zhang, Huan and Wang, Yihan and Chang, Kai-Wei and Huang, Minlie and Kailkhura, Bhavya and Lin, Xue and Hsieh, Cho-Jui},
title = {Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs},
booktitle = {NeurIPS},
year = {2020}
}

Details
• On the Robustness of Language Encoders against Grammatical Errors

Fan Yin, Quanyu Long, Tao Meng, and Kai-Wei Chang, in ACL, 2020.
Full Text Slides Video Code Abstract BibTeX Details
We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the position of errors. We find that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors.
@inproceedings{yin2020robustness,
author = {Yin, Fan and Long, Quanyu and Meng, Tao and Chang, Kai-Wei},
title = {On the Robustness of Language Encoders against Grammatical Errors},
booktitle = {ACL},
presentation_id = {https://virtual.acl2020.org/paper_main.310.html},
year = {2020}
}

Details
• Robustness Verification for Transformers

Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, and Cho-Jui Hsieh, in ICLR, 2020.
Full Text Video Code Abstract BibTeX Details
Robustness verification that aims to formally certify the prediction behavior of
neural networks has become an important tool for understanding the behavior of
a given model and for obtaining safety guarantees. However, previous methods
are usually limited to relatively simple neural networks. In this paper, we consider the robustness verification problem for Transformers. Transformers have
complex self-attention layers that pose many challenges for verification, including
cross-nonlinearity and cross-position dependency, which have not been discussed
in previous work. We resolve these challenges and develop the first verification
algorithm for Transformers. The certified robustness bounds computed by our
method are significantly tighter than those by naive Interval Bound Propagation.
These bounds also shed light on interpreting Transformers as they consistently
reflect the importance of words in sentiment analysis.
@inproceedings{shi2020robustness,
author = {Shi, Zhouxing and Zhang, Huan and Chang, Kai-Wei and Huang, Minlie and Hsieh, Cho-Jui},
title = {Robustness Verification for Transformers},
booktitle = {ICLR},
year = {2020}
}

Details
• Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification

Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, and Wei Wang, in EMNLP, 2019.
Full Text Code Abstract BibTeX Details
Adversarial attacks against machine learning models have threatened various real-world applications such as spam filtering and sentiment analysis. In this paper, we propose a novel framework, learning to DIScriminate Perturbations (DISP), to identify and adjust malicious perturbations, thereby blocking adversarial attacks for text classification models. To identify adversarial attacks, a perturbation discriminator validates how likely a token in the text is perturbed and provides a set of potential perturbations. For each potential perturbation, an embedding estimator learns to restore the embedding of the original word based on the context and a replacement token is chosen based on approximate kNN search. DISP can block adversarial attacks for any NLP model without modifying the model structure or training procedure. Extensive experiments on two benchmark datasets demonstrate that DISP significantly outperforms baseline methods in blocking adversarial attacks for text classification. In addition, in-depth analysis shows the robustness of DISP across different situations.
@inproceedings{zhou2019learning,
author = {Zhou, Yichao and Jiang, Jyun-Yu and Chang, Kai-Wei and Wang, Wei},
title = {Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification},
booktitle = {EMNLP},
year = {2019}
}

Details
• Retrofitting Contextualized Word Embeddings with Paraphrases

Weijia Shi, Muhao Chen, Pei Zhou, and Kai-Wei Chang, in EMNLP (short), 2019.
Full Text Slides Video Code Abstract BibTeX Details
Contextualized word embedding models, such as ELMo, generate meaningful representations of words and their context. These models have been shown to have a great impact on downstream applications. However, in many cases, the contextualized embedding of a word changes drastically when the context is paraphrased. As a result, the downstream model is not robust to paraphrasing and other linguistic variations. To enhance the stability of contextualized word embedding models, we propose an approach to retrofitting contextualized embedding models with paraphrase contexts. Our method learns an orthogonal transformation on the input space, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the retrofitted model significantly outperforms the original ELMo on various sentence classification and language inference tasks.
@inproceedings{shi2019retrofitting,
author = {Shi, Weijia and Chen, Muhao and Zhou, Pei and Chang, Kai-Wei},
title = {Retrofitting Contextualized Word Embeddings with Paraphrases},
booktitle = {EMNLP (short)},
vimeo_id = {430797636},
year = {2019}
}

Details
• Generating Natural Language Adversarial Examples

Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang, in EMNLP (short), 2018.
Full Text Code Abstract BibTeX Details
Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the network to misclassify. In the image domain, these perturbations can often be made virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a population-based optimization algorithm to generate semantically and syntactically similar adversarial examples. We demonstrate via a human study that 94.3% of the generated examples are classified to the original label by human evaluators, and that the examples are perceptibly quite similar. We hope our findings encourage researchers to pursue improving the robustness of DNNs in the natural language domain.
@inproceedings{alzanto2018generating,
author = {Alzantot, Moustafa and Sharma, Yash and Elgohary, Ahmed and Ho, Bo-Jhang and Srivastava, Mani and Chang, Kai-Wei},
title = {Generating Natural Language Adversarial Examples},
booktitle = {EMNLP (short)},
year = {2018}
}

Details

Details
• Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

Sunipa Dev, Masoud Monajatipoor, Anaelia Ovalle, Arjun Subramonian, Jeff Phillips, and Kai-Wei Chang, in EMNLP, 2021.
QA Sessions: 4D: ETHICS AND NLP Paper link in the virtual conference
Full Text Slides Poster BibTeX Details
Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms and related challenges that need to be acknowledged and addressed for representations to equitably encode gender information.
@inproceedings{dev2021harms,
title = {Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies},
author = {Dev, Sunipa and Monajatipoor, Masoud and Ovalle, Anaelia and Subramonian, Arjun and Phillips, Jeff and Chang, Kai-Wei},
presentation_id = {https://underline.io/events/192/sessions/7788/lecture/37320-harms-of-gender-exclusivity-and-challenges-in-non-binary-representation-in-language-technologies},
blog_url = {https://uclanlp.medium.com/harms-of-gender-exclusivity-and-challenges-in-non-binary-representation-in-language-technologies-5f89891b5aee},
booktitle = {EMNLP},
year = {2021}
}


• Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

Sunipa Dev, Masoud Monajatipoor, Anaelia Ovalle, Arjun Subramonian, Jeff Phillips, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Slides Poster Abstract BibTeX Details
Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms and related challenges that need to be acknowledged and addressed for representations to equitably encode gender information.
@inproceedings{dev2021harms,
title = {Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies},
author = {Dev, Sunipa and Monajatipoor, Masoud and Ovalle, Anaelia and Subramonian, Arjun and Phillips, Jeff and Chang, Kai-Wei},
presentation_id = {https://underline.io/events/192/sessions/7788/lecture/37320-harms-of-gender-exclusivity-and-challenges-in-non-binary-representation-in-language-technologies},
blog_url = {https://uclanlp.medium.com/harms-of-gender-exclusivity-and-challenges-in-non-binary-representation-in-language-technologies-5f89891b5aee},
booktitle = {EMNLP},
year = {2021}
}

Details
• Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, and Ahmed Hassan Awadallah, in ACL, 2020.
Full Text Slides Video Abstract BibTeX Details
Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of the language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another language. While the cross-lingual transfer techniques are powerful, they carry gender bias from the source to target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations from both the intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces and that the alignment direction can also have an influence on the bias in transfer learning. We further provide recommendations for using the multilingual word representations for downstream tasks.
@inproceedings{zhao2020gender,
author = {Zhao, Jieyu and Mukherjee, Subhabrata and Hosseini, Saghar and Chang, Kai-Wei and Awadallah, Ahmed Hassan},
title = {Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer},
booktitle = {ACL},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.260.html}
}

Details
• Examining Gender Bias in Languages with Grammatical Gender

Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, and Kai-Wei Chang, in EMNLP, 2019.
Full Text Poster Code Abstract BibTeX Details
Recent studies have shown that word embeddings exhibit gender bias inherited from the training corpora. However, most studies to date have focused on quantifying and mitigating such bias only in English. These analyses cannot be directly extended to languages that exhibit morphological agreement on gender, such as Spanish and French. In this paper, we propose new metrics for evaluating gender bias in word embeddings of these languages and further demonstrate evidence of gender bias in bilingual embeddings which align these languages with English. Finally, we extend an existing approach to mitigate gender bias in word embeddings under both monolingual and bilingual settings. Experiments on modified Word Embedding Association Test, word similarity, word translation, and word pair translation tasks show that the proposed approaches effectively reduce the gender bias while preserving the utility of the embeddings.
@inproceedings{zhou2019examining,
author = {Zhou, Pei and Shi, Weijia and Zhao, Jieyu and Huang, Kuan-Hao and Chen, Muhao and Cotterell, Ryan and Chang, Kai-Wei},
title = {Examining Gender Bias in Languages with Grammatical Gender},
booktitle = {EMNLP},
year = {2019}
}

Details
• Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations

Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, and Vicente Ordonez, in ICCV, 2019.
Full Text Code Demo Abstract BibTeX Details
In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables –such as gender– in visual recognition tasks. We show that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased datasets. Surprisingly, we show that even when datasets are balanced such that each label co-occurs equally with each gender, learned models amplify the association between labels and gender, as much as if data had not been balanced! To mitigate this, we adopt an adversarial approach to remove unwanted features corresponding to protected variables from intermediate representations in a deep neural network – and provide a detailed analysis of its effectiveness. Experiments on two datasets: the COCO dataset (objects), and the imSitu dataset (actions), show reductions in gender bias amplification while maintaining most of the accuracy of the original models.
@inproceedings{wang2019balanced,
author = {Wang, Tianlu and Zhao, Jieyu and Yatskar, Mark and Chang, Kai-Wei and Ordonez, Vicente},
title = {Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations},
booktitle = {ICCV},
year = {2019}
}

Details
• Gender Bias in Contextualized Word Embeddings

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang, in NAACL (short), 2019.
Full Text Slides Video Abstract BibTeX Details
Despite the great success of contextualized word embeddings on downstream applications, these representations potentially embed the societal biases exhibited in their training corpus. In this paper, we quantify, analyze and mitigate the gender bias exhibited in ELMo contextualized word vectors. We first demonstrate that the vectors encode and propagate information about genders unequally and then conduct a principal component analysis to visualize the geometry of the gender information in the embeddings. Then we show that ELMo works unequally well for men and women in down-stream tasks. Finally, we explore a variety of methods to remove such gender bias and demonstrate that it can be reduced through data augmentation.
@inproceedings{zhao2019gender,
author = {Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Cotterell, Ryan and Ordonez, Vicente and Chang, Kai-Wei},
title = {Gender Bias in Contextualized Word Embeddings},
booktitle = {NAACL (short)},
year = {2019}
}

Details
• Learning Gender-Neutral Word Embeddings

Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang, in EMNLP (short), 2018.
Full Text Code Abstract BibTeX Details
Word embeddings have become a fundamental component in a wide range of Natu-ral Language Processing (NLP) applications.However, these word embeddings trained onhuman-generated corpora inherit strong gen-der stereotypes that reflect social constructs.In this paper, we propose a novel word em-bedding model, De-GloVe, that preserves gen-der information in certain dimensions of wordvectors while compelling other dimensions tobe free of gender influence. Quantitative andqualitative experiments demonstrate that De-GloVe successfully isolates gender informa-tion without sacrificing the functionality of theembedding model.
@inproceedings{zhao2018learning,
author = {Zhao, Jieyu and Zhou, Yichao and Li, Zeyu and Wang, Wei and Chang, Kai-Wei},
title = {Learning Gender-Neutral Word Embeddings},
booktitle = {EMNLP (short)},
year = {2018}
}

Details
• Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai, in NeurIPS, 2016.
Full Text Code Abstract BibTeX Details reported by NPR and MIT Tech Review
The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving the its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
@inproceedings{bolukbasi2016man,
author = {Bolukbasi, Tolga and Chang, Kai-Wei and Zou, James and Saligrama, Venkatesh and Kalai, Adam},
title = {Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings},
booktitle = {NeurIPS},
year = {2016}
}

Details

Details
• Searching for an Effiective Defender: Benchmarking Defense against Adversarial Word Substitution

Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi Zhang, Kai-Wei Chang, and Cho-Jui Hsieh, in EMNLP, 2021.
QA Sessions: VIRTUAL POSTER SESSION I: MACHINE LEARNING FOR NLP Paper link in the virtual conference
Full Text BibTeX Details
Recent studies have shown that deep neural networks are vulnerable to intentionally crafted adversarial examples, and various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models. However, there is a lack of systematic study on comparing different defense approaches under the same attacking setting. In this paper, we seek to fill the gap of systematic studies through comprehensive researches on understanding the behavior of neural text classifiers trained by various defense methods under representative adversarial attacks. In addition, we propose an effective method to further improve the robustness of neural text classifiers against such attacks and achieved the highest accuracy on both clean and adversarial examples on AGNEWS and IMDB datasets by a significant margin.
@inproceedings{li2021searching,
title = {Searching for an Effiective Defender: Benchmarking Defense against Adversarial Word Substitution},
author = {Li, Zongyi and Xu, Jianhan and Zeng, Jiehang and Li, Linyang and Zheng, Xiaoqing and Zhang, Qi and Chang, Kai-Wei and Hsieh, Cho-Jui},
booktitle = {EMNLP},
year = {2021}
}


• On the Transferability of Adversarial Attacks against Neural Text Classifier

Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Abstract BibTeX Details
Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we present the first study to systematically investigate the transferability of adversarial examples for text classification models and explore how various factors, including network architecture, tokenization scheme, word embedding, and model capacity, affect the transferability of adversarial examples. Based on these studies, we propose a genetic algorithm to find an ensemble of models that can be used to induce adversarial examples to fool almost all existing models. Such adversarial examples reflect the defects of the learning process and the data bias in the training set. Finally, we derive word replacement rules that can be used for model diagnostics from these adversarial examples.
@inproceedings{yuan2021on,
title = {On the Transferability of Adversarial Attacks against Neural Text Classifier},
author = {Yuan, Liping and Zheng, Xiaoqing and Zhou, Yi and Hsieh, Cho-Jui and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2021}
}

Details
• Searching for an Effiective Defender: Benchmarking Defense against Adversarial Word Substitution

Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi Zhang, Kai-Wei Chang, and Cho-Jui Hsieh, in EMNLP, 2021.
Full Text Abstract BibTeX Details
Recent studies have shown that deep neural networks are vulnerable to intentionally crafted adversarial examples, and various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models. However, there is a lack of systematic study on comparing different defense approaches under the same attacking setting. In this paper, we seek to fill the gap of systematic studies through comprehensive researches on understanding the behavior of neural text classifiers trained by various defense methods under representative adversarial attacks. In addition, we propose an effective method to further improve the robustness of neural text classifiers against such attacks and achieved the highest accuracy on both clean and adversarial examples on AGNEWS and IMDB datasets by a significant margin.
@inproceedings{li2021searching,
title = {Searching for an Effiective Defender: Benchmarking Defense against Adversarial Word Substitution},
author = {Li, Zongyi and Xu, Jianhan and Zeng, Jiehang and Li, Linyang and Zheng, Xiaoqing and Zhang, Qi and Chang, Kai-Wei and Hsieh, Cho-Jui},
booktitle = {EMNLP},
year = {2021}
}

Details
• Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble

Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, and Xuanjing Huang, in ACL, 2021.
Full Text Code Abstract BibTeX Details
Although deep neural networks have achieved prominent performance on many NLP tasks, they are vulnerable to adversarial examples. We propose Dirichlet Neighborhood Ensemble (DNE), a randomized method for training a robust model to defense synonym substitutionbased attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data. In such a way, the model is robust to adversarial attacks while maintaining the performance on the original clean data. DNE is agnostic to the network architectures and scales to large models (e.g., BERT) for NLP applications. Through extensive experimentation, we demonstrate that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
@inproceedings{zhou2021defense,
title = {Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble},
author = {Zhou, Yi and Zheng, Xiaoqing and Hsieh, Cho-Jui and Chang, Kai-Wei and Huang, Xuanjing},
booktitle = {ACL},
year = {2021}
}

Details
• Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation

Chong Zhang, Jieyu Zhao, Huan Zhang, Kai-Wei Chang, and Cho-Jui Hsieh, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Robustness and counterfactual bias are usually evaluated on a test dataset. However, are these evaluations robust? If the test dataset is perturbed slightly, will the evaluation results keep the same? In this paper, we propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset. The framework first perturbs the test dataset to construct abundant natural sentences similar to the test data, and then diagnoses the prediction change regarding a single-word substitution. We apply this framework to study two perturbation-based approaches that are used to analyze models’ robustness and counterfactual bias in English. (1) For robustness, we focus on synonym substitutions and identify vulnerable examples where prediction can be altered. Our proposed attack attains high success rates (96.0%-99.8%) in finding vulnerable examples on both original and robustly trained CNNs and Transformers. (2) For counterfactual bias, we focus on substituting demographic tokens (e.g., gender, race) and measure the shift of the expected prediction among constructed sentences. Our method is able to reveal the hidden model biases not directly shown in the test dataset.
@inproceedings{zhang2021double,
title = {	Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation},
booktitle = {NAACL},
author = {Zhang, Chong and Zhao, Jieyu and Zhang, Huan and Chang, Kai-Wei and Hsieh, Cho-Jui},
year = {2021},
presentation_id = {https://underline.io/events/122/sessions/4229/lecture/19609-double-perturbation-on-the-robustness-of-robustness-and-counterfactual-bias-evaluation}
}

Details
• Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs

Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh, in NeurIPS, 2020.
Full Text Code Abstract BibTeX Details
Linear relaxation based perturbation analysis (LiRPA) for neural networks, which computes provable linear bounds of output neurons given a certain amount of input perturbation, has become a core component in robustness verification and certified defense. The majority of LiRPA-based methods only consider simple feed-forward networks and it needs particular manual derivations and implementations when extended to other architectures. In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structures, by generalizing exiting LiRPA algorithms such as CROWN to operate on general computational graphs. The flexibility, differentiability and ease of use of our framework allow us to obtain state-of-the-art results on LiRPA based certified defense on fairly complicated networks like DenseNet, ResNeXt and Transformer that are not supported by prior work. Our framework also enables loss fusion, a technique that significantly reduces the computational complexity of LiRPA for certified defense. For the first time, we demonstrate LiRPA based certified defense on Tiny ImageNet and Downscaled ImageNet where previous approaches cannot scale to due to the relatively large number of classes. Our work also yields an open-source library for the community to apply LiRPA to areas beyond certified defense without much LiRPA expertise, e.g., we create a neural network with a provably flat optimization landscape. Our open source library is available at https://github.com/KaidiXu/auto_LiRPA
@inproceedings{xu2020provable,
author = {Xu, Kaidi and Shi, Zhouxing and Zhang, Huan and Wang, Yihan and Chang, Kai-Wei and Huang, Minlie and Kailkhura, Bhavya and Lin, Xue and Hsieh, Cho-Jui},
title = {Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs},
booktitle = {NeurIPS},
year = {2020}
}

Details
• On the Robustness of Language Encoders against Grammatical Errors

Fan Yin, Quanyu Long, Tao Meng, and Kai-Wei Chang, in ACL, 2020.
Full Text Slides Video Code Abstract BibTeX Details
We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the position of errors. We find that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors.
@inproceedings{yin2020robustness,
author = {Yin, Fan and Long, Quanyu and Meng, Tao and Chang, Kai-Wei},
title = {On the Robustness of Language Encoders against Grammatical Errors},
booktitle = {ACL},
presentation_id = {https://virtual.acl2020.org/paper_main.310.html},
year = {2020}
}

Details
• Robustness Verification for Transformers

Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, and Cho-Jui Hsieh, in ICLR, 2020.
Full Text Video Code Abstract BibTeX Details
Robustness verification that aims to formally certify the prediction behavior of
neural networks has become an important tool for understanding the behavior of
a given model and for obtaining safety guarantees. However, previous methods
are usually limited to relatively simple neural networks. In this paper, we consider the robustness verification problem for Transformers. Transformers have
complex self-attention layers that pose many challenges for verification, including
cross-nonlinearity and cross-position dependency, which have not been discussed
in previous work. We resolve these challenges and develop the first verification
algorithm for Transformers. The certified robustness bounds computed by our
method are significantly tighter than those by naive Interval Bound Propagation.
These bounds also shed light on interpreting Transformers as they consistently
reflect the importance of words in sentiment analysis.
@inproceedings{shi2020robustness,
author = {Shi, Zhouxing and Zhang, Huan and Chang, Kai-Wei and Huang, Minlie and Hsieh, Cho-Jui},
title = {Robustness Verification for Transformers},
booktitle = {ICLR},
year = {2020}
}

Details
• Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification

Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, and Wei Wang, in EMNLP, 2019.
Full Text Code Abstract BibTeX Details
Adversarial attacks against machine learning models have threatened various real-world applications such as spam filtering and sentiment analysis. In this paper, we propose a novel framework, learning to DIScriminate Perturbations (DISP), to identify and adjust malicious perturbations, thereby blocking adversarial attacks for text classification models. To identify adversarial attacks, a perturbation discriminator validates how likely a token in the text is perturbed and provides a set of potential perturbations. For each potential perturbation, an embedding estimator learns to restore the embedding of the original word based on the context and a replacement token is chosen based on approximate kNN search. DISP can block adversarial attacks for any NLP model without modifying the model structure or training procedure. Extensive experiments on two benchmark datasets demonstrate that DISP significantly outperforms baseline methods in blocking adversarial attacks for text classification. In addition, in-depth analysis shows the robustness of DISP across different situations.
@inproceedings{zhou2019learning,
author = {Zhou, Yichao and Jiang, Jyun-Yu and Chang, Kai-Wei and Wang, Wei},
title = {Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification},
booktitle = {EMNLP},
year = {2019}
}

Details
• Retrofitting Contextualized Word Embeddings with Paraphrases

Weijia Shi, Muhao Chen, Pei Zhou, and Kai-Wei Chang, in EMNLP (short), 2019.
Full Text Slides Video Code Abstract BibTeX Details
Contextualized word embedding models, such as ELMo, generate meaningful representations of words and their context. These models have been shown to have a great impact on downstream applications. However, in many cases, the contextualized embedding of a word changes drastically when the context is paraphrased. As a result, the downstream model is not robust to paraphrasing and other linguistic variations. To enhance the stability of contextualized word embedding models, we propose an approach to retrofitting contextualized embedding models with paraphrase contexts. Our method learns an orthogonal transformation on the input space, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the retrofitted model significantly outperforms the original ELMo on various sentence classification and language inference tasks.
@inproceedings{shi2019retrofitting,
author = {Shi, Weijia and Chen, Muhao and Zhou, Pei and Chang, Kai-Wei},
title = {Retrofitting Contextualized Word Embeddings with Paraphrases},
booktitle = {EMNLP (short)},
vimeo_id = {430797636},
year = {2019}
}

Details
• Generating Natural Language Adversarial Examples

Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang, in EMNLP (short), 2018.
Full Text Code Abstract BibTeX Details
Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the network to misclassify. In the image domain, these perturbations can often be made virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a population-based optimization algorithm to generate semantically and syntactically similar adversarial examples. We demonstrate via a human study that 94.3% of the generated examples are classified to the original label by human evaluators, and that the examples are perceptibly quite similar. We hope our findings encourage researchers to pursue improving the robustness of DNNs in the natural language domain.
@inproceedings{alzanto2018generating,
author = {Alzantot, Moustafa and Sharma, Yash and Elgohary, Ahmed and Ho, Bo-Jhang and Srivastava, Mani and Chang, Kai-Wei},
title = {Generating Natural Language Adversarial Examples},
booktitle = {EMNLP (short)},
year = {2018}
}

Details

Details

• Improving Pre-trained Vision-and-Language Embeddings for Phrase Grounding

Zi-Yi Dou and Nanyun Peng, in EMNLP, 2021.
QA Sessions: VIRTUAL POSTER SESSION II: SPEECH, VISION, ROBOTICS, MULTIMODAL GROUNDING Paper link in the virtual conference
BibTeX Details
@inproceedings{dou2021improving,
title = {Improving Pre-trained Vision-and-Language Embeddings for Phrase Grounding},
author = {Dou, Zi-Yi and Peng, Nanyun},
booktitle = {EMNLP},
presentation_id = {https://underline.io/events/192/posters/8255/poster/37595-improving-pre-trained-vision-and-language-embeddings-for-phrase-grounding},
year = {2021}
}


• Improving Pre-trained Vision-and-Language Embeddings for Phrase Grounding

Zi-Yi Dou and Nanyun Peng, in EMNLP, 2021.
BibTeX Details
@inproceedings{dou2021improving,
title = {Improving Pre-trained Vision-and-Language Embeddings for Phrase Grounding},
author = {Dou, Zi-Yi and Peng, Nanyun},
booktitle = {EMNLP},
presentation_id = {https://underline.io/events/192/posters/8255/poster/37595-improving-pre-trained-vision-and-language-embeddings-for-phrase-grounding},
year = {2021}
}

Details

Details
• Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

Da Yin, Liunian Harold Li, Ziniu Hu, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2021.
QA Sessions: 4F: SPEECH, VISION, ROBOTICS, MULTIMODAL GROUNDING 1 Paper link in the virtual conference
Full Text Code BibTeX Details
Commonsense is defined as the knowledge that is shared by everyone. However, certain types of commonsense knowledge are correlated with culture and geographic locations and they are only shared locally. For example, the scenarios of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models’ ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art Vision-and-Language models, VisualBERT and ViLBERT trained on VCR, a standard multimodal commonsense benchmark with images primarily from Western regions. We then evaluate how well the trained models can generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions including East Asia, South Asia, and Africa is significantly lower than that for Western region. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that: 1) are concerned with culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition.
@inproceedings{yin2021broaden,
title = {	Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning},
author = {Yin, Da and Li, Liunian Harold and Hu, Ziniu and Peng, Nanyun and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2021}
}


• AVATAR: A Parallel Corpus for Java-Python Program Translation

Wasi Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang, in Arxiv, 2021.
Full Text Code Abstract BibTeX Details
Program translation refers to migrating source code from one programming language to another. It has a tremendous practical value in software development as porting software across different languages is time-consuming and costly. Automating program translation is of paramount importance in software migration, and recently researchers explored unsupervised approaches due to the unavailability of parallel corpora. However, the availability of pre-trained language models for programming languages enable supervised fine-tuning with a small amount of labeled examples. In this work, we present a corpus of 8,475 programming problems and their solutions written in two popular languages, Java and Python. We collect the dataset from competitive programming sites, online platforms, and open source repositories. We present several baselines, including models trained from scratch or pre-trained on large-scale source code collection and fine-tuned on our proposed dataset. Experiment results show that while the models perform relatively well in terms of the lexical match, they lack in generating code that is accurate in terms of syntax and data-flow match.
@inproceedings{ahmad2021avatar,
title = {AVATAR: A Parallel Corpus for Java-Python Program Translation},
author = {Ahmad, Wasi and Tushar, Md Golam Rahman and Chakraborty, Saikat and Chang, Kai-Wei},
booktitle = {Arxiv},
year = {2021}
}

Details
• Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

Da Yin, Liunian Harold Li, Ziniu Hu, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Code Abstract BibTeX Details
Commonsense is defined as the knowledge that is shared by everyone. However, certain types of commonsense knowledge are correlated with culture and geographic locations and they are only shared locally. For example, the scenarios of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models’ ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art Vision-and-Language models, VisualBERT and ViLBERT trained on VCR, a standard multimodal commonsense benchmark with images primarily from Western regions. We then evaluate how well the trained models can generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions including East Asia, South Asia, and Africa is significantly lower than that for Western region. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that: 1) are concerned with culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition.
@inproceedings{yin2021broaden,
title = {	Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning},
author = {Yin, Da and Li, Liunian Harold and Hu, Ziniu and Peng, Nanyun and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2021}
}

Details
• Retrieval Augmented Code Generation and Summarization

Md Rizwan Parvez, Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in EMNLP-Finding, 2021.
Full Text Abstract BibTeX Details
Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting them. To mimic developers’ code or summary generation behavior, we propose a retrieval augmented framework, \tool, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. \tool has a couple of uniqueness. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.
@inproceedings{parvez2021retrieval,
title = {Retrieval Augmented Code Generation and Summarization},
author = {Parvez, Md Rizwan and Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
booktitle = {EMNLP-Finding},
presentation_id = {https://underline.io/events/192/sessions/7923/lecture/38314-retrieval-augmented-code-generation-and-summarization},
year = {2021}
}

Details
• Unified Pre-training for Program Understanding and Generation

Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Code summarization nd generation empower conversion between programming language (PL) and natural language (NL), while code translation avails the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART’s effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming convention), logical flow (e.g., if block inside an else block is equivalent to else if block) that are crucial to program semantics and thus excels even with limited annotations.
@inproceedings{ahmad2021unified,
title = {Unified Pre-training for Program Understanding and Generation},
author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
booktitle = {NAACL},
presentation_id = {https://underline.io/events/122/sessions/4197/lecture/20024-unified-pre-training-for-program-understanding-and-generation},
year = {2021}
}

Details
• Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions

Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, and Kai-Wei Chang, in NAACL, 2021.
Full Text Video Abstract BibTeX Details
Pre-trained contextual vision-and-language (V&L) models have brought impressive performance improvement on various benchmarks. However, the paired text-image data required for pre-training are hard to collect and scale up. We investigate if a strong V&L representation model can be learned without text-image pairs. We propose Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora. Additionally, we introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. Evaluation on four V&L benchmarks shows that Weakly-supervised VisualBERT achieves similar performance with a model pre-trained with paired data. Besides, pre-training on more image-only data further improves a model that already has access to aligned data, suggesting the possibility of utilizing billions of raw images available to enhance V&L models.
@inproceedings{li2021unsupervised,
author = {Li, Liunian Harold and You, Haoxuan and Wang, Zhecan and Zareian, Alireza and Chang, Shih-Fu and Chang, Kai-Wei},
title = {Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions},
booktitle = {NAACL},
presentation_id = {https://underline.io/events/122/sessions/4269/lecture/19725-unsupervised-vision-and-language-pre-training-without-parallel-images-and-captions},
year = {2021}
}

Details
• What Does BERT with Vision Look At?

Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in ACL (short), 2020.
Full Text Slides Video Code Abstract BibTeX Details
Pre-trained visually grounded language models such as ViLBERT, LXMERT, and UNITER have achieved significant performance improvement on vision-and-language tasks but what they learn during pre-training remains unclear. In this work, we demonstrate that certain attention heads of a visually grounded language model actively ground elements of language to image regions. Specifically, some heads can map entities to image regions, performing the task known as entity grounding. Some heads can even detect the syntactic relations between non-entity words and image regions, tracking, for example, associations between verbs and regions corresponding to their arguments. We denote this ability as \emphsyntactic grounding. We verify grounding both quantitatively and qualitatively, using Flickr30K Entities as a testbed.
@inproceedings{li2020what,
author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
title = {What Does BERT with Vision Look At?},
booktitle = {ACL (short)},
presentation_id = {https://virtual.acl2020.org/paper_main.469.html},
year = {2020}
}

Details
• VisualBERT: A Simple and Performant Baseline for Vision and Language

Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in Arxiv, 2019.
Full Text Code Abstract BibTeX Details
We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals with state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.
@inproceedings{li2019visualbert,
author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
title = {VisualBERT: A Simple and Performant Baseline for Vision and Language},
booktitle = {Arxiv},
year = {2019}
}

Details

Details
• Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training

Kuan-Hao Huang, Wasi Ahmad, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2021.
QA Sessions: 3G - IN PERSON POSTER SESSION Paper link in the virtual conference
Full Text Code BibTeX Details
Pre-trained multilingual language encoders, such as multilingual BERT and XLM-R, show great potential for zero-shot cross-lingual transfer. However, these multilingual encoders do not precisely align words and phrases across languages. Especially, learning alignments in the multilingual embedding space usually requires sentence-level or word-level parallel corpora, which are expensive to be obtained for low-resource languages. An alternative is to make the multilingual encoders more robust; when fine-tuning the encoder using downstream task, we train the encoder to tolerate noise in the contextual embedding spaces such that even if the representations of different languages are not aligned well, the model can still achieve good performance on zero-shot cross-lingual transfer. In this work, we propose a learning strategy for training robust models by drawing connections between adversarial examples and the failure cases of zero-shot cross-lingual transfer. We adopt two widely used robust training methods, adversarial training and randomized smoothing, to train the desired robust model. The experimental results demonstrate that robust training improves zero-shot cross-lingual transfer on text classification tasks. The improvement is more significant in the generalized cross-lingual transfer setting, where the pair of input sentences belong to two different languages.
@inproceedings{huang2021improving,
title = {Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training},
author = {Huang, Kuan-Hao and Ahmad, Wasi and Peng, Nanyun and Chang, Kai-Wei},
presentation_id = {https://underline.io/events/192/posters/7783/poster/40656-improving-zero-shot-cross-lingual-transfer-learning-via-robust-training},
booktitle = {EMNLP},
year = {2021}
}


• Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training

Kuan-Hao Huang, Wasi Ahmad, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Code Abstract BibTeX Details
Pre-trained multilingual language encoders, such as multilingual BERT and XLM-R, show great potential for zero-shot cross-lingual transfer. However, these multilingual encoders do not precisely align words and phrases across languages. Especially, learning alignments in the multilingual embedding space usually requires sentence-level or word-level parallel corpora, which are expensive to be obtained for low-resource languages. An alternative is to make the multilingual encoders more robust; when fine-tuning the encoder using downstream task, we train the encoder to tolerate noise in the contextual embedding spaces such that even if the representations of different languages are not aligned well, the model can still achieve good performance on zero-shot cross-lingual transfer. In this work, we propose a learning strategy for training robust models by drawing connections between adversarial examples and the failure cases of zero-shot cross-lingual transfer. We adopt two widely used robust training methods, adversarial training and randomized smoothing, to train the desired robust model. The experimental results demonstrate that robust training improves zero-shot cross-lingual transfer on text classification tasks. The improvement is more significant in the generalized cross-lingual transfer setting, where the pair of input sentences belong to two different languages.
@inproceedings{huang2021improving,
title = {Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training},
author = {Huang, Kuan-Hao and Ahmad, Wasi and Peng, Nanyun and Chang, Kai-Wei},
presentation_id = {https://underline.io/events/192/posters/7783/poster/40656-improving-zero-shot-cross-lingual-transfer-learning-via-robust-training},
booktitle = {EMNLP},
year = {2021}
}

Details
• Syntax-augmented Multilingual BERT for Cross-lingual Transfer

Wasi Ahmad, Haoran Li, Kai-Wei Chang, and Yashar Mehdad, in ACL, 2021.
Full Text Video Code Abstract BibTeX Details
In recent years, we have seen a colossal effort
in pre-training multilingual text encoders using large-scale corpora in many languages to
facilitate cross-lingual transfer learning. However, due to typological differences across languages, the cross-lingual transfer is challenging. Nevertheless, language syntax, e.g., syntactic dependencies, can bridge the typological gap. Previous works have shown that pretrained multilingual encoders, such as mBERT
(Devlin et al., 2019), capture language syntax, helping cross-lingual transfer. This work
shows that explicitly providing language syntax and training mBERT using an auxiliary
objective to encode the universal dependency
tree structure helps cross-lingual transfer. We
perform rigorous experiments on four NLP
tasks, including text classification, question answering, named entity recognition, and taskoriented semantic parsing. The experiment results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks, such as PAWS-X and MLQA, by 1.4
and 1.6 points on average across all languages.
In the generalized transfer setting, the performance boosted significantly, with 3.9 and 3.1
points on average in PAWS-X and MLQA.
@inproceedings{ahmad2021syntax,
title = {Syntax-augmented Multilingual BERT for Cross-lingual Transfer},
author = {Ahmad, Wasi and Li, Haoran and Chang, Kai-Wei and Mehdad, Yashar},
booktitle = {ACL},
year = {2021}
}

Details
• Evaluating the Values of Sources in Transfer Learning

Md Rizwan Parvez and Kai-Wei Chang, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Transfer learning that adapts a model trained on data-rich sources to low-resource targets has been widely applied in natural language processing (NLP). However, when training a transfer model over multiple sources, not every source is equally useful for the target. To better transfer a model, it is essential to understand the values of the sources. In this paper, we develop SEAL-Shap, an efficient source valuation framework for quantifying the usefulness of the sources (e.g., domains/languages) in transfer learning based on the Shapley value method. Experiments and comprehensive analyses on both cross-domain and cross-lingual transfers demonstrate that our framework is not only effective in choosing useful transfer sources but also the source values match the intuitive source-target similarity.
@inproceedings{parvez2021evaluating,
title = {Evaluating the Values of Sources in Transfer Learning},
author = {Parvez, Md Rizwan and Chang, Kai-Wei},
booktitle = {NAACL},
presentation_id = {https://underline.io/events/122/sessions/4261/lecture/19707-evaluating-the-values-of-sources-in-transfer-learning},
year = {2021}
}

Details
• GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction

Wasi Ahmad, Nanyun Peng, and Kai-Wei Chang, in AAAI, 2021.
Full Text Code Abstract BibTeX Details
Prevalent approaches in cross-lingual relation and event extraction use graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic representations such that models trained on one language can be applied to other languages. However, GCNs lack in modeling long-range dependencies or disconnected words in the dependency tree. To address this challenge, we propose to utilize the self-attention mechanism where we explicitly fuse structural information to learn the dependencies between words at different syntactic distances. We introduce GATE, a \bf Graph \bf Attention \bf Transformer \bf Encoder, and test its cross-lingual transferability on relation and event extraction tasks. We perform rigorous experiments on the widely used ACE05 dataset that includes three typologically different languages: English, Chinese, and Arabic. The evaluation results show that GATE outperforms three recently proposed methods by a large margin. Our detailed analysis reveals that due to the reliance on syntactic dependencies, GATE produces robust representations that facilitate transfer across languages.
@inproceedings{ahmad2021gate,
author = {Ahmad, Wasi and Peng, Nanyun and Chang, Kai-Wei},
title = {GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction},
booktitle = {AAAI},
year = {2021}
}

Details
• Cross-Lingual Dependency Parsing by POS-Guided Word Reordering

Lu Liu, Yi Zhou, Jianhan Xu, Xiaoqing Zheng, Kai-Wei Chang, and Xuanjing Huang, in EMNLP-Finding, 2020.
Full Text Abstract BibTeX Details
We propose a novel approach to cross-lingual dependency parsing based on word reordering. The words in each sentence of a source language corpus are rearranged to meet the word order in a target language under the guidance of a part-of-speech based language model (LM). To obtain the highest reordering score under the LM, a population-based optimization algorithm and its genetic operators are designed to deal with the combinatorial nature of such word reordering. A parser trained on the reordered corpus then can be used to parse sentences in the target language. We demonstrate through extensive experimentation that our approach achieves better or comparable results across 25 target languages (1.73% increase in average), and outperforms a baseline by a significant margin on the languages that are greatly different from the source one. For example, when transferring the English parser to Hindi and Latin, our approach outperforms the baseline by 15.3% and 6.7% respectively.
@inproceedings{liu2020cross-lingual,
author = {Liu, Lu and Zhou, Yi and Xu, Jianhan and Zheng, Xiaoqing and Chang, Kai-Wei and Huang, Xuanjing},
title = {Cross-Lingual Dependency Parsing by POS-Guided Word Reordering},
booktitle = {EMNLP-Finding},
year = {2020}
}

Details
• Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages

Wasi Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, and Nanyun Peng, in CoNLL, 2019.
Full Text Poster Code Abstract BibTeX Details
Cross-lingual transfer learning has become an important weapon to battle the unavailability of annotated resources for low-resource languages.  One of the fundamental techniques to transfer across languages is learning language-agnostic representations, in the form of word embeddings or contextual encodings. In this work, we propose to leverage unannotated sentences from auxiliary languages to help learning language-agnostic representations  Specifically, we explore adversarial training for learning contextual encoders that produce invariant representations across languages to facilitate cross-lingual transfer. We conduct experiments on cross-lingual dependency parsing where we train a dependency parser on a source language and transfer it to a wide range of target languages.  Experiments on 28 target languages demonstrate that adversarial training significantly improves the overall transfer performances under several different settings.  We conduct a careful analysis to evaluate the language-agnostic representations resulted from adversarial training.
@inproceedings{ahmad2019crosslingual,
author = {Ahmad, Wasi and Zhang, Zhisong and Ma, Xuezhe and Chang, Kai-Wei and Peng, Nanyun},
title = {  Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages},
booktitle = {CoNLL},
year = {2019}
}

Details
• Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing

Tao Meng, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2019.
Full Text Poster Code Abstract BibTeX Details
Prior work on cross-lingual dependency parsing often focuses on capturing the commonalities between source and target languages and overlooks the potential of leveraging linguistic properties of the languages to facilitate the transfer. In this paper, we show that weak supervisions of linguistic knowledge for the target languages can improve a cross-lingual graph-based dependency parser substantially. Specifically, we explore several types of corpus linguistic statistics and compile them into corpus-wise constraints to guide the inference process during the test time. We adapt two techniques, Lagrangian relaxation and posterior regularization, to conduct inference with corpus-statistics constraints. Experiments show that the Lagrangian relaxation and posterior regularization inference improve the performances on 15 and 17 out of 19 target languages, respectively. The improvements are especially significant for target languages that have different word order features from the source language.
@inproceedings{meng2019target,
author = {Meng, Tao and Peng, Nanyun and Chang, Kai-Wei},
title = {Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing},
booktitle = {EMNLP},
year = {2019}
}

Details
• On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing

Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, and Nanyun Peng, in NAACL, 2019.
Full Text Video Code Abstract BibTeX Details
Different languages might have different wordorders. In this paper, we investigate cross-lingual transfer and posit that an order-agnostic model will perform better when trans-ferring to distant foreign languages. To test ourhypothesis, we train dependency parsers on anEnglish corpus and evaluate their transfer per-formance on 30 other languages. Specifically,we compare encoders and decoders based onRecurrent Neural Networks (RNNs) and mod-ified self-attentive architectures. The formerrelies on sequential information while the lat-ter is more flexible at modeling word order.Rigorous experiments and detailed analysisshows that RNN-based architectures transferwell to languages that are close to English,while self-attentive models have better overallcross-lingual transferability and perform espe-cially well on distant languages.
@inproceedings{ahmad2019difficulties,
author = {Ahmad, Wasi Uddin and Zhang, Zhisong and Ma, Xuezhe and Hovy, Eduard and Chang, Kai-Wei and Peng, Nanyun},
title = {On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing},
booktitle = {NAACL},
year = {2019}
}

Details

Details
• Retrieval Augmented Code Generation and Summarization

Md Rizwan Parvez, Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in EMNLP-Finding, 2021.
QA Sessions: FINDINGS PAPERS - GENERATION Paper link in the virtual conference
Full Text BibTeX Details
Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting them. To mimic developers’ code or summary generation behavior, we propose a retrieval augmented framework, \tool, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. \tool has a couple of uniqueness. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.
@inproceedings{parvez2021retrieval,
title = {Retrieval Augmented Code Generation and Summarization},
author = {Parvez, Md Rizwan and Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
booktitle = {EMNLP-Finding},
presentation_id = {https://underline.io/events/192/sessions/7923/lecture/38314-retrieval-augmented-code-generation-and-summarization},
year = {2021}
}


• AVATAR: A Parallel Corpus for Java-Python Program Translation

Wasi Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang, in Arxiv, 2021.
Full Text Code Abstract BibTeX Details
Program translation refers to migrating source code from one programming language to another. It has a tremendous practical value in software development as porting software across different languages is time-consuming and costly. Automating program translation is of paramount importance in software migration, and recently researchers explored unsupervised approaches due to the unavailability of parallel corpora. However, the availability of pre-trained language models for programming languages enable supervised fine-tuning with a small amount of labeled examples. In this work, we present a corpus of 8,475 programming problems and their solutions written in two popular languages, Java and Python. We collect the dataset from competitive programming sites, online platforms, and open source repositories. We present several baselines, including models trained from scratch or pre-trained on large-scale source code collection and fine-tuned on our proposed dataset. Experiment results show that while the models perform relatively well in terms of the lexical match, they lack in generating code that is accurate in terms of syntax and data-flow match.
@inproceedings{ahmad2021avatar,
title = {AVATAR: A Parallel Corpus for Java-Python Program Translation},
author = {Ahmad, Wasi and Tushar, Md Golam Rahman and Chakraborty, Saikat and Chang, Kai-Wei},
booktitle = {Arxiv},
year = {2021}
}

Details
• Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

Da Yin, Liunian Harold Li, Ziniu Hu, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2021.
Full Text Code Abstract BibTeX Details
Commonsense is defined as the knowledge that is shared by everyone. However, certain types of commonsense knowledge are correlated with culture and geographic locations and they are only shared locally. For example, the scenarios of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models’ ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art Vision-and-Language models, VisualBERT and ViLBERT trained on VCR, a standard multimodal commonsense benchmark with images primarily from Western regions. We then evaluate how well the trained models can generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions including East Asia, South Asia, and Africa is significantly lower than that for Western region. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that: 1) are concerned with culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition.
@inproceedings{yin2021broaden,
title = {	Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning},
author = {Yin, Da and Li, Liunian Harold and Hu, Ziniu and Peng, Nanyun and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2021}
}

Details
• Retrieval Augmented Code Generation and Summarization

Md Rizwan Parvez, Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in EMNLP-Finding, 2021.
Full Text Abstract BibTeX Details
Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting them. To mimic developers’ code or summary generation behavior, we propose a retrieval augmented framework, \tool, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. \tool has a couple of uniqueness. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.
@inproceedings{parvez2021retrieval,
title = {Retrieval Augmented Code Generation and Summarization},
author = {Parvez, Md Rizwan and Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
booktitle = {EMNLP-Finding},
presentation_id = {https://underline.io/events/192/sessions/7923/lecture/38314-retrieval-augmented-code-generation-and-summarization},
year = {2021}
}

Details
• Unified Pre-training for Program Understanding and Generation

Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in NAACL, 2021.
Full Text Video Code Abstract BibTeX Details
Code summarization nd generation empower conversion between programming language (PL) and natural language (NL), while code translation avails the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART’s effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming convention), logical flow (e.g., if block inside an else block is equivalent to else if block) that are crucial to program semantics and thus excels even with limited annotations.
@inproceedings{ahmad2021unified,
title = {Unified Pre-training for Program Understanding and Generation},
author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
booktitle = {NAACL},
presentation_id = {https://underline.io/events/122/sessions/4197/lecture/20024-unified-pre-training-for-program-understanding-and-generation},
year = {2021}
}

Details
• Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions

Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, and Kai-Wei Chang, in NAACL, 2021.
Full Text Video Abstract BibTeX Details
Pre-trained contextual vision-and-language (V&L) models have brought impressive performance improvement on various benchmarks. However, the paired text-image data required for pre-training are hard to collect and scale up. We investigate if a strong V&L representation model can be learned without text-image pairs. We propose Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora. Additionally, we introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. Evaluation on four V&L benchmarks shows that Weakly-supervised VisualBERT achieves similar performance with a model pre-trained with paired data. Besides, pre-training on more image-only data further improves a model that already has access to aligned data, suggesting the possibility of utilizing billions of raw images available to enhance V&L models.
@inproceedings{li2021unsupervised,
author = {Li, Liunian Harold and You, Haoxuan and Wang, Zhecan and Zareian, Alireza and Chang, Shih-Fu and Chang, Kai-Wei},
title = {Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions},
booktitle = {NAACL},
presentation_id = {https://underline.io/events/122/sessions/4269/lecture/19725-unsupervised-vision-and-language-pre-training-without-parallel-images-and-captions},
year = {2021}
}

Details
• What Does BERT with Vision Look At?

Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in ACL (short), 2020.
Full Text Slides Video Code Abstract BibTeX Details
Pre-trained visually grounded language models such as ViLBERT, LXMERT, and UNITER have achieved significant performance improvement on vision-and-language tasks but what they learn during pre-training remains unclear. In this work, we demonstrate that certain attention heads of a visually grounded language model actively ground elements of language to image regions. Specifically, some heads can map entities to image regions, performing the task known as entity grounding. Some heads can even detect the syntactic relations between non-entity words and image regions, tracking, for example, associations between verbs and regions corresponding to their arguments. We denote this ability as \emphsyntactic grounding. We verify grounding both quantitatively and qualitatively, using Flickr30K Entities as a testbed.
@inproceedings{li2020what,
author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
title = {What Does BERT with Vision Look At?},
booktitle = {ACL (short)},
presentation_id = {https://virtual.acl2020.org/paper_main.469.html},
year = {2020}
}

Details
• VisualBERT: A Simple and Performant Baseline for Vision and Language

Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in Arxiv, 2019.
Full Text Code Abstract BibTeX Details
We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals with state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.
@inproceedings{li2019visualbert,
author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
title = {VisualBERT: A Simple and Performant Baseline for Vision and Language},
booktitle = {Arxiv},
year = {2019}
}

Details

Details

• Document-level Entity-based Extraction as Template Generation

Kung-Hsiang Huang, Sam Tang, and Nanyun Peng, in EMNLP, 2021.
QA Sessions: VIRTUAL POSTER SESSION II: INFORMATION EXTRACTION Paper link in the virtual conference
BibTeX Details
@inproceedings{huang2021tempgen,
title = {Document-level Entity-based Extraction as Template Generation},
author = {Huang, Kung-Hsiang and Tang, Sam and Peng, Nanyun},
booktitle = {EMNLP},
presentation_id = {https://underline.io/events/192/posters/8243/poster/37467-document-level-entity-based-extraction-as-template-generation},
year = {2021}
}


• Document-level Entity-based Extraction as Template Generation

Kung-Hsiang Huang, Sam Tang, and Nanyun Peng, in EMNLP, 2021.
BibTeX Details
@inproceedings{huang2021tempgen,
title = {Document-level Entity-based Extraction as Template Generation},
author = {Huang, Kung-Hsiang and Tang, Sam and Peng, Nanyun},
booktitle = {EMNLP},
presentation_id = {https://underline.io/events/192/posters/8243/poster/37467-document-level-entity-based-extraction-as-template-generation},
year = {2021}
}

Details

Details
• ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning

Rujun Han, I.-Hung Hsu, Jiao Sun, Julia Baylon, Qiang Ning, Dan Roth, and Nanyun Peng, in EMNLP, 2021.
QA Sessions: 7D: RESOURCES AND EVALUATION 3 Paper link in the virtual conference
Full Text Code BibTeX Details
@inproceedings{han2021ester,
title = {ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning},
author = {Han, Rujun and Hsu, I-Hung and Sun, Jiao and Baylon, Julia and Ning, Qiang and Roth, Dan and Peng, Nanyun},
booktitle = {EMNLP},
year = {2021}
}


• ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning

Rujun Han, I.-Hung Hsu, Jiao Sun, Julia Baylon, Qiang Ning, Dan Roth, and Nanyun Peng, in EMNLP, 2021.
Full Text Code BibTeX Details
@inproceedings{han2021ester,
title = {ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning},
author = {Han, Rujun and Hsu, I-Hung and Sun, Jiao and Baylon, Julia and Ning, Qiang and Roth, Dan and Peng, Nanyun},
booktitle = {EMNLP},
year = {2021}
}

Details
• ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning

Rujun Han, Xiang Ren, and Nanyun Peng, in EMNLP, 2021.
Full Text Code BibTeX Details
@inproceedings{han2021econet,
title = {ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning},
author = {Han, Rujun and Ren, Xiang and Peng, Nanyun},
booktitle = {EMNLP},
presentation_id = {https://underline.io/events/192/posters/8243/poster/37875-econet-effective-continual-pretraining-of-language-models-for-event-temporal-reasoning},
year = {2021}
}

Details
• EventPlus: A Temporal Event Understanding Pipeline

Mingyu Derek Ma, Jiao Sun, Mu Yang, Kung-Hsiang Huang, Nuan Wen, Shikhar Singh, Rujun Han, and Nanyun Peng, in 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Demonstrations Track, 2021.
Full Text Slides Poster Video Code Abstract BibTeX Details
We present EventPlus, a temporal event understanding pipeline that integrates various state-of-the-art event understanding components including event trigger and type detection, event argument detection, event duration and temporal relation extraction. Event information, especially event temporal knowledge, is a type of common sense knowledge that helps people understand how stories evolve and provides predictive hints for future events. EventPlus as the first comprehensive temporal event understanding pipeline provides a convenient tool for users to quickly obtain annotations about events and their temporal information for any user-provided document. Furthermore, we show EventPlus can be easily adapted to other domains (e.g., biomedical domain). We make EventPlus publicly available to facilitate event-related information extraction and downstream applications.
@inproceedings{ma2021eventplus,
title = {EventPlus: A Temporal Event Understanding Pipeline},
author = {Ma, Mingyu Derek and Sun, Jiao and Yang, Mu and Huang, Kung-Hsiang and Wen, Nuan and Singh, Shikhar and Han, Rujun and Peng, Nanyun},
booktitle = {2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Demonstrations Track},
presentation_id = {https://underline.io/events/122/posters/4227/poster/20582-eventplus-a-temporal-event-understanding-pipeline},
year = {2021}
}

Details

Details
• ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning

Rujun Han, Xiang Ren, and Nanyun Peng, in EMNLP, 2021.
QA Sessions: VIRTUAL POSTER SESSION II: INFORMATION EXTRACTION Paper link in the virtual conference
Full Text Code BibTeX Details
@inproceedings{han2021econet,
title = {ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning},
author = {Han, Rujun and Ren, Xiang and Peng, Nanyun},
booktitle = {EMNLP},
presentation_id = {https://underline.io/events/192/posters/8243/poster/37875-econet-effective-continual-pretraining-of-language-models-for-event-temporal-reasoning},
year = {2021}
}


• ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning

Rujun Han, I.-Hung Hsu, Jiao Sun, Julia Baylon, Qiang Ning, Dan Roth, and Nanyun Peng, in EMNLP, 2021.
Full Text Code BibTeX Details
@inproceedings{han2021ester,
title = {ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning},
author = {Han, Rujun and Hsu, I-Hung and Sun, Jiao and Baylon, Julia and Ning, Qiang and Roth, Dan and Peng, Nanyun},
booktitle = {EMNLP},
year = {2021}
}

Details
• ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning

Rujun Han, Xiang Ren, and Nanyun Peng, in EMNLP, 2021.
Full Text Code BibTeX Details
@inproceedings{han2021econet,
title = {ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning},
author = {Han, Rujun and Ren, Xiang and Peng, Nanyun},
booktitle = {EMNLP},
presentation_id = {https://underline.io/events/192/posters/8243/poster/37875-econet-effective-continual-pretraining-of-language-models-for-event-temporal-reasoning},
year = {2021}
}

Details
• EventPlus: A Temporal Event Understanding Pipeline

Mingyu Derek Ma, Jiao Sun, Mu Yang, Kung-Hsiang Huang, Nuan Wen, Shikhar Singh, Rujun Han, and Nanyun Peng, in 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Demonstrations Track, 2021.
Full Text Slides Poster Video Code Abstract BibTeX Details
We present EventPlus, a temporal event understanding pipeline that integrates various state-of-the-art event understanding components including event trigger and type detection, event argument detection, event duration and temporal relation extraction. Event information, especially event temporal knowledge, is a type of common sense knowledge that helps people understand how stories evolve and provides predictive hints for future events. EventPlus as the first comprehensive temporal event understanding pipeline provides a convenient tool for users to quickly obtain annotations about events and their temporal information for any user-provided document. Furthermore, we show EventPlus can be easily adapted to other domains (e.g., biomedical domain). We make EventPlus publicly available to facilitate event-related information extraction and downstream applications.
@inproceedings{ma2021eventplus,
title = {EventPlus: A Temporal Event Understanding Pipeline},
author = {Ma, Mingyu Derek and Sun, Jiao and Yang, Mu and Huang, Kung-Hsiang and Wen, Nuan and Singh, Shikhar and Han, Rujun and Peng, Nanyun},
booktitle = {2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Demonstrations Track},
presentation_id = {https://underline.io/events/122/posters/4227/poster/20582-eventplus-a-temporal-event-understanding-pipeline},
year = {2021}
}

Details

Details
• HyperExpan: Taxonomy Expansion with Hyperbolic Representation Learning

Mingyu Derek Ma, Muhao Chen, Te-Lin Wu, and Nanyun Peng, in EMNLP-Finding, 2021.
QA Sessions: FINDINGS PAPERS - SEMANTICS: LEXICAL, SENTENCE LEVEL, TEXTUAL INFERENCE AND OTHER AREAS Paper link in the virtual conference
Full Text Slides BibTeX Details
Taxonomies are valuable resources for many applications, but the limited coverage due to the expensive manual curation process hinders their general applicability. Prior works attempt to automatically expand existing taxonomies to improve their coverage by learning concept embeddings in Euclidean space, while taxonomies, inherently hierarchical, more naturally align with the geometric properties of a hyperbolic space. In this paper, we present HyperExpan, a taxonomy expansion algorithm that seeks to preserve the structure of a taxonomy in a more expressive hyperbolic embedding space and learn to represent concepts and their relations with a Hyperbolic Graph Neural Network (HGNN). Specifically, HyperExpan leverages position embeddings to exploit the structure of the existing taxonomies, and characterizes the concept profile information to support the inference on unseen concepts during training. Experiments show that our proposed HyperExpan outperforms baseline models with representation learning in a Euclidean feature space and achieves state-of-the-art performance on the taxonomy expansion benchmarks.
@inproceedings{ma2021hyperexpan,
title = {HyperExpan: Taxonomy Expansion with Hyperbolic Representation Learning},
author = {Ma, Mingyu Derek and Chen, Muhao and Wu, Te-Lin and Peng, Nanyun},
booktitle = {EMNLP-Finding},
year = {2021},
presentation_id = {https://underline.io/events/192/sessions/7934/lecture/38572-hyperexpan-taxonomy-expansion-with-hyperbolic-representation-learning}
}


Details
• Relation-Guided Pre-Training for Open-Domain Question Answering

Ziniu Hu, Yizhou Sun, and Kai-Wei Chang, in EMNLP-Finding, 2021.
QA Sessions: FINDINGS PAPERS - QUESTION ANSWERING Paper link in the virtual conference
Full Text BibTeX Details
Answering complex open-domain questions requires understanding the latent relations between involving entities. However, we found that the existing QA datasets are extremely imbalanced in some types of relations, which hurts the generalization performance over questions with long-tail relations. To remedy this problem, in this paper, we propose a Relation-Guided Pre-Training (RGPT-QA) framework. We first generate a relational QA dataset covering a wide range of relations from both the Wikidata triplets and Wikipedia hyperlinks. We then pre-train a QA model to infer the latent relations from the question, and then conduct extractive QA to get the target answer entity. We demonstrate that by pretraining with propoed RGPT-QA techique, the popular open-domain QA model, Dense Passage Retriever (DPR), achieves 2.2%, 2.4%, and 6.3% absolute improvement in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions. Particularly, we show that RGPT-QA improves significantly on questions with long-tail relations
@inproceedings{hu2021relation,
title = {Relation-Guided Pre-Training for Open-Domain Question Answering},
author = {Hu, Ziniu and Sun, Yizhou and Chang, Kai-Wei},
booktitle = {EMNLP-Finding},
year = {2021}
}


• DEGREE: A Data-Efficient Generative Event Extraction Model

I.-Hung Hsu, Kuan-Hao Huang, Elizabeth Boschee, Scott Miller, Prem Natarajan, Kai-Wei Chang, and Nanyun Peng, in Arxiv, 2021.
Full Text Abstract BibTeX Details
Event extraction (EE), the task that identifies event triggers and their arguments in text, is usually formulated as a classification or structured prediction problem. Such models usually reduce labels to numeric identifiers, making them unable to take advantage of label semantics (e.g. an event type named Arrest is related to words like arrest, detain, or apprehend). This prevents the generalization to new event types. In this work, we formulate EE as a natural language generation task and propose GenEE, a model that not only captures complex dependencies within an event but also generalizes well to unseen or rare event types. Given a passage and an event type, GenEE is trained to generate a natural sentence following a predefined template for that event type. The generated output is then decoded into trigger and argument predictions. The autoregressive generation process naturally models the dependencies among the predictions – each new word predicted depends on those already generated in the output sentence. Using carefully designed input prompts during generation, GenEE is able to capture label semantics, which enables the generalization to new event types. Empirical results show that our model achieves strong performance on event extraction tasks under all zero-shot, few-shot, and high-resource scenarios. Especially, in the high-resource setting, GenEE outperforms the state-of-the-art model on argument extraction and gets competitive results with the current best on end-to-end EE tasks.
@inproceedings{hsu2021degree,
title = {DEGREE: A Data-Efficient Generative Event Extraction Model},
author = {Hsu, I-Hung and Huang, Kuan-Hao and Boschee, Elizabeth and Miller, Scott and Natarajan, Prem and Chang, Kai-Wei and Peng, Nanyun},
booktitle = {Arxiv},
year = {2021}
}

Details
• Relation-Guided Pre-Training for Open-Domain Question Answering

Ziniu Hu, Yizhou Sun, and Kai-Wei Chang, in EMNLP-Finding, 2021.
Full Text Abstract BibTeX Details
Answering complex open-domain questions requires understanding the latent relations between involving entities. However, we found that the existing QA datasets are extremely imbalanced in some types of relations, which hurts the generalization performance over questions with long-tail relations. To remedy this problem, in this paper, we propose a Relation-Guided Pre-Training (RGPT-QA) framework. We first generate a relational QA dataset covering a wide range of relations from both the Wikidata triplets and Wikipedia hyperlinks. We then pre-train a QA model to infer the latent relations from the question, and then conduct extractive QA to get the target answer entity. We demonstrate that by pretraining with propoed RGPT-QA techique, the popular open-domain QA model, Dense Passage Retriever (DPR), achieves 2.2%, 2.4%, and 6.3% absolute improvement in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions. Particularly, we show that RGPT-QA improves significantly on questions with long-tail relations
@inproceedings{hu2021relation,
title = {Relation-Guided Pre-Training for Open-Domain Question Answering},
author = {Hu, Ziniu and Sun, Yizhou and Chang, Kai-Wei},
booktitle = {EMNLP-Finding},
year = {2021}
}

Details
• An Integer Linear Programming Framework for Mining Constraints from Data

Tao Meng and Kai-Wei Chang, in ICML, 2021.
Full Text Video Code Abstract BibTeX Details
Various structured output prediction problems (e.g., sequential tagging) involve constraints over the output space. By identifying these constraints, we can filter out infeasible solutions and build an accountable model.
To this end, we present a general integer linear programming (ILP) framework for mining constraints from data. We model the inference of structured output prediction as an ILP problem. Then, given the coefficients of the objective function and the corresponding solution, we mine the underlying constraints by estimating the outer and inner polytopes of the feasible set. We verify the proposed constraint mining algorithm in various synthetic and real-world applications and demonstrate that the proposed approach successfully identifies the feasible set at scale.
In particular, we show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules. We also demonstrate results on hierarchical multi-label classification and conduct a theoretical analysis on how close the mined constraints are from the ground truth.
@inproceedings{meng2020integer,
author = {Meng, Tao and Chang, Kai-Wei},
title = {An Integer Linear Programming Framework for Mining Constraints from Data},
booktitle = {ICML},
year = {2021}
}

Details
• Intent Classification and Slot Filling for Privacy Policies

Wasi Ahmad, Jianfeng Chi, Tu Le, Thomas Norton, Yuan Tian, and Kai-Wei Chang, in ACL, 2021.
Full Text Video Code Abstract BibTeX Details
Understanding privacy policies is crucial for users as it empowers them to learn about the information that matters to them. Sentences written in a privacy policy document explain privacy practices, and the constituent text spans convey further specific information about that practice. We refer to predicting the privacy practice explained in a sentence as intent classification and identifying the text spans sharing specific information as slot filling. In this work, we propose PolicyIE, a corpus consisting of 5,250 intent and 11,788 slot annotations spanning 31 privacy policies of websites and mobile applications. PolicyIE corpus is a challenging benchmark with limited labeled examples reflecting the cost of collecting large-scale annotations. We present two alternative neural approaches as baselines: (1) formulating intent classification and slot filling as a joint sequence tagging and (2) modeling them as a sequence-to-sequence (Seq2Seq) learning task. Experiment results show that both approaches perform comparably in intent classification, while the Seq2Seq method outperforms the sequence tagging approach in slot filling by a large margin. Error analysis reveals the deficiency of the baseline approaches, suggesting room for improvement in future works. We hope the PolicyIE corpus will stimulate future research in this domain.
@inproceedings{ahmad2021intent,
title = {Intent Classification and Slot Filling for Privacy Policies},
author = {Ahmad, Wasi and Chi, Jianfeng and Le, Tu and Norton, Thomas and Tian, Yuan and Chang, Kai-Wei},
booktitle = {ACL},
year = {2021}
}

Details
• Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs

Kuan-Hao Huang and Kai-Wei Chang, in EACL, 2021.
Full Text Slides Poster Code Abstract BibTeX Details
Paraphrase generation plays an essential role in natural language process (NLP), and it has many downstream applications. However, training supervised paraphrase models requires many annotated paraphrase pairs, which are usually costly to obtain. On the other hand, the paraphrases generated by existing unsupervised approaches are usually syntactically similar to the source sentences and are limited in diversity. In this paper, we demonstrate that it is possible to generate syntactically various paraphrases without the need for annotated paraphrase pairs. We propose Syntactically controlled Paraphrase Generator (SynPG), an encoder-decoder based model that learns to disentangle the semantics and the syntax of a sentence from a collection of unannotated texts. The disentanglement enables SynPG to control the syntax of output paraphrases by manipulating the embedding in the syntactic space. Extensive experiments using automatic metrics and human evaluation show that SynPG performs better syntactic control than unsupervised baselines, while the quality of the generated paraphrases is competitive. We also demonstrate that the performance of SynPG is competitive or even better than supervised models when the unannotated data is large. Finally, we show that the syntactically controlled paraphrases generated by SynPG can be utilized for data augmentation to improve the robustness of NLP models.
@inproceedings{huang2021generating,
author = {Huang, Kuan-Hao and Chang, Kai-Wei},
title = {Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs},
booktitle = {EACL},
year = {2021}
}

Details
• Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference

Yichao Zhou, Yu Yan, Rujun Han, J. Harry Caufield, Kai-Wei Chang, Yizhou Sun, Peipei Ping, and Wei Wang, in AAAI, 2021.
Full Text Code Abstract BibTeX Details
There  has  been  a  steady  need  in  the  medical  community to  precisely  extract  the  temporal  relations  between  clinical events. In particular, temporal information can facilitate a variety of downstream applications such as case report retrieval and medical question answering. However, existing methods either require expensive feature engineering or are incapable of  modeling  the  global  relational  dependencies  among  theevents. In this paper, we propose Clinical Temporal Relation Exaction  with  Probabilistic  Soft  Logic  Regularization  and Global Inference (CTRL-PG), a novel method to tackle the problem at the document level. Extensive experiments on two benchmark datasets, I2B2-2012 and TB-Dense, demonstrate that CTRL-PG significantly  outperforms  baseline  methodsfor temporal relation extraction.
@inproceedings{zhou2021clinical,
author = {Zhou, Yichao and Yan, Yu and Han, Rujun and Caufield, J. Harry and Chang, Kai-Wei and Sun, Yizhou and Ping, Peipei and Wang, Wei},
title = {Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference},
booktitle = {AAAI},
year = {2021}
}

Details
• PolicyQA: A Reading Comprehension Dataset for Privacy Policies

Wasi Ahmad, Jianfeng Chi, Yuan Tian, and Kai-Wei Chang, in EMNLP-Finding (short), 2020.
Full Text Code Abstract BibTeX Details
Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. On the contrary, we argue that providing users with a short text span from policy documents reduces the burden of searching the target information from a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
@inproceedings{ahmad2020policyqa,
author = {Ahmad, Wasi and Chi, Jianfeng and Tian, Yuan and Chang, Kai-Wei},
title = {PolicyQA: A Reading Comprehension Dataset for Privacy Policies},
booktitle = {EMNLP-Finding (short)},
year = {2020}
}

Details
• GPT-GNN: Generative Pre-Training of Graph Neural Networks

Ziniu Hu, Yuxiao Dong, Kuansan Wang, Kai-Wei Chang, and Yizhou Sun, in KDD, 2020.
Full Text Video Code Abstract BibTeX Details
Graph neural networks (GNNs) have been demonstrated to besuccessful in modeling graph-structured data. However, training GNNs requires abundant task-specific labeled data, which is often arduously expensive to obtain. One effective way to reduce labeling effort is to pre-train an expressive GNN model on unlabelled data with self-supervision and then transfer the learned knowledge to downstream models. In this paper, we present the GPT-GNN’s framework to initialize GNNs by generative pre-training. GPT-GNN introduces a self-supervised attributed graph generation task to pre-train a GNN,which allows the GNN to capture the intrinsic structural and semantic properties of the graph. We factorize the likelihood of graph generation into two components: 1) attribute generation, and 2) edgegeneration. By modeling both components, GPT-GNN captures the inherent dependency between node attributes and graph structure during the generative process. Comprehensive experiments on thebillion-scale academic graph and Amazon recommendation data demonstrate that GPT-GNN significantly outperforms state-of-the-art base GNN models without pre-training by up to 9.1% across different downstream tasks.
@inproceedings{hu2020gptgnn,
author = {Hu, Ziniu and Dong, Yuxiao and Wang, Kuansan and Chang, Kai-Wei and Sun, Yizhou},
title = {GPT-GNN: Generative Pre-Training of Graph Neural Networks},
booktitle = {KDD},
slide_url = {https://acbull.github.io/pdf/gpt.pptx},
year = {2020}
}

Details
• SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics

Da Yin, Tao Meng, and Kai-Wei Chang, in ACL, 2020.
Full Text Slides Video Code Abstract BibTeX Details
We propose SentiBERT, a variant of BERT that effectively captures compositional sentiment semantics. The model incorporates contextualized representation with binary constituency parse tree to capture semantic composition. Comprehensive experiments demonstrate that SentiBERT achieves competitive performance on phrase-level sentiment classification. We further demonstrate that the sentiment composition learned from the phrase-level annotations on SST can be transferred to other sentiment analysis tasks as well as related tasks, such as emotion classification tasks. Moreover, we conduct ablation studies and design visualization methods to understand SentiBERT. We show that SentiBERT is better than baseline approaches in capturing negation and the contrastive relation and model the compositional sentiment semantics.
@inproceedings{yin2020sentibert,
author = {Yin, Da and Meng, Tao and Chang, Kai-Wei},
title = {SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics},
booktitle = {ACL},
year = {2020},
presentation_id = {https://virtual.acl2020.org/paper_main.341.html}
}

Details
• Building Language Models for Text with Named Entities

Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in ACL, 2018.
Full Text Poster Code Abstract BibTeX Details
Text in many domains involves a significant amount of named entities. Predicting the entity names is often challenging for a language model as they appear less frequent on the training corpus. In this paper, we propose a novel and effective approach to building a language model which can learn the entity names by leveraging their entity type information. We also introduce two benchmark datasets based on recipes and Java programming codes, on which we evaluate the proposed model. Experimental results show that our model achieves 52.2% better perplexity in recipe generation and 40.3% on code generation than state-of-the-art language models.
@inproceedings{parvez2018building,
author = {Parvez, Md Rizwan and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
title = {Building Language Models for Text with Named Entities},
booktitle = {ACL},
year = {2018}
}

Details
• Learning from Explicit and Implicit Supervision Jointly For Algebra Word Problems

Shyam Upadhyay, Ming-Wei Chang, Kai-Wei Chang, and Wen-tau Yih, in EMNLP, 2016.
Full Text Abstract BibTeX Details
Automatically solving algebra word problems has raised considerable interest recently. Existing state-of-the-art approaches mainly rely on learning from human annotated equations. In this paper, we demonstrate that it is possible to efficiently mine algebra problems and their numerical solutions with little to no manual effort. To leverage the mined dataset, we propose a novel structured-output learning algorithm that aims to learn from both explicit (e.g., equations) and implicit (e.g., solutions) supervision signals jointly. Enabled by this new algorithm, our model gains 4.6% absolute improvement in accuracy on the ALG-514 benchmark compared to the one without using implicit supervision. The final model also outperforms the current state-of-the-art approach by 3%.
Dataset
@inproceedings{BCWS16,
author = {Upadhyay, Shyam and Chang, Ming-Wei and Chang, Kai-Wei and Yih, Wen-tau},
title = {Learning from Explicit and Implicit Supervision Jointly For Algebra Word Problems},
booktitle = {EMNLP},
year = {2016}
}

Details

Details