At UCLA-NLP, our mission is to develop reliable, fair, accountable, and robust natural language understanding and generation technology to benefit everyone.
Please see our recent papers at
In the following, we highlight our research papers at NAACL 2021 on the following topics:
- Fairness and Social NLP
- Language Generation
- (Multi-Modal) Representation Learning
- Model Evaluation and Interpretation
- Event Extraction
NLP Model Evaluation and Interpretation
Evaluating the Values of Sources in Transfer Learning
Md Rizwan Parvez and Kai-Wei Chang, in NAACL, 2021.
QA Sessions: 14C-ORAL: INTERPRETABILITY AND ANALYSIS OF MODELS FOR NLP
Transfer learning that adapts a model trained on data-rich sources to low-resource targets has been widely applied in natural language processing (NLP). However, when training a transfer model over multiple sources, not every source is equally useful for the target. To better transfer a model, it is essential to understand the values of the sources. In this paper, we develop SEAL-Shap, an efficient source valuation framework for quantifying the usefulness of the sources (e.g., domains/languages) in transfer learning based on the Shapley value method. Experiments and comprehensive analyses on both cross-domain and cross-lingual transfers demonstrate that our framework is not only effective in choosing useful transfer sources but also the source values match the intuitive source-target similarity.
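As a concrete illustration of the Shapley-value idea behind SEAL-Shap, the sketch below computes exact Shapley values for two hypothetical sources by brute force. The `ACC` table (accuracy after training on each subset of sources) and the source names are invented for illustration; SEAL-Shap itself uses efficient approximations rather than enumerating all orderings.

```python
from itertools import permutations
from math import factorial

def shapley_values(sources, utility):
    """Exact Shapley value of each source: its average marginal
    contribution to the transfer utility over all source orderings."""
    values = {s: 0.0 for s in sources}
    for order in permutations(sources):
        coalition = []
        for s in order:
            before = utility(frozenset(coalition))
            coalition.append(s)
            values[s] += utility(frozenset(coalition)) - before
    n_orders = factorial(len(sources))
    return {s: v / n_orders for s, v in values.items()}

# Invented stand-in for "target dev accuracy after training on a subset
# of sources" -- in SEAL-Shap this comes from actual transfer runs.
ACC = {
    frozenset(): 0.50,
    frozenset({"news"}): 0.70,
    frozenset({"reviews"}): 0.60,
    frozenset({"news", "reviews"}): 0.75,
}

vals = shapley_values(["news", "reviews"], ACC.__getitem__)
# "news" earns the larger value: it helps more on average across orderings.
```

The two values sum to the total gain over the no-source baseline (0.25 here), a defining property of the Shapley decomposition.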
@inproceedings{parvez2021evaluating,
  title = {Evaluating the Values of Sources in Transfer Learning},
  author = {Parvez, Md Rizwan and Chang, Kai-Wei},
  booktitle = {NAACL},
  presentation_id = {https://underline.io/events/122/sessions/4261/lecture/19707-evaluating-the-values-of-sources-in-transfer-learning},
  year = {2021}
}

When performing transfer learning with multiple sources, one key question is how much info one can leverage from each source. In #NAACL2021 paper, Rizwan Parvez @uclanlp developed SEAL-SHAP, an efficient source valuation framework for quantifying the usefulness of the sources 1/n pic.twitter.com/5qmAG7a1q7
— Kai-Wei Chang (@kaiwei_chang) June 5, 2021

Related Publications
- LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs, ACL, 2026
- Contextual Label Projection for Cross-Lingual Structured Prediction, NAACL, 2024
- Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction, ACL, 2022
- Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training, EMNLP, 2021
- Syntax-augmented Multilingual BERT for Cross-lingual Transfer, ACL, 2021
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction, AAAI, 2021
- Cross-Lingual Dependency Parsing by POS-Guided Word Reordering, EMNLP-Finding, 2020
- Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages, CoNLL, 2019
- Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing, EMNLP, 2019
- On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing, NAACL, 2019
Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation
Chong Zhang, Jieyu Zhao, Huan Zhang, Kai-Wei Chang, and Cho-Jui Hsieh, in NAACL, 2021.
QA Sessions: 11B-ORAL: INTERPRETABILITY AND ANALYSIS OF MODELS FOR NLP
Robustness and counterfactual bias are usually evaluated on a test dataset. However, are these evaluations robust? If the test dataset is perturbed slightly, will the evaluation results stay the same? In this paper, we propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset. The framework first perturbs the test dataset to construct abundant natural sentences similar to the test data, and then diagnoses the prediction change regarding a single-word substitution. We apply this framework to study two perturbation-based approaches that are used to analyze models’ robustness and counterfactual bias in English. (1) For robustness, we focus on synonym substitutions and identify vulnerable examples where the prediction can be altered. Our proposed attack attains high success rates (96.0%-99.8%) in finding vulnerable examples on both original and robustly trained CNNs and Transformers. (2) For counterfactual bias, we focus on substituting demographic tokens (e.g., gender, race) and measure the shift of the expected prediction among constructed sentences. Our method is able to reveal hidden model biases not directly shown in the test dataset.
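The inner, single-word-substitution step can be sketched as follows; the toy keyword classifier and the tiny synonym map below are invented stand-ins for a trained model and the paper's perturbation sets.

```python
def single_word_neighbors(tokens, synonyms):
    """Yield every sentence reachable by substituting exactly one word."""
    for i, tok in enumerate(tokens):
        for alt in synonyms.get(tok, []):
            yield tokens[:i] + [alt] + tokens[i + 1:]

def is_vulnerable(tokens, synonyms, predict):
    """True if some single-word substitution changes the prediction."""
    base = predict(tokens)
    return any(predict(n) != base
               for n in single_word_neighbors(tokens, synonyms))

# Toy classifier: predicts positive iff the literal word "good" appears.
predict = lambda toks: int("good" in toks)
synonyms = {"good": ["fine", "great"], "movie": ["film"]}

vulnerable = is_vulnerable("a good movie".split(), synonyms, predict)
# Substituting "good" -> "fine" flips the label, so this example is vulnerable.
```

The full framework additionally perturbs the test set itself before this step, so that vulnerability is measured on a neighborhood of natural sentences rather than on the fixed test examples alone.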
@inproceedings{zhang2021double,
  title = {Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation},
  author = {Zhang, Chong and Zhao, Jieyu and Zhang, Huan and Chang, Kai-Wei and Hsieh, Cho-Jui},
  booktitle = {NAACL},
  presentation_id = {https://underline.io/events/122/sessions/4229/lecture/19609-double-perturbation-on-the-robustness-of-robustness-and-counterfactual-bias-evaluation},
  year = {2021}
}

Prior studies often test model robustness by applying semantic-invariant perturbation on a given test set. In our #NAACL2021 “Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation”, we propose a new framework for robustness verification. 1/n pic.twitter.com/h4V1dKhYXL
— Jieyu Zhao (@jieyuzhao11) June 5, 2021

Related Publications
- VideoCon: Robust video-language alignment via contrast captions, CVPR, 2024
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning, ICCV, 2023
- Red Teaming Language Model Detectors with Language Models, TACL, 2023
- ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation, EMNLP, 2022
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers, EMNLP-Finding (short), 2022
- Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations, EMNLP-Finding (short), 2022
- Improving the Adversarial Robustness of NLP Models by Information Bottleneck, ACL-Finding, 2022
- Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution, EMNLP, 2021
- On the Transferability of Adversarial Attacks against Neural Text Classifier, EMNLP, 2021
- Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble, ACL, 2021
- Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs, NeurIPS, 2020
- On the Robustness of Language Encoders against Grammatical Errors, ACL, 2020
- Robustness Verification for Transformers, ICLR, 2020
- Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification, EMNLP, 2019
- Retrofitting Contextualized Word Embeddings with Paraphrases, EMNLP (short), 2019
- Generating Natural Language Adversarial Examples, EMNLP (short), 2018
(Multi-Modal) Representation Learning
Unified Pre-training for Program Understanding and Generation
Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in NAACL, 2021.
QA Sessions: 8A-ORAL: MACHINE LEARNING FOR NLP: LANGUAGE MODELING AND SEQUENCE TO SEQUENCE MODELS
Top-10 cited paper at NAACL 21
Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation avails the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART’s effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming conventions), and logical flow (e.g., an if block inside an else block is equivalent to an else if block), all of which are crucial to program semantics, and thus excels even with limited annotations.
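PLBART's denoising-autoencoding objective can be sketched at the data level: corrupt a function, then train a seq2seq model to reconstruct it. The actual noising combines token masking, token deletion, and span infilling; the illustrative `noise` helper below shows the masking component only.

```python
import random

def noise(tokens, mask_token="<MASK>", mask_prob=0.35, rng=None):
    """Corrupt a token sequence by replacing random tokens with a mask.
    The seq2seq model sees the corrupted sequence as encoder input and
    is trained to emit the original sequence from the decoder."""
    rng = rng or random.Random(0)
    return [mask_token if rng.random() < mask_prob else t for t in tokens]

src = "def add ( a , b ) : return a + b".split()
corrupted = noise(src)   # encoder input: some tokens become <MASK>
target = src             # decoder target: reconstruct the original function
```

Because the same objective applies to PL and NL token sequences alike, one pre-trained model can serve both program understanding and generation tasks.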
@inproceedings{ahmad2021unified,
  title = {Unified Pre-training for Program Understanding and Generation},
  author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
  booktitle = {NAACL},
  presentation_id = {https://underline.io/events/122/sessions/4197/lecture/20024-unified-pre-training-for-program-understanding-and-generation},
  year = {2021}
}

De-noising pretraining excels for dual modeling of programming language (e.g., source code) + natural language (e.g., code comment). See our new @NAACLHLT paper https://t.co/YrLFIJE1RH. Thanks to awesome collaborations by Wasi Ahmed, Saikat Chakraborty, @kaiwei_chang
— Baishakhi Ray (@baishakhir) March 13, 2021
Related Publications
- AutoSUIT Bench - Automated Security UnIt Test Benchmark for LLM Coding, ACL-Findings, 2026
- METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling, ACL, 2025
- MQT-LLaVA: Matryoshka Query Transformer for Large Vision-Language Models, NeurIPS, 2024
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation, NeurIPS (Datasets and Benchmarks Track), 2024
- VDebugger: Harnessing Execution Feedback for Debugging Visual Programs, EMNLP-Finding, 2024
- AVATAR: A Parallel Corpus for Java-Python Program Translation, ACL-Finding (short), 2023
- Retrieval Augmented Code Generation and Summarization, EMNLP-Finding, 2021
Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models
James Y. Huang, Kuan-Hao Huang, and Kai-Wei Chang, in NAACL (short), 2021.
QA Sessions: 4C-ORAL: SEMANTICS: SENTENCE-LEVEL SEMANTICS AND TEXTUAL INFERENCE
Pre-trained language models have achieved huge success on a wide range of NLP tasks. However, contextual representations from pre-trained models contain entangled semantic and syntactic information, and therefore cannot be directly used to derive useful semantic sentence embeddings for some tasks. Paraphrase pairs offer an effective way of learning the distinction between semantics and syntax, as they naturally share semantics and often vary in syntax. In this work, we present ParaBART, a semantic sentence embedding model that learns to disentangle semantics and syntax in sentence embeddings obtained by pre-trained language models. ParaBART is trained to perform syntax-guided paraphrasing, based on a source sentence that shares semantics with the target paraphrase, and a parse tree that specifies the target syntax. In this way, ParaBART learns disentangled semantic and syntactic representations from their respective inputs with separate encoders. Experiments in English show that ParaBART outperforms state-of-the-art sentence embedding models on unsupervised semantic similarity tasks. Additionally, we show that our approach can effectively remove syntactic information from semantic sentence embeddings, leading to better robustness against syntactic variation on downstream semantic tasks.
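At the data level, a ParaBART training example pairs a semantics carrier with a syntax carrier; the `build_parabart_example` helper, its field names, and the simplified parse string below are illustrative, not the authors' code.

```python
def build_parabart_example(source, target_parse, target):
    """One training example: the model must generate `target`, taking its
    meaning from `source` (semantic encoder) and its shape from
    `target_parse` (syntactic encoder)."""
    return {
        "semantic_input": source,         # paraphrase sharing the meaning
        "syntactic_input": target_parse,  # linearized target parse tree
        "decoder_target": target,
    }

example = build_parabart_example(
    "the cat chased the dog",
    "(S (NP) (VP (VBD) (PP)))",          # simplified parse template
    "the dog was chased by the cat",
)
```

Because only the semantic encoder ever sees the source sentence, its embedding is pushed to carry meaning while syntax is supplied entirely by the parse input.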
@inproceedings{huang2021disentangling,
  title = {Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models},
  author = {Huang, James Y. and Huang, Kuan-Hao and Chang, Kai-Wei},
  booktitle = {NAACL (short)},
  presentation_id = {https://underline.io/events/122/sessions/4151/lecture/19910-disentangling-semantics-and-syntax-in-sentence-embeddings-with-pre-trained-language-models},
  year = {2021}
}

Check out our #NAACL2021 paper on semantic sentence embeddings! By disentangling the semantics and the syntax of sentences, our ParaBART achieves better performance on semantic textual similarity tasks. (https://t.co/QspSh8W2XJ w/ James Huang and @kaiwei_chang) [1/2] #UCLANLP pic.twitter.com/XzgSmN0353
— Kuan-Hao Huang (@kuanhao_) April 15, 2021

Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, and Kai-Wei Chang, in NAACL, 2021.
QA Sessions: 15A-ORAL: LANGUAGE GROUNDING TO VISION, ROBOTICS AND BEYOND
Pre-trained contextual vision-and-language (V&L) models have brought impressive performance improvement on various benchmarks. However, the paired text-image data required for pre-training are hard to collect and scale up. We investigate if a strong V&L representation model can be learned without text-image pairs. We propose Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora. Additionally, we introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. Evaluation on four V&L benchmarks shows that Weakly-supervised VisualBERT achieves similar performance with a model pre-trained with paired data. Besides, pre-training on more image-only data further improves a model that already has access to aligned data, suggesting the possibility of utilizing billions of raw images available to enhance V&L models.
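The mask-and-predict idea on image-only data can be sketched as input construction: detected object tags play the role of a caption, so the same masked-token objective applies without paired captions. The `build_wvlp_input` helper, the special-token names, and the high demo masking rate below are illustrative, not the paper's code.

```python
import random

def build_wvlp_input(text_tokens, object_tags, mask_token="[MASK]",
                     mask_prob=0.5, rng=None):
    """Build a masked sequence plus reconstruction labels. For image-only
    data `text_tokens` is empty, and the detected object tags alone
    anchor the visual modality to the text vocabulary."""
    rng = rng or random.Random(0)
    seq = ["[CLS]"] + list(text_tokens) + ["[SEP]"] + list(object_tags) + ["[SEP]"]
    masked, labels = [], []
    for tok in seq:
        if tok not in ("[CLS]", "[SEP]") and rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)    # model must recover the original token
        else:
            masked.append(tok)
            labels.append(None)   # no loss at this position
    return masked, labels

# Image-only example: no caption, only tags from an object detector.
tokens, labels = build_wvlp_input([], ["dog", "frisbee", "grass"])
```

Language-only corpora go through the same routine with `object_tags` empty, which is what lets one objective cover both unpaired modalities.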
@inproceedings{li2021unsupervised,
  title = {Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions},
  author = {Li, Liunian Harold and You, Haoxuan and Wang, Zhecan and Zareian, Alireza and Chang, Shih-Fu and Chang, Kai-Wei},
  booktitle = {NAACL},
  presentation_id = {https://underline.io/events/122/sessions/4269/lecture/19725-unsupervised-vision-and-language-pre-training-without-parallel-images-and-captions},
  year = {2021}
}

Excited to share our NAACL paper Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions! https://t.co/R248NcGH3b
— Liunian Harold Li (@LiLiunian) April 16, 2021
We show that one could pre-train a V&L model on unaligned images and text with competitive performance as models trained on aligned data. pic.twitter.com/7TrKAMxL6a

Related Publications
- Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning, EMNLP, 2021
- What Does BERT with Vision Look At?, ACL (short), 2020
- VisualBERT: A Simple and Performant Baseline for Vision and Language, Arxiv, 2019