At UCLA-NLP, our mission is to develop reliable, fair, accountable, and robust natural language understanding and generation technology that benefits everyone. See our recent papers here.

ACL is one of the major conferences in the field of natural language processing (NLP). We will participate in the following activities this year.

Tutorial

We will present a tutorial on “Indirectly Supervised Natural Language Processing” from 14:00 to 17:30 on 7/9.

Workshop

We will host the 3rd Trustworthy Natural Language Processing Workshop on 7/14.

Accepted Papers

We will present papers on the following topics:

  • Trustworthy NLP
    1. Efficient Shapley Values Estimation by Amortization for Text Classification, Chenghao Yang, Fan Yin, He He, Kai-Wei Chang, Xiaofei Ma, and Bing Xiang, in ACL, 2023. Details
    2. Resolving Ambiguities in Text-to-Image Generative Models, Ninareh Mehrabi, Palash Goyal, Apurv Verma, Jwala Dhamala, Varun Kumar, Qian Hu, Kai-Wei Chang, Richard Zemel, Aram Galstyan, and Rahul Gupta, in ACL, 2023. Details
    3. The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks, Nikil Roashan Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot, and Kai-Wei Chang, in ACL (short), 2023. Details
    4. PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English, Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, and Kai-Wei Chang, in ACL (short), 2023. Details
  • Vision-Language and Multimodal Models
    1. MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models, Masoud Monajatipoor, Liunian Harold Li, Mozhdeh Rouhsedaghat, Lin Yang, and Kai-Wei Chang, in ACL (short), 2023. Details
    2. UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding, Rui Sun, Zhecan Wang, Haoxuan You, Noel Codella, Kai-Wei Chang, and Shih-Fu Chang, in ACL-Finding, 2023. Details
    3. AVATAR: A Parallel Corpus for Java-Python Program Translation, Wasi Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang, in ACL-Finding (short), 2023. Details
  • Language and Reasoning
    1. Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step, Liunian Harold Li, Jack Hessel, Youngjae Yu, Xiang Ren, Kai-Wei Chang, and Yejin Choi, in ACL, 2023. Details
    2. A Survey of Deep Learning for Mathematical Reasoning, Pan Lu, Liang Qiu, Wenhao Yu, Sean Welleck, and Kai-Wei Chang, in ACL, 2023. Details
  • Semantic and Syntactic Analysis
    1. ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation, Kuan-Hao Huang, Varun Iyer, I.-Hung Hsu, Anoop Kumar, Kai-Wei Chang, and Aram Galstyan, in ACL, 2023. Details
    2. TAGPRIME: A Unified Framework for Relational Structure Extraction, I.-Hung Hsu, Kuan-Hao Huang, Shuning Zhang, Wenxin Cheng, Prem Natarajan, Kai-Wei Chang, and Nanyun Peng, in ACL, 2023. Details
    3. GENEVA: Pushing the Limit of Generalizability for Event Argument Extraction with 100+ Event Types, Tanmay Parekh, I.-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, and Nanyun Peng, in ACL, 2023. Details
    4. Enhancing Unsupervised Semantic Parsing with Distributed Contextual Representations, Zixuan Ling, Xiaoqing Zheng, Jianhan Xu, Jinshu Lin, Kai-Wei Chang, Cho-Jui Hsieh, and Xuanjing Huang, in ACL-Finding, 2023. Details
    5. PIP: Parse-Instructed Prefix for Syntactically Controlled Paraphrase Generation, Yixin Wan, Kuan-Hao Huang, and Kai-Wei Chang, in ACL-Finding (short), 2023. Details

    Trustworthy NLP: Fairness, Robustness, Model Explanation, and Social Good Applications

    1. Efficient Shapley Values Estimation by Amortization for Text Classification

      Chenghao Yang, Fan Yin, He He, Kai-Wei Chang, Xiaofei Ma, and Bing Xiang, in ACL, 2023.
      QA Sessions: Interpretability and Analysis of Models for NLP 2: 7/11 5:45PM Paper link in the virtual conference
      Full Text BibTeX Details
      Despite the popularity of Shapley Values in explaining neural text classification models, computing them is prohibitive for large pretrained models because it requires many model evaluations over perturbed text inputs. In practice, Shapley Values are often estimated stochastically with a smaller number of model evaluations. However, we find that the estimated Shapley Values are quite sensitive to random seeds: the top-ranked features often have little overlap under two different seeds, especially on examples with longer input text. As a result, a much larger number of model evaluations is needed to reduce the sensitivity to an acceptable level. To mitigate the trade-off between stability and efficiency, we develop an amortized model that directly predicts the Shapley Value of each input feature without additional model evaluations. It is trained on a set of examples with Shapley Values estimated from a large number of model evaluations to ensure stability. Experimental results on two text classification datasets demonstrate that the proposed amortized model can estimate black-box explanation scores in milliseconds per sample at inference time and is up to 60 times more efficient than traditional methods. (A toy sketch of this amortization idea appears at the end of this section.)
      @inproceedings{yang2023efficient,
        title = {Efficient Shapley Values Estimation by Amortization for Text Classification},
        author = {Yang, Chenghao and Yin, Fan and He, He and Chang, Kai-Wei and Ma, Xiaofei and Xiang, Bing},
        year = {2023},
        presentation_id = {https://underline.io/events/395/sessions/15249/lecture/76179-efficient-shapley-values-estimation-by-amortization-for-text-classification},
        booktitle = {ACL}
      }
      
      Details
    2. Resolving Ambiguities in Text-to-Image Generative Models

      Ninareh Mehrabi, Palash Goyal, Apurv Verma, Jwala Dhamala, Varun Kumar, Qian Hu, Kai-Wei Chang, Richard Zemel, Aram Galstyan, and Rahul Gupta, in ACL, 2023.
      QA Sessions: POSTER SESSION 4: July 11 11:00 AM - 12:30 PM Paper link in the virtual conference
      Full Text BibTeX Details
      Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions and/or relying on contextual cues and common-sense knowledge, resolving ambiguities can be notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate a benchmark dataset covering different types of ambiguities that occur in these systems. We then propose a framework to mitigate ambiguities in the prompts given to the systems by soliciting clarifications from the user. Through automatic and human evaluations, we show the effectiveness of our framework in generating more faithful images aligned with human intention in the presence of ambiguities.
      @inproceedings{mehrabi2023resolving,
        author = {Mehrabi, Ninareh and Goyal, Palash and Verma, Apurv and Dhamala, Jwala and Kumar, Varun and Hu, Qian and Chang, Kai-Wei and Zemel, Richard and Galstyan, Aram and Gupta, Rahul},
        booktitle = {ACL},
        title = {Resolving Ambiguities in Text-to-Image Generative Models},
        presentation_id = {https://underline.io/events/395/posters/15237/poster/76575-resolving-ambiguities-in-text-to-image-generative-models},
        year = {2023}
      }
      
      Details
    3. The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

      Nikil Roashan Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot, and Kai-Wei Chang, in ACL (short), 2023.
      QA Sessions: POSTER SESSION 2: July 10 14:00 - 15:30 Paper link in the virtual conference
      Full Text BibTeX Details Outstanding Paper Award
      How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions for a given benchmark based on innocuous modifications (such as paraphrasing or random-sampling) that maintain the essence of their social bias. On two well-known social bias benchmarks (Winogender and BiasNLI) we observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models. We hope these troubling observations motivate more robust measures of social biases.
      @inproceedings{roashan2023tail,
        author = {Selvam, Nikil Roashan and Dev, Sunipa and Khashabi, Daniel and Khot, Tushar and Chang, Kai-Wei},
        title = {The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks},
        presentation_id = {https://underline.io/events/395/posters/15337/poster/76963-the-tail-wagging-the-dog-dataset-construction-biases-of-social-bias-benchmarks},
        booktitle = {ACL (short)},
        year = {2023}
      }
      
      Details
    4. PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English

      Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, and Kai-Wei Chang, in ACL (short), 2023.
      QA Sessions: VIRTUAL POSTER SESSION 3: July 12 11:00 AM - 12:30 PM Paper link in the virtual conference
      Full Text BibTeX Details
      Privacy policies provide individuals with information about their rights and how their personal information is handled. Natural language understanding (NLU) technologies can help individuals and practitioners better understand the privacy practices described in lengthy and complex documents. However, existing efforts that use NLU technologies are limited to processing the language for a single task focused on certain privacy practices. To this end, we introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding across various tasks. We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training. We evaluate several generic pre-trained language models and continue pre-training them on the collected corpus. We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
      @inproceedings{chi2023plue,
        author = {Chi, Jianfeng and Ahmad, Wasi Uddin and Tian, Yuan and Chang, Kai-Wei},
        title = {PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English},
        presentation_id = {https://underline.io/events/395/posters/15279/poster/76751-plue-language-understanding-evaluation-benchmark-for-privacy-policies-in-english},
        booktitle = {ACL (short)},
        year = {2023}
      }
      
      Details
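
    As a rough illustration of the amortization idea in "Efficient Shapley Values Estimation by Amortization for Text Classification" above, the sketch below (a toy illustration, not the paper's released code) trains a small regressor to predict per-token attribution scores that would be pre-computed offline with a stochastic Shapley estimator, so that a single forward pass replaces many perturbed-input evaluations at inference time. The model size, vocabulary size, and the random tensors standing in for data are placeholders.

    import torch
    import torch.nn as nn

    class AmortizedExplainer(nn.Module):
        # Predicts one attribution score per input token in a single forward pass.
        def __init__(self, vocab_size=30522, dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.score_head = nn.Linear(dim, 1)

        def forward(self, token_ids):
            h = self.encoder(self.embed(token_ids))
            return self.score_head(h).squeeze(-1)  # shape: (batch, seq_len)

    # Toy training step: regress onto Shapley targets estimated offline from
    # many model evaluations (random tensors stand in for a real dataset).
    model = AmortizedExplainer()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    token_ids = torch.randint(0, 30522, (8, 32))   # placeholder token ids
    targets = torch.randn(8, 32)                   # placeholder Shapley estimates
    loss = nn.functional.mse_loss(model(token_ids), targets)
    loss.backward()
    optimizer.step()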

    Multi-modal Models: Vision+Language, Program+Language

    1. MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

      Masoud Monajatipoor, Liunian Harold Li, Mozhdeh Rouhsedaghat, Lin Yang, and Kai-Wei Chang, in ACL (short), 2023.
      QA Sessions: POSTER SESSION 2: July 10 14:00 - 15:30 Paper link in the virtual conference
      Full Text BibTeX Details
      Large-scale language models have shown the ability to adapt to a new task via conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train a language model to perform in-context learning on NLP tasks (as in MetaICL); then we transfer this model to perform VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves the in-context learning capability on VL tasks and can even compensate for the size of the model significantly. On VQA, OK-VQA, and GQA, our method could outperform the baseline model while having 20 times fewer parameters.
      @inproceedings{monajatipoor2023metavl,
        author = {Monajatipoor, Masoud and Li, Liunian Harold and Rouhsedaghat, Mozhdeh and Yang, Lin and Chang, Kai-Wei},
        title = {MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models},
        booktitle = {ACL (short)},
        presentation_id = {https://underline.io/events/395/posters/15337/poster/76709-metavl-transferring-in-context-learning-ability-from-language-models-to-vision-language-models},
        year = {2023}
      }
      
      Details
    2. UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding

      Rui Sun, Zhecan Wang, Haoxuan You, Noel Codella, Kai-Wei Chang, and Shih-Fu Chang, in ACL-Finding, 2023.
      QA Sessions: VIRTUAL POSTER SESSION 3: July 12 11:00 AM - 12:30 PM Paper link in the virtual conference
      Full Text BibTeX Details
      Vision-language tasks, such as VQA, SNLI-VE, and VCR, are challenging because they require the model’s reasoning ability to understand the semantics of the visual world and natural language. Supervised methods for vision-language tasks have been well studied. However, solving these tasks in a zero-shot setting is less explored. Since Contrastive Language-Image Pre-training (CLIP) has shown remarkable zero-shot performance on image-text matching, previous works utilized its strong zero-shot ability by converting vision-language tasks into an image-text matching problem, and they mainly consider global-level matching (e.g., the whole image or sentence). However, we find visual and textual fine-grained information, e.g., keywords in the sentence and objects in the image, can be fairly informative for semantics understanding. Inspired by this, we propose a unified framework to take advantage of the fine-grained information for zero-shot vision-language learning, covering multiple tasks such as VQA, SNLI-VE, and VCR. Our experiments show that our framework outperforms former zero-shot methods on VQA and achieves substantial improvement on SNLI-VE and VCR. Furthermore, our ablation studies confirm the effectiveness and generalizability of our proposed method. (A minimal CLIP image-text matching sketch appears at the end of this section.)
      @inproceedings{sun2023unifine,
        author = {Sun, Rui and Wang, Zhecan and You, Haoxuan and Codella, Noel and Chang, Kai-Wei and Chang, Shih-Fu},
        title = {UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding},
        booktitle = {ACL-Finding},
        year = {2023},
        presentation_id = {https://underline.io/events/395/posters/15279/poster/78004-unifine-a-unified-and-fine-grained-approach-for-zero-shot-vision-language-understanding}
      }
      
      Details
    3. AVATAR: A Parallel Corpus for Java-Python Program Translation

      Wasi Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang, in ACL-Finding (short), 2023.
      Full Text Code BibTeX Details
      Program translation refers to migrating source code from one programming language to another. It has tremendous practical value in software development, as porting software across languages is time-consuming and costly. Automating program translation is of paramount importance in software migration, and recently researchers explored unsupervised approaches due to the unavailability of parallel corpora. However, the availability of pre-trained language models for programming languages enables supervised fine-tuning with a small amount of labeled examples. In this work, we present a corpus of 8,475 programming problems and their solutions written in two popular languages, Java and Python. We collect the dataset from competitive programming sites, online platforms, and open source repositories. We present several baselines, including models trained from scratch or pre-trained on a large-scale source code collection and fine-tuned on our proposed dataset. Experimental results show that while the models perform relatively well in terms of lexical match, they fall short of generating code that is accurate in terms of syntax and data-flow match.
      @inproceedings{ahmad2021avatar,
        title = {AVATAR: A Parallel Corpus for Java-Python Program Translation},
        author = {Ahmad, Wasi and Tushar, Md Golam Rahman and Chakraborty, Saikat and Chang, Kai-Wei},
        booktitle = {ACL-Finding (short)},
        year = {2023}
      }
      

      Related Publications

      1. DesCo: Learning Object Recognition with Rich Language Descriptions

        Liunian Harold Li, Zi-Yi Dou, Nanyun Peng, and Kai-Wei Chang, in Arxiv, 2023.
        Full Text Abstract BibTeX Details Ranks 1st at the #OmniLabel Challenge of CVPR2023
        Recent developments in vision-language approaches have instigated a paradigm shift in learning visual recognition models from language supervision. These approaches align objects with language queries (e.g. "a photo of a cat") and improve the models’ adaptability to identify novel objects and domains. Recently, several studies have attempted to query these models with complex language expressions that include specifications of fine-grained semantic details, such as attributes, shapes, textures, and relations. However, simply incorporating language descriptions as queries does not guarantee accurate interpretation by the models. In fact, our experiments show that GLIP, the state-of-the-art vision-language model for object detection, often disregards contextual information in the language descriptions and instead relies heavily on detecting objects solely by their names. To tackle the challenge, we propose a new description-conditioned (DesCo) paradigm of learning object recognition models with rich language descriptions consisting of two major innovations: 1) we employ a large language model as a commonsense knowledge engine to generate rich language descriptions of objects based on object names and the raw image-text caption; 2) we design context-sensitive queries to improve the model’s ability to decipher intricate nuances embedded within descriptions and enforce the model to focus on context rather than object names alone. On two novel object detection benchmarks, LVIS and OmniLabel, under the zero-shot detection setting, our approach achieves 34.8 APr minival (+9.1) and 29.3 AP (+3.6), respectively, surpassing the prior state-of-the-art models, GLIP and FIBER, by a large margin.
        @inproceedings{li2023desco,
          author = {Li, Liunian Harold and Dou, Zi-Yi and Peng, Nanyun and Chang, Kai-Wei},
          title = {DesCo: Learning Object Recognition with Rich Language Descriptions},
          booktitle = {Arxiv},
          year = {2023}
        }
        
        Details
      2. Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

        Da Yin, Liunian Harold Li, Ziniu Hu, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2021.
        Full Text Code Abstract BibTeX Details
        Commonsense is defined as the knowledge that is shared by everyone. However, certain types of commonsense knowledge are correlated with culture and geographic locations and are only shared locally. For example, the scenarios of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models’ ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art vision-and-language models, VisualBERT and ViLBERT, trained on VCR, a standard multimodal commonsense benchmark with images primarily from Western regions. We then evaluate how well the trained models can generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than that for the Western region. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that: 1) are concerned with culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition.
        @inproceedings{yin2021broaden,
          title = {Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning},
          author = {Yin, Da and Li, Liunian Harold and Hu, Ziniu and Peng, Nanyun and Chang, Kai-Wei},
          booktitle = {EMNLP},
          presentation_id = {https://underline.io/events/192/sessions/7790/lecture/37514-broaden-the-vision-geo-diverse-visual-commonsense-reasoning},
          year = {2021}
        }
        
        Details
      3. Retrieval Augmented Code Generation and Summarization

        Md Rizwan Parvez, Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in EMNLP-Finding, 2021.
        Full Text Abstract BibTeX Details
        Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting them. To mimic developers’ code or summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. REDCODER has a couple of unique features. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.
        @inproceedings{parvez2021retrieval,
          title = {Retrieval Augmented Code Generation and Summarization},
          author = {Parvez, Md Rizwan and Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
          booktitle = {EMNLP-Finding},
          presentation_id = {https://underline.io/events/192/sessions/7923/lecture/38314-retrieval-augmented-code-generation-and-summarization},
          year = {2021}
        }
        
        Details
      4. Unified Pre-training for Program Understanding and Generation

        Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in NAACL, 2021.
        Full Text Video Code Abstract BibTeX Details Top-10 cited paper at NAACL 21
        Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation avails the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART’s effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming conventions), and logical flow (e.g., an if block inside an else block is equivalent to an else if block) that are crucial to program semantics and thus excels even with limited annotations.
        @inproceedings{ahmad2021unified,
          title = {Unified Pre-training for Program Understanding and Generation},
          author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
          booktitle = {NAACL},
          presentation_id = {https://underline.io/events/122/sessions/4197/lecture/20024-unified-pre-training-for-program-understanding-and-generation},
          year = {2021}
        }
        
        Details
      5. Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions

        Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, and Kai-Wei Chang, in NAACL, 2021.
        Full Text Video Abstract BibTeX Details
        Pre-trained contextual vision-and-language (V&L) models have brought impressive performance improvement on various benchmarks. However, the paired text-image data required for pre-training are hard to collect and scale up. We investigate if a strong V&L representation model can be learned without text-image pairs. We propose Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora. Additionally, we introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. Evaluation on four V&L benchmarks shows that Weakly-supervised VisualBERT achieves similar performance with a model pre-trained with paired data. Besides, pre-training on more image-only data further improves a model that already has access to aligned data, suggesting the possibility of utilizing billions of raw images available to enhance V&L models.
        @inproceedings{li2021unsupervised,
          author = {Li, Liunian Harold and You, Haoxuan and Wang, Zhecan and Zareian, Alireza and Chang, Shih-Fu and Chang, Kai-Wei},
          title = {Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions},
          booktitle = {NAACL},
          presentation_id = {https://underline.io/events/122/sessions/4269/lecture/19725-unsupervised-vision-and-language-pre-training-without-parallel-images-and-captions},
          year = {2021}
        }
        
        Details
      6. What Does BERT with Vision Look At?

        Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in ACL (short), 2020.
        Full Text Slides Video Code Abstract BibTeX Details
        Pre-trained visually grounded language models such as ViLBERT, LXMERT, and UNITER have achieved significant performance improvement on vision-and-language tasks but what they learn during pre-training remains unclear. In this work, we demonstrate that certain attention heads of a visually grounded language model actively ground elements of language to image regions. Specifically, some heads can map entities to image regions, performing the task known as entity grounding. Some heads can even detect the syntactic relations between non-entity words and image regions, tracking, for example, associations between verbs and regions corresponding to their arguments. We denote this ability as syntactic grounding. We verify grounding both quantitatively and qualitatively, using Flickr30K Entities as a testbed.
        @inproceedings{li2020what,
          author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
          title = {What Does BERT with Vision Look At?},
          booktitle = {ACL (short)},
          presentation_id = {https://virtual.acl2020.org/paper_main.469.html},
          year = {2020}
        }
        
        Details
      7. VisualBERT: A Simple and Performant Baseline for Vision and Language

        Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in Arxiv, 2019.
        Full Text Code Abstract BibTeX Details
        We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals with state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.
        @inproceedings{li2019visualbert,
          author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
          title = {VisualBERT: A Simple and Performant Baseline for Vision and Language},
          booktitle = {Arxiv},
          year = {2019}
        }
        
        Details

      Details
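
    Several of the zero-shot vision-language results above (e.g., UniFine) build on CLIP-style image-text matching, where a task is recast as scoring candidate captions against an image. The snippet below is a generic, minimal illustration of that matching setup with the Hugging Face transformers API; it is not code from any of the papers above, and the image path and candidate captions are placeholders.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("example.jpg")              # placeholder image
    candidates = [                                 # a task rephrased as candidate captions
        "a photo of a dog catching a frisbee",
        "a photo of a cat sleeping on a couch",
    ]

    inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # similarity of the image to each caption
    print(logits.softmax(dim=-1))                  # the highest-scoring caption is the prediction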

    Reasoning and Language

    1. Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step

      Liunian Harold Li, Jack Hessel, Youngjae Yu, Xiang Ren, Kai-Wei Chang, and Yejin Choi, in ACL, 2023.
      QA Sessions: POSTER SESSION 1, 7/10 11:00AM-12:30PM Paper link in the virtual conference
      Full Text BibTeX Details
      Chain-of-thought prompting (e.g., "Let’s think step-by-step") primes large language models to verbalize rationalization for their predictions. While chain-of-thought can lead to dramatic performance gains, benefits appear to emerge only for sufficiently large models (beyond 50B parameters). We show that orders-of-magnitude smaller models (125M – 1.3B parameters) can still benefit from chain-of-thought prompting. To achieve this, we introduce Symbolic Chain-of-Thought Distillation (SCoTD), a method to train a smaller student model on rationalizations sampled from a significantly larger teacher model. Experiments across several commonsense benchmarks show that: 1) SCoTD enhances the performance of the student model in both supervised and few-shot settings, and especially for challenge sets; 2) sampling many reasoning chains per instance from the teacher is paramount; and 3) after distillation, student chain-of-thoughts are judged by humans as comparable to the teacher, despite orders of magnitude fewer parameters. We test several hypotheses regarding what properties of chain-of-thought samples are important, e.g., diversity vs. teacher likelihood vs. open-endedness. We release our corpus of chain-of-thought samples and code. (A toy distillation sketch appears at the end of this section.)
      @inproceedings{li2023symbolic,
        title = {Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step},
        author = {Li, Liunian Harold and Hessel, Jack and Yu, Youngjae and Ren, Xiang and Chang, Kai-Wei and Choi, Yejin},
        booktitle = {ACL},
        presentation_id = {https://underline.io/events/395/posters/15197/poster/77090-symbolic-chain-of-thought-distillation-small-models-can-also-think-step-by-step?tab=poster},
        year = {2023}
      }
      
      Details
    2. A Survey of Deep Learning for Mathematical Reasoning

      Pan Lu, Liang Qiu, Wenhao Yu, Sean Welleck, and Kai-Wei Chang, in ACL, 2023.
      QA Sessions: POSTER SESSION 2: July 10 14:00 - 15:30 Paper link in the virtual conference
      Full Text BibTeX Details
      Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
      @inproceedings{lu2023survey,
        author = {Lu, Pan and Qiu, Liang and Yu, Wenhao and Welleck, Sean and Chang, Kai-Wei},
        title = {A Survey of Deep Learning for Mathematical Reasoning},
        booktitle = {ACL},
        year = {2023},
        presentation_id = {https://underline.io/events/395/posters/15337/poster/76360-a-survey-of-deep-learning-for-mathematical-reasoning}
      }
      
      Details
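
    The recipe behind "Symbolic Chain-of-Thought Distillation" above amounts to sampling rationales from a large teacher model and fine-tuning a much smaller student on them. The sketch below is a hedged, toy rendition of that loop, not the paper's released pipeline: the GPT-2 checkpoints, the prompt format, and the single-example training loop are stand-ins chosen only to keep the example self-contained.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    teacher_name, student_name = "gpt2-large", "gpt2"    # placeholders for the real teacher/student
    tok = AutoTokenizer.from_pretrained(student_name)    # GPT-2 variants share a tokenizer
    teacher = AutoModelForCausalLM.from_pretrained(teacher_name)
    student = AutoModelForCausalLM.from_pretrained(student_name)

    question = "If Tom has 3 apples and buys 2 more, how many does he have?"
    prompt = f"Q: {question}\nLet's think step by step.\nA:"

    # 1) Sample several chain-of-thought rationales per question from the teacher.
    ids = tok(prompt, return_tensors="pt").input_ids
    samples = teacher.generate(ids, do_sample=True, num_return_sequences=4,
                               max_new_tokens=64, pad_token_id=tok.eos_token_id)
    rationales = tok.batch_decode(samples[:, ids.shape[1]:], skip_special_tokens=True)

    # 2) Fine-tune the student to reproduce prompt + rationale with a language-modeling loss.
    optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
    for r in rationales:
        batch = tok(prompt + r, return_tensors="pt")
        loss = student(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()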

    Semantic and Syntactic Analysis

    1. ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation

      Kuan-Hao Huang, Varun Iyer, I.-Hung Hsu, Anoop Kumar, Kai-Wei Chang, and Aram Galstyan, in ACL, 2023.
      QA Sessions: POSTER SESSION 3: July 11 09:00 AM - July 11 10:30 AM Paper link in the virtual conference
      Full Text BibTeX Details Area Chair’s Award
      Paraphrase generation is a long-standing task in natural language processing (NLP). Supervised paraphrase generation models, which rely on human-annotated paraphrase pairs, are cost-inefficient and hard to scale up. On the other hand, automatically annotated paraphrase pairs (e.g., by machine back-translation) usually suffer from the lack of syntactic diversity – the generated paraphrase sentences are very similar to the source sentences in terms of syntax. In this work, we present ParaAMR, a large-scale syntactically diverse paraphrase dataset created by abstract meaning representation back-translation. Our quantitative analysis, qualitative examples, and human evaluation demonstrate that the paraphrases of ParaAMR are syntactically more diverse compared to existing large-scale paraphrase datasets while preserving good semantic similarity. In addition, we show that ParaAMR can be used to improve on three NLP tasks: learning sentence embeddings, syntactically controlled paraphrase generation, and data augmentation for few-shot learning. Our results thus showcase the potential of ParaAMR for improving various NLP applications.
      @inproceedings{huang2023paraarm,
        author = {Huang, Kuan-Hao and Iyer, Varun and Hsu, I-Hung and Kumar, Anoop and Chang, Kai-Wei and Galstyan, Aram},
        title = {ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation},
        booktitle = {ACL},
        presentation_id = {https://underline.io/events/395/posters/15227/poster/76600-paraamr-a-large-scale-syntactically-diverse-paraphrase-dataset-by-amr-back-translation},
        year = {2023}
      }
      
      Details
    2. TAGPRIME: A Unified Framework for Relational Structure Extraction

      I.-Hung Hsu, Kuan-Hao Huang, Shuning Zhang, Wenxin Cheng, Prem Natarajan, Kai-Wei Chang, and Nanyun Peng, in ACL, 2023.
      QA Sessions: INFORMATION EXTRACTION 1, July 11 16:15 Paper link in the virtual conference
      Full Text BibTeX Details
      Many tasks in natural language processing require the extraction of relationship information for a given condition, such as event argument extraction, relation extraction, and task-oriented semantic parsing. Recent works usually propose sophisticated models for each task independently and pay less attention to the commonality of these tasks and to building a unified framework for all of them. In this work, we propose to take a unified view of all these tasks and introduce TAGPRIME to address relational structure extraction problems. TAGPRIME is a sequence tagging model that appends priming words about the information of the given condition (such as an event trigger) to the input text. With the self-attention mechanism in pre-trained language models, the priming words make the output contextualized representations contain more information about the given condition, and hence become more suitable for extracting specific relationships for the condition. Extensive experiments and analyses on three different tasks that cover ten datasets across five different languages demonstrate the generality and effectiveness of TAGPRIME.
      @inproceedings{hsu2023tagprime,
        author = {Hsu, I-Hung and Huang, Kuan-Hao and Zhang, Shuning and Cheng, Wenxin and Natarajan, Prem and Chang, Kai-Wei and Peng, Nanyun},
        title = {TAGPRIME: A Unified Framework for Relational Structure Extraction},
        booktitle = {ACL},
        presentation_id = {https://underline.io/events/395/sessions/15250/lecture/76330-tagprime-a-unified-framework-for-relational-structure-extraction},
        year = {2023}
      }
      
      Details
    3. GENEVA: Pushing the Limit of Generalizability for Event Argument Extraction with 100+ Event Types

      Tanmay Parekh, I.-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, and Nanyun Peng, in ACL, 2023.
      QA Sessions: POSTER SESSION 6, 7/12 9:00AM-10:30AM Paper link in the virtual conference
      Full Text Code BibTeX Details
      Recent works in Event Argument Extraction (EAE) have focused on improving model generalizability to cater to new events and domains. However, standard benchmarking datasets like ACE and ERE cover less than 40 event types and 25 entity-centric argument roles. Limited diversity and coverage hinder these datasets from adequately evaluating the generalizability of EAE models. In this paper, we first contribute by creating a large and diverse EAE ontology. This ontology is created by transforming FrameNet, a comprehensive semantic role labeling (SRL) dataset, for EAE by exploiting the similarity between the two tasks. Then, exhaustive human expert annotations are collected to build the ontology, concluding with 115 events and 220 argument roles, with a significant portion of roles not being entities. We utilize this ontology to further introduce GENEVA, a diverse generalizability benchmarking dataset comprising four test suites, aimed at evaluating models’ ability to handle limited data and unseen event type generalization. We benchmark six EAE models from various families. The results show that owing to non-entity argument roles, even the best-performing model can only achieve 39% F1 score, indicating how GENEVA provides new challenges for generalization in EAE. Overall, our large and diverse EAE ontology can aid in creating more comprehensive future resources, while GENEVA is a challenging benchmarking dataset encouraging further research for improving generalizability in EAE.
      @inproceedings{parekh2023geneva,
        title = {GENEVA: Pushing the Limit of Generalizability for Event Argument Extraction with 100+ Event Types},
        author = {Parekh, Tanmay and Hsu, I-Hung and Huang, Kuan-Hao and Chang, Kai-Wei and Peng, Nanyun},
        booktitle = {ACL},
        presentation_id = {https://underline.io/events/395/posters/15264/poster/77026-geneva-benchmarking-generalizability-for-event-argument-extraction-with-hundreds-of-event-types-and-argument-roles},
        year = {2023}
      }
      
      Details
    4. Enhancing Unsupervised Semantic Parsing with Distributed Contextual Representations

      Zixuan Ling, Xiaoqing Zheng, Jianhan Xu, Jinshu Lin, Kai-Wei Chang, Cho-Jui Hsieh, and Xuanjing Huang, in ACL-Finding, 2023.
      QA Sessions: VIRTUAL POSTER SESSION 3: July 12 11:00 AM - 12:30 PM Paper link in the virtual conference
      BibTeX Details
      We extend a non-parametric Bayesian model of (Titov and Klementiev, 2011) to deal with homonymy and polysemy by leveraging distributed contextual word and phrase representations pre-trained on a large collection of unlabelled texts. Then, unsupervised semantic parsing is performed by decomposing sentences into fragments, clustering the fragments to abstract away syntactic variations of the same meaning, and predicting predicate-argument relations between the fragments. To better model the statistical dependencies between predicates and their arguments, we further employ a hierarchical Pitman-Yor process. An improved Metropolis-Hastings merge-split sampler is proposed to speed up the mixing and convergence of Markov chains by leveraging pre-trained distributed representations. The experimental results show that the models achieve better accuracy on both question-answering and relation extraction tasks.
      @inproceedings{ling2023enhancing,
        author = {Ling, Zixuan and Zheng, Xiaoqing and Xu, Jianhan and Lin, Jinshu and Chang, Kai-Wei and Hsieh, Cho-Jui and Huang, Xuanjing},
        title = {Enhancing Unsupervised Semantic Parsing with Distributed Contextual Representations},
        booktitle = {ACL-Finding},
        presentation_id = {https://underline.io/events/395/posters/15279/poster/77281-enhancing-unsupervised-semantic-parsing-with-distributed-contextual-representations?tab=video},
        year = {2023}
      }
      
      Details
    5. PIP: Parse-Instructed Prefix for Syntactically Controlled Paraphrase Generation

      Yixin Wan, Kuan-Hao Huang, and Kai-Wei Chang, in ACL-Finding (short), 2023.
      QA Sessions: VIRTUAL POSTER SESSION 3: July 12 11:00 AM - 12:30 PM Paper link in the virtual conference
      Full Text BibTeX Details
      Syntactically controlled paraphrase generation requires language models to generate paraphrases for sentences according to specific syntactic structures. Existing fine-tuning methods for this task are costly, as all the parameters of the model need to be updated during training. Inspired by recent studies on parameter-efficient learning, we propose Parse-Instructed Prefix (PIP), a novel adaptation of prefix-tuning for tuning large pre-trained language models on the syntactically controlled paraphrase generation task in a low-data setting with significantly lower training cost. We introduce two methods to instruct a model’s encoder prefix to capture syntax-related knowledge: direct initiation (PIP-Direct) and indirect optimization (PIP-Indirect). In contrast to traditional fine-tuning methods for this task, PIP is a compute-efficient alternative with 10 times fewer learnable parameters. Compared to existing prefix-tuning methods, PIP excels at capturing syntax control information, achieving significantly higher performance at the same level of learnable parameter count. (A plain prefix-tuning sketch appears at the end of this section.)
      @inproceedings{wan2023pip,
        author = {Wan, Yixin and Huang, Kuan-Hao and Chang, Kai-Wei},
        title = {PIP: Parse-Instructed Prefix for Syntactically Controlled Paraphrase Generation},
        booktitle = {ACL-Finding (short)},
        presentation_id = {https://underline.io/events/395/posters/15279/poster/77944-pip-parse-instructed-prefix-for-syntactically-controlled-paraphrase-generation},
        year = {2023}
      }
      
      Details
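
    The PIP paper above builds on prefix-tuning, which freezes the pre-trained model and learns only a small set of prefix parameters. The sketch below shows that parameter-efficient baseline with the peft library; it is a hedged illustration rather than the paper's PIP implementation (the t5-small checkpoint, the parse-template prompt format, and the toy sentence pair are placeholders, and PIP's parse-instructed prefixes are not implemented here).

    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    from peft import PrefixTuningConfig, TaskType, get_peft_model

    tok = AutoTokenizer.from_pretrained("t5-small")
    base = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
    config = PrefixTuningConfig(task_type=TaskType.SEQ_2_SEQ_LM, num_virtual_tokens=20)
    model = get_peft_model(base, config)
    model.print_trainable_parameters()   # only the prefix parameters are trainable

    # One toy training step: source sentence plus a target syntactic template -> paraphrase.
    src = "paraphrase with template ( S ( PP ) ( VP ) ( NP ) ): the quick brown fox jumps over the lazy dog"
    tgt = "over the lazy dog jumps the quick brown fox"
    batch = tok(src, return_tensors="pt")
    labels = tok(tgt, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss
    loss.backward()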