ICCV 2025
A tutorial on strengthening vision-language models for mathematical reasoning over geometry, charts, graphs, and spatial visual contexts.
Mathematical reasoning presents unique challenges when combined with visual information. This talk will explore how vision-language models can be enhanced to solve mathematical problems that require understanding geometric relationships, interpreting charts and graphs, and reasoning about spatial configurations in mathematical contexts.
ACL 2023
A tutorial on learning NLP models from indirect, incidental, noisy, cross-task, and multimodal supervision when direct labels are scarce.
This tutorial targets researchers and practitioners who are interested in ML technologies for NLP from indirect supervision. In particular, we will present a diverse thread of indirect supervision studies that try to answer the following questions: (i) when and how can we provide supervision for a target task T, if all we have is data that corresponds to a "related" task T'? (ii) humans do not use exhaustive supervision; they rely on occasional feedback and learn from incidental signals from various sources; how can we effectively incorporate such supervision in machine learning? (iii) how can we leverage multimodal supervision to help NLP? To this end, we will discuss several lines of research that address those challenges, including (i) indirect supervision from T' that handles T with outputs spanning from a moderate size to an open space, (ii) the use of sparsely occurring and incidental signals, such as partial labels, noisy labels, knowledge-based constraints, and cross-domain or cross-task annotations, all having statistical associations with the task, (iii) principled ways to measure and understand why these incidental signals can contribute to our target tasks, and (iv) indirect supervision from vision-language signals. We will conclude the tutorial by outlining directions for further investigation.
EMNLP 2021
A tutorial on diagnosing and improving NLP robustness under adversarial perturbations, distribution shifts, and real-world deployment failures.
Recent studies show that many NLP systems are sensitive and vulnerable to small perturbations of their inputs and do not generalize well across different datasets. This lack of robustness derails the use of NLP systems in real-world applications. This tutorial aims to raise awareness of practical concerns about NLP robustness. It targets NLP researchers and practitioners who are interested in building reliable NLP systems. In particular, we will review recent studies that analyze the weaknesses of NLP systems when facing adversarial inputs and data with a distribution shift. We will provide the audience with a holistic view of (1) how to use adversarial examples to examine the weaknesses of NLP models and facilitate debugging; (2) how to enhance the robustness of existing NLP models and defend against adversarial inputs; (3) how the consideration of robustness affects real-world NLP applications used in our daily lives. We will conclude the tutorial by outlining future research directions in this area.
AAAI 2020
A tutorial on transferable representations for low-resource, cross-domain, multilingual, multi-relational, and multimedia learning problems.
Many AI tasks require cross-domain decision making. For example, many NLP tasks involve predictions across multiple languages, in which different languages can be treated as different domains; in AI-aided biomedical study, the prediction of side effects of drugs is often in parallel to modeling the interactions of proteins and organisms. To support machine learning models in solving such cross-domain tasks, a prerequisite is to extract the characteristics and relations of data components in different domains, and capture their associations in a unified representation scheme. To meet this demand, recent advances in representation learning often involve mapping unlabeled data of different domains into shared embedding spaces. In this way, cross-domain knowledge transfer can be realized by vector collocation or transformations. Such transferable representations have seen successes in a range of AI applications involving cross-domain decision making. However, frontier research in this area faces two key challenges. One is to effectively extract features from specific domains with very few learning resources. The other is to precisely align and transfer knowledge with minimal supervision, since the alignment information that connects different domains can often be insufficient and noisy. In this tutorial, we will comprehensively review recent developments in transferable representation learning methods, with a focus on those for text, multi-relational and multimedia data. Beyond introducing the intra-domain embedding learning approaches, we will discuss various semi-supervised, weakly supervised, multi-view and self-supervised learning techniques to connect multiple domain-specific embedding representations. We will also compare retrofitting and joint learning processes for both intra-domain embedding learning and cross-domain alignment learning. In addition, we will discuss how the obtained transferable representations can be utilized to address low-resource and label-less learning tasks. Participants will learn about recent trends and emerging challenges in this topic, representative tools and learning resources to obtain ready-to-use models, and how related models and techniques benefit real-world AI applications.
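As a rough illustration of aligning two embedding spaces by a transformation, the sketch below fits an orthogonal map between them with the closed-form Procrustes solution. This is one simple alignment technique, chosen here for illustration and not necessarily one covered in the tutorial. The data are synthetic, and the names `procrustes_align`, `X`, `Y`, and `Q` are assumptions.

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal matrix W that minimizes ||X @ W - Y||_F."""
    # The SVD of the cross-covariance matrix gives the closed-form solution.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)

# Hypothetical "source domain" embeddings: 50 seed items, 4 dimensions.
X = rng.normal(size=(50, 4))

# Synthetic "target domain": the same items under an unknown rotation Q,
# plus a little noise to mimic imperfect alignment supervision.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
Y = X @ Q + 0.01 * rng.normal(size=(50, 4))

W = procrustes_align(X, Y)

# Relative residual after mapping source vectors into the target space.
alignment_error = np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)
print(f"relative alignment error: {alignment_error:.4f}")
```

In practice the seed pairs would come from a small set of known correspondences (for example, a bilingual dictionary), which is where the limited and noisy supervision mentioned above enters.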
EMNLP 2019
A tutorial on measuring and mitigating societal bias across NLP systems, from embeddings and coreference to translation and vision-language models.
Recent advances in data-driven machine learning techniques (e.g., deep neural networks) have revolutionized many natural language processing applications. These approaches automatically learn how to make decisions based on the statistics and diagnostic information from large amounts of training data. Despite the remarkable accuracy of machine learning in various applications, learning algorithms run the risk of relying on societal biases encoded in the training data to make predictions. This often occurs even when gender and ethnicity information is not explicitly provided to the system, because learning algorithms are able to discover implicit associations between individuals and their demographic information based on other variables such as names, titles, and home addresses. Therefore, machine learning algorithms risk encouraging unfair and discriminatory decision making and raise serious privacy concerns. Without properly quantifying and reducing the reliance on such correlations, broad adoption of these models might have the undesirable effect of magnifying harmful stereotypes or implicit biases that rely on sensitive demographic attributes. In this tutorial, we will review the history of bias and fairness studies in machine learning and language processing and present recent community efforts in quantifying and mitigating bias in natural language processing models for a wide spectrum of tasks, including word embeddings, coreference resolution, machine translation, and vision-and-language tasks.
NAACL 2015
A hands-on tutorial on casting structured prediction as sequential decision making through learning-to-search algorithms.
Many problems in natural language processing involve building outputs that are structured. The predominant approach to structured prediction is global models (such as conditional random fields), which have the advantage of clean underlying semantics at the cost of computational burdens and extreme difficulty in implementation. An alternative strategy is the learning to search (L2S) paradigm, in which the structured prediction task is cast as a sequential decision-making process. One can then devise training-time algorithms that learn to make near-optimal collective decisions. This paradigm has been gaining increasing traction over the past five years: most notably in dependency parsing (e.g., MaltParser, ClearNLP), but also much more broadly in less sequential tasks like entity/relation classification and even graph prediction problems found in social network analysis and computer vision. This tutorial has precisely one goal: an attendee should leave the tutorial with hands-on experience writing small programs to perform structured prediction for a variety of tasks, like sequence labeling, dependency parsing and, time permitting, more.
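To make the idea of casting structured prediction as sequential decisions concrete, here is a minimal sketch of a greedy left-to-right tagger. It is a simplified imitation-learning-style loop under stated assumptions, not the tutorial's own algorithm or software. The toy data, the tag set, and all function names are hypothetical. The policy is a multiclass perceptron; during training it rolls in with its own predictions, so training states resemble test-time states, and each step is updated toward the reference tag.

```python
from collections import defaultdict

TAGS = ["DET", "NOUN", "VERB"]  # toy tag set (assumption)

def features(words, i, prev_tag):
    # The "state" seen by the policy: current word and previously chosen tag.
    return [f"w={words[i]}", f"prev={prev_tag}", f"w={words[i]}|prev={prev_tag}"]

def choose(weights, feats):
    # Greedy action: the tag with the highest linear score in this state.
    return max(TAGS, key=lambda t: sum(weights[(f, t)] for f in feats))

def train(data, epochs=5):
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            prev = "<s>"
            for i in range(len(words)):
                feats = features(words, i, prev)
                guess = choose(weights, feats)
                if guess != gold[i]:
                    # Perceptron update toward the reference action.
                    for f in feats:
                        weights[(f, gold[i])] += 1.0
                        weights[(f, guess)] -= 1.0
                # Roll in with the policy's own action, not the gold tag.
                prev = guess
    return weights

def predict(weights, words):
    tags, prev = [], "<s>"
    for i in range(len(words)):
        prev = choose(weights, features(words, i, prev))
        tags.append(prev)
    return tags

# Tiny made-up training set, for illustration only.
data = [
    ("the dog barks".split(), ["DET", "NOUN", "VERB"]),
    ("a cat sleeps".split(), ["DET", "NOUN", "VERB"]),
    ("the cat barks".split(), ["DET", "NOUN", "VERB"]),
]
weights = train(data)
print(predict(weights, "a dog sleeps".split()))
```

Each tagging decision is an ordinary classification problem, which is what keeps this style of model easy to implement compared with a global model.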
AAAI 2016
A tutorial on efficient learning and inference for structured prediction models with complex interdependent outputs.
Many prediction problems require structured decisions. That is, the goal is to assign values to multiple interdependent variables. The relationships between the output variables could represent a sequence, a set of clusters, or in the general case, a graph. When solving these problems, it is important to make consistent decisions that take the interdependencies among output variables into account. Such problems are often referred to as structured prediction problems. In past decades, multiple structured prediction models have been proposed and studied, and success has been demonstrated in a range of applications, including natural language processing, information extraction, computer vision and computational biology. However, the high computational cost often limits both the models' expressive power and the size of the data that can be handled. Therefore, designing efficient inference and learning algorithms for these models is a key challenge for structured prediction. In this tutorial, we will focus on recent developments in discriminative structured prediction models such as Structured SVMs and Structured Perceptron. Beyond introducing the algorithmic approaches in this domain, we will discuss ideas that result in significant improvements both in the learning and in the inference stages of these algorithms. In particular, we will discuss the use of caching techniques to reuse computations and methods for decomposing complex structures, along with learning procedures that make use of them to simplify the learning stage. We will also present a recently proposed formulation that captures similarities between structured labels by using distributed representations. Participants will learn about existing trends in learning and inference for structured prediction models, recent tools developed in this area, and how they can be applied to AI applications.
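For readers unfamiliar with the Structured Perceptron mentioned above, the following is a minimal sketch for sequence labeling, assuming only emission and transition features and exact Viterbi inference. It does not include the caching or decomposition techniques the tutorial discusses. The toy data and all names are hypothetical.

```python
from collections import defaultdict

TAGS = ["DET", "NOUN", "VERB"]  # toy tag set (assumption)

def viterbi(words, w):
    """Exact inference: the highest-scoring tag sequence under weights w."""
    # best[i][t] = (score of the best prefix ending in tag t, backpointer)
    best = [{t: (w[("emit", words[0], t)] + w[("trans", "<s>", t)], None)
             for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p][0] + w[("trans", p, t)])
            score = (best[i - 1][prev][0] + w[("trans", prev, t)]
                     + w[("emit", words[i], t)])
            column[t] = (score, prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

def feature_counts(words, tags):
    # Joint feature vector of an (input, output) pair, as sparse counts.
    counts = defaultdict(int)
    prev = "<s>"
    for word, tag in zip(words, tags):
        counts[("emit", word, tag)] += 1
        counts[("trans", prev, tag)] += 1
        prev = tag
    return counts

def train(data, epochs=5):
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, w)
            if pred != gold:
                # Move weights toward the gold structure, away from the prediction.
                for f, c in feature_counts(words, gold).items():
                    w[f] += c
                for f, c in feature_counts(words, pred).items():
                    w[f] -= c
    return w

# Tiny made-up training set, for illustration only.
data = [
    ("the dog barks".split(), ["DET", "NOUN", "VERB"]),
    ("a cat sleeps".split(), ["DET", "NOUN", "VERB"]),
    ("the cat barks".split(), ["DET", "NOUN", "VERB"]),
]
w = train(data)
print(viterbi("a dog sleeps".split(), w))
```

Note that every training step calls `viterbi` once, so inference cost dominates training; this is the bottleneck that motivates efficient inference and reuse of computations.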
FAT 2018
A hands-on tutorial on detecting, quantifying, and reducing gender stereotypes encoded in word embeddings.
Ensuring fairness in algorithmically driven decision making is important to avoid inadvertent cases of bias and perpetuation of harmful stereotypes. However, modern natural language processing techniques, which learn model parameters based on data, might rely on implicit biases present in the data to make undesirable stereotypical associations. This danger arises with word embeddings, a popular framework for representing text data as vectors that has been used in many machine learning and natural language processing tasks. Recent results show that even word embeddings trained on Google News articles exhibit female and male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. In this tutorial, we will provide attendees with hands-on experience writing small programs to display and quantify the gender stereotypes in word embeddings. We will also show how to reduce such gender stereotypes in word embeddings.
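As a small illustration of what "display and quantify" can mean here, the sketch below measures how strongly a word vector leans along a gender direction, and then removes that component. It assumes a projection-based approach with a single definitional pair, which may differ from the method used in the tutorial. The 3-dimensional vectors are made up for illustration and do not come from any trained embedding.

```python
import numpy as np

# Hypothetical toy embeddings (invented values, not from a real model).
emb = {
    "he":       np.array([ 1.0, 0.2, 0.1]),
    "she":      np.array([-1.0, 0.2, 0.1]),
    "engineer": np.array([ 0.4, 0.9, 0.3]),
    "nurse":    np.array([-0.5, 0.8, 0.4]),
}

def unit(v):
    return v / np.linalg.norm(v)

# Gender direction estimated from one definitional pair (an assumption;
# using several pairs would give a more stable estimate).
gender_dir = unit(emb["he"] - emb["she"])

def gender_score(word):
    # Cosine between the word vector and the gender direction:
    # positive leans toward "he", negative toward "she".
    return float(unit(emb[word]) @ gender_dir)

def neutralize(v):
    # Remove the component of v that lies along the gender direction.
    return v - (v @ gender_dir) * gender_dir

for word in ("engineer", "nurse"):
    before = gender_score(word)
    after = float(unit(neutralize(emb[word])) @ gender_dir)
    print(f"{word}: before={before:+.3f}, after={after:+.3f}")
```

Only words that should be gender-neutral are candidates for `neutralize`; definitional words such as "he" and "she" are left unchanged in this sketch.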
TAAI 2017
A tutorial on practical structured prediction methods for NLP, emphasizing efficient learning, inference, and scalable applications.
Many machine learning problems require structured decisions. That is, the goal is to assign values to multiple interdependent variables. The relationships between the output variables could represent a sequence, a set of clusters, or in the general case, a graph. When solving these problems, it is important to make consistent decisions that take the interdependencies among output variables into account. Such problems are often referred to as structured prediction problems. In past decades, multiple structured prediction models have been proposed and studied, and success has been demonstrated in a range of applications, including natural language processing, information extraction, computer vision and computational biology. However, the high computational cost often limits both the models' expressive power and the size of the data that can be handled. Therefore, designing efficient inference and learning algorithms for these models is a key challenge for structured prediction. In this tutorial, we will focus on recent developments in discriminative structured prediction models such as Structured SVMs and Structured Perceptron. Beyond introducing the algorithmic approaches in this domain, we will discuss ideas that result in significant improvements both in the learning and in the inference stages of these algorithms. In particular, we will discuss the use of caching techniques to reuse computations and methods for decomposing complex structures, along with learning procedures that make use of them to simplify the learning stage. We will also present a recently proposed formulation that captures similarities between structured labels by using distributed representations. We will also discuss potential risks and challenges when using structured prediction models. Participants will learn about existing trends in learning and inference for structured prediction models, recent tools developed in this area, and how they can be applied to several natural language processing tasks.