Share this page:

Few-Shot Representation Learning for Out-Of-Vocabulary Words

Ziniu Hu, Ting Chen, Kai-Wei Chang, and Yizhou Sun, in ACL, 2019.

Poster Code

Download the full text


Abstract

Existing approaches for learning word embeddings often assume there are sufficient occurrences for each word in the corpus, such that the representation of words can be accurately estimated from their contexts. However, in real-world scenarios, out-of-vocabulary (a.k.a. OOV) words that do not appear in training corpus emerge frequently. It is challenging to learn accurate representations of these words with only a few observations. In this paper, we formulate the learning of OOV embeddings as a few-shot regression problem, and address it by training a representation function to predict the oracle embedding vector (defined as embedding trained with abundant observations) based on limited observations. Specifically, we propose a novel hierarchical attention-based architecture to serve as the neural regression function, with which the context information of a word is encoded and aggregated from K observations. Furthermore, our approach can leverage Model-Agnostic Meta-Learning (MAML) for adapting the learned model to the new corpus fast and robustly. Experiments show that the proposed approach significantly outperforms existing methods in constructing accurate embeddings for OOV words, and improves downstream tasks where these embeddings are utilized.


Bib Entry

@inproceedings{hu2019fewshot,
  author = {Hu, Ziniu and Chen, Ting and Chang, Kai-Wei and Sun, Yizhou},
  title = {Few-Shot Representation Learning for Out-Of-Vocabulary Words},
  booktitle = {ACL},
  year = {2019}
}

Related Publications

  • What Does BERT with Vision Look At?

    Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in ACL (short), 2020.
    Full Text Slides Video Code Abstract BibTeX Details
    Pre-trained visually grounded language models such as ViLBERT, LXMERT, and UNITER have achieved significant performance improvement on vision-and-language tasks but what they learn during pre-training remains unclear. In this work, we demonstrate that certain attention heads of a visually grounded language model actively ground elements of language to image regions. Specifically, some heads can map entities to image regions, performing the task known as entity grounding. Some heads can even detect the syntactic relations between non-entity words and image regions, tracking, for example, associations between verbs and regions corresponding to their arguments. We denote this ability as \emphsyntactic grounding. We verify grounding both quantitatively and qualitatively, using Flickr30K Entities as a testbed.
    @inproceedings{li2020what,
      author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
      title = {What Does BERT with Vision Look At?},
      booktitle = {ACL (short)},
      presentation_id = {469},
      year = {2020}
    }
    
    Details
  • VisualBERT: A Simple and Performant Baseline for Vision and Language

    Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang, in Arxiv, 2019.
    Full Text Code Abstract BibTeX Details
    We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals with state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.
    @inproceedings{li2019visualbert,
      author = {Li, Liunian Harold and Yatskar, Mark and Yin, Da and Hsieh, Cho-Jui and Chang, Kai-Wei},
      title = {VisualBERT: A Simple and Performant Baseline for Vision and Language},
      booktitle = {Arxiv},
      year = {2019}
    }
    
    Details
  • Few-Shot Representation Learning for Out-Of-Vocabulary Words

    Ziniu Hu, Ting Chen, Kai-Wei Chang, and Yizhou Sun, in ACL, 2019.
    Full Text Poster Code Abstract BibTeX Details
    Existing approaches for learning word embeddings often assume there are sufficient occurrences for each word in the corpus, such that the representation of words can be accurately estimated from their contexts. However, in real-world scenarios, out-of-vocabulary (a.k.a. OOV) words that do not appear in training corpus emerge frequently. It is challenging to learn accurate representations of these words with only a few observations. In this paper, we formulate the learning of OOV embeddings as a few-shot regression problem, and address it by training a representation function to predict the oracle embedding vector (defined as embedding trained with abundant observations) based on limited observations. Specifically, we propose a novel hierarchical attention-based architecture to serve as the neural regression function, with which the context information of a word is encoded and aggregated from K observations. Furthermore, our approach can leverage Model-Agnostic Meta-Learning (MAML) for adapting the learned model to the new corpus fast and robustly. Experiments show that the proposed approach significantly outperforms existing methods in constructing accurate embeddings for OOV words, and improves downstream tasks where these embeddings are utilized.
    @inproceedings{hu2019fewshot,
      author = {Hu, Ziniu and Chen, Ting and Chang, Kai-Wei and Sun, Yizhou},
      title = {Few-Shot Representation Learning for Out-Of-Vocabulary Words},
      booktitle = {ACL},
      year = {2019}
    }
    
    Details
  • Learning Word Embeddings for Low-resource Languages by PU Learning

    Chao Jiang, Hsiang-Fu Yu, Cho-Jui Hsieh, and Kai-Wei Chang, in NAACL, 2018.
    Full Text Slides Video Code Abstract BibTeX Details
    Word embedding has been used as a key component in many downstream applications in processing natural languages. Existing approaches often assume the existence of a large collection of text for learning effective word embedding. However, such a corpus may not be available for some low-resource languages. In this paper, we study how to effectively learn a word embedding model on a corpus with only a few million tokens. In such a situation, the co-occurrence matrix is very sparse because many word pairs are not observed to co-occur. In contrast to existing approaches, we argue that the zero entries in the co-occurrence matrix also provide valuable information and design a Positive-Unlabeled Learning (PU-Learning) approach to factorize the co-occurrence matrix. The experimental results demonstrate that the proposed approach requires a smaller amount of training text to obtain a reasonable word embedding model.
    @inproceedings{jiang2018learning,
      author = {Jiang, Chao and Yu, Hsiang-Fu and Hsieh, Cho-Jui and Chang, Kai-Wei},
      title = {Learning Word Embeddings for Low-resource Languages by PU Learning},
      booktitle = {NAACL},
      vimeo_id = {277670013},
      year = {2018}
    }
    
    Details
  • Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment

    Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, and Carlo Zaniolo, in IJCAI, 2018.
    Full Text Slides Code Abstract BibTeX Details
    Multilingual knowledge graph (KG) embeddings provide latent semantic representations of entities and structured knowledge enabled with cross-lingual inferences that benefit various knowledge-driven cross-lingual NLP tasks. However, precisely learning such cross-lingual inferences is usually hindered by the low coverage of entity alignment in many KGs. Since many multilingual KGs also provide literal descriptions of entities, in this paper, we introduce an embedding-based approach which leverages a weakly aligned multilingual KG for semi-supervised cross-lingual learning using entity descriptions. Our approach performs co-training of two embedding models, i.e. a multilingual KG embedding model and a multilingual literal description embedding model. The models are trained on a large Wikipedia-based trilingual dataset where most entity alignment is unknown to training. Experimental results show that the performance of the proposed approach on the entity alignment task improves at each iteration of co-training, and eventually reaches a stage at which it significantly surpasses previous approaches. We also show that our approach has promising abilities for zero-shot entity alignment, and cross-lingual KG completion.
    @inproceedings{chen2018multilingual,
      author = {Chen, Muhao and Tian, Yingtao and Chang, Kai-Wei and Skiena, Steven and Zaniolo, Carlo},
      title = {Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment},
      booktitle = {IJCAI},
      year = {2018}
    }
    
    Details
  • Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

    Shyam Upadhyay, Kai-Wei Chang, Matt Taddy, Adam Kalai, and James Zou, in ACL RepL4NLP Workshop, 2017.
    Full Text Abstract BibTeX Details Best Paper Award
    Word embeddings, which represent a word as a point in a vector space, have become ubiquitous to several NLP tasks. A recent line of work uses bilingual (two languages) corpora to learn a different vector for each sense of a word, by exploiting crosslingual signals to aid sense identification. We present a multi-view Bayesian non-parametric algorithm which improves multi-sense word embeddings by (a) using multilingual (i.e., more than two languages) corpora to significantly improve sense embeddings beyond what one achieves with bilingual information, and (b) uses a principled approach to learn a variable number of senses per word, in a data-driven manner. Ours is the first approach with the ability to leverage multilingual corpora efficiently for multi-sense representation learning. Experiments show that multilingual training significantly improves performance over monolingual and bilingual training, by allowing us to combine different parallel corpora to leverage multilingual context. Multilingual training yields comparable performance to a state of the art monolingual model trained on five times more training data.
    @inproceedings{upadhyay2017beyond,
      author = {Upadhyay, Shyam and Chang, Kai-Wei and Taddy, Matt and Kalai, Adam and Zou, James},
      title = {Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context},
      booktitle = {ACL RepL4NLP Workshop},
      year = {2017}
    }
    
    Details