Learning Bilingual Word Embeddings Using Lexical Definitions

Weijia Shi, Muhao Chen, Yingtao Tian, and Kai-Wei Chang, in RepL4NLP (ACL workshop), 2019.


[Full Text]

Abstract

Bilingual word embeddings, which represent lexicons of different languages in a shared embedding space, are essential for supporting semantic and knowledge transfer in a variety of cross-lingual NLP tasks. Existing approaches to training bilingual word embeddings require either large collections of predefined seed lexicons, which are expensive to obtain, or parallel sentences, which provide only coarse and noisy alignment. In contrast, we propose BiLex, which leverages publicly available lexical definitions for bilingual word embedding learning. Without the need for predefined seed lexicons, BiLex comprises a novel word pairing strategy to automatically identify and propagate precise, fine-grained word alignment from lexical definitions. We evaluate BiLex on word-level and sentence-level translation tasks, which seek to find the cross-lingual counterparts of words and sentences, respectively. BiLex significantly outperforms previous embedding methods on both tasks.

Bib Entry

@inproceedings{shi2019bilingual,
  author = {Shi, Weijia and Chen, Muhao and Tian, Yingtao and Chang, Kai-Wei},
  title = {Learning Bilingual Word Embeddings Using Lexical Definitions},
  booktitle = {RepL4NLP (ACL workshop)},
  poster = {/documents/slides/shi2019bilingual_poster.pdf},
  year = {2019}
}

Links