Share this page:

Evaluating the Values of Sources in Transfer Learning

Md Rizwan Parvez and Kai-Wei Chang, in NAACL, 2021.

Code

Download the full text


Abstract

Transfer learning that adapts a model trained on data-rich sources to low-resource targets has been widely applied in natural language processing (NLP). However, when training a transfer model over multiple sources, not every source is equally useful for the target. To better transfer a model, it is essential to understand the values of the sources. In this paper, we develop SEAL-Shap, an efficient source valuation framework for quantifying the usefulness of the sources (e.g., domains/languages) in transfer learning based on the Shapley value method. Experiments and comprehensive analyses on both cross-domain and cross-lingual transfers demonstrate that our framework is not only effective in choosing useful transfer sources but also the source values match the intuitive source-target similarity.



Bib Entry

@inproceedings{parvez2021evaluating,
  title = {Evaluating the Values of Sources in Transfer Learning},
  author = {Parvez, Md Rizwan and Chang, Kai-Wei},
  booktitle = {NAACL},
  presentation_id = {https://underline.io/events/122/sessions/4261/lecture/19707-evaluating-the-values-of-sources-in-transfer-learning},
  year = {2021}
}

Related Publications

  1. Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction

    Kuan-Hao Huang, I.-Hung Hsu, Prem Natarajan, Kai-Wei Chang, and Nanyun Peng, in ACL, 2022.
    Full Text Code Abstract BibTeX Details
    We present a study on leveraging multilingual pre-trained generative language models for zero-shot cross-lingual event argument extraction (EAE). By formulating EAE as a language generation task, our method effectively encodes event structures and captures the dependencies between arguments. We design language-agnostic templates to represent the event argument structures, which are compatible with any language, hence facilitating the cross-lingual transfer. Our proposed model finetunes multilingual pre-trained generative language models to generate sentences that fill in the language-agnostic template with arguments extracted from the input passage. The model is trained on source languages and is then directly applied to target languages for event argument extraction. Experiments demonstrate that the proposed model outperforms the current state-of-the-art models on zero-shot cross-lingual EAE. Comprehensive studies and error analyses are presented to better understand the advantages and the current limitations of using generative language models for zero-shot cross-lingual transfer EAE.
    @inproceedings{huang2022multilingual,
      title = {Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction},
      author = {Huang, Kuan-Hao and Hsu, I-Hung and Natarajan, Prem and Chang, Kai-Wei and Peng, Nanyun},
      booktitle = {ACL},
      year = {2022}
    }
    
    Details
  2. Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training

    Kuan-Hao Huang, Wasi Ahmad, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2021.
    Full Text Code Abstract BibTeX Details
    Pre-trained multilingual language encoders, such as multilingual BERT and XLM-R, show great potential for zero-shot cross-lingual transfer. However, these multilingual encoders do not precisely align words and phrases across languages. Especially, learning alignments in the multilingual embedding space usually requires sentence-level or word-level parallel corpora, which are expensive to be obtained for low-resource languages. An alternative is to make the multilingual encoders more robust; when fine-tuning the encoder using downstream task, we train the encoder to tolerate noise in the contextual embedding spaces such that even if the representations of different languages are not aligned well, the model can still achieve good performance on zero-shot cross-lingual transfer. In this work, we propose a learning strategy for training robust models by drawing connections between adversarial examples and the failure cases of zero-shot cross-lingual transfer. We adopt two widely used robust training methods, adversarial training and randomized smoothing, to train the desired robust model. The experimental results demonstrate that robust training improves zero-shot cross-lingual transfer on text classification tasks. The improvement is more significant in the generalized cross-lingual transfer setting, where the pair of input sentences belong to two different languages.
    @inproceedings{huang2021improving,
      title = {Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training},
      author = {Huang, Kuan-Hao and Ahmad, Wasi and Peng, Nanyun and Chang, Kai-Wei},
      presentation_id = {https://underline.io/events/192/posters/7783/poster/40656-improving-zero-shot-cross-lingual-transfer-learning-via-robust-training},
      booktitle = {EMNLP},
      year = {2021}
    }
    
    Details
  3. Syntax-augmented Multilingual BERT for Cross-lingual Transfer

    Wasi Ahmad, Haoran Li, Kai-Wei Chang, and Yashar Mehdad, in ACL, 2021.
    Full Text Video Code Abstract BibTeX Details
    In recent years, we have seen a colossal effort
    in pre-training multilingual text encoders using large-scale corpora in many languages to
    facilitate cross-lingual transfer learning. However, due to typological differences across languages, the cross-lingual transfer is challenging. Nevertheless, language syntax, e.g., syntactic dependencies, can bridge the typological gap. Previous works have shown that pretrained multilingual encoders, such as mBERT
    (Devlin et al., 2019), capture language syntax, helping cross-lingual transfer. This work
    shows that explicitly providing language syntax and training mBERT using an auxiliary
    objective to encode the universal dependency
    tree structure helps cross-lingual transfer. We
    perform rigorous experiments on four NLP
    tasks, including text classification, question answering, named entity recognition, and taskoriented semantic parsing. The experiment results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks, such as PAWS-X and MLQA, by 1.4
    and 1.6 points on average across all languages.
    In the generalized transfer setting, the performance boosted significantly, with 3.9 and 3.1
    points on average in PAWS-X and MLQA.
    @inproceedings{ahmad2021syntax,
      title = {Syntax-augmented Multilingual BERT for Cross-lingual Transfer},
      author = {Ahmad, Wasi and Li, Haoran and Chang, Kai-Wei and Mehdad, Yashar},
      booktitle = {ACL},
      year = {2021}
    }
    
    Details
  4. Evaluating the Values of Sources in Transfer Learning

    Md Rizwan Parvez and Kai-Wei Chang, in NAACL, 2021.
    Full Text Video Code Abstract BibTeX Details
    Transfer learning that adapts a model trained on data-rich sources to low-resource targets has been widely applied in natural language processing (NLP). However, when training a transfer model over multiple sources, not every source is equally useful for the target. To better transfer a model, it is essential to understand the values of the sources. In this paper, we develop SEAL-Shap, an efficient source valuation framework for quantifying the usefulness of the sources (e.g., domains/languages) in transfer learning based on the Shapley value method. Experiments and comprehensive analyses on both cross-domain and cross-lingual transfers demonstrate that our framework is not only effective in choosing useful transfer sources but also the source values match the intuitive source-target similarity.
    @inproceedings{parvez2021evaluating,
      title = {Evaluating the Values of Sources in Transfer Learning},
      author = {Parvez, Md Rizwan and Chang, Kai-Wei},
      booktitle = {NAACL},
      presentation_id = {https://underline.io/events/122/sessions/4261/lecture/19707-evaluating-the-values-of-sources-in-transfer-learning},
      year = {2021}
    }
    
    Details
  5. GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction

    Wasi Ahmad, Nanyun Peng, and Kai-Wei Chang, in AAAI, 2021.
    Full Text Code Abstract BibTeX Details
    Prevalent approaches in cross-lingual relation and event extraction use graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic representations such that models trained on one language can be applied to other languages. However, GCNs lack in modeling long-range dependencies or disconnected words in the dependency tree. To address this challenge, we propose to utilize the self-attention mechanism where we explicitly fuse structural information to learn the dependencies between words at different syntactic distances. We introduce GATE, a \bf Graph \bf Attention \bf Transformer \bf Encoder, and test its cross-lingual transferability on relation and event extraction tasks. We perform rigorous experiments on the widely used ACE05 dataset that includes three typologically different languages: English, Chinese, and Arabic. The evaluation results show that GATE outperforms three recently proposed methods by a large margin. Our detailed analysis reveals that due to the reliance on syntactic dependencies, GATE produces robust representations that facilitate transfer across languages.
    @inproceedings{ahmad2021gate,
      author = {Ahmad, Wasi and Peng, Nanyun and Chang, Kai-Wei},
      title = {GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction},
      booktitle = {AAAI},
      year = {2021}
    }
    
    Details
  6. Cross-Lingual Dependency Parsing by POS-Guided Word Reordering

    Lu Liu, Yi Zhou, Jianhan Xu, Xiaoqing Zheng, Kai-Wei Chang, and Xuanjing Huang, in EMNLP-Finding, 2020.
    Full Text Abstract BibTeX Details
    We propose a novel approach to cross-lingual dependency parsing based on word reordering. The words in each sentence of a source language corpus are rearranged to meet the word order in a target language under the guidance of a part-of-speech based language model (LM). To obtain the highest reordering score under the LM, a population-based optimization algorithm and its genetic operators are designed to deal with the combinatorial nature of such word reordering. A parser trained on the reordered corpus then can be used to parse sentences in the target language. We demonstrate through extensive experimentation that our approach achieves better or comparable results across 25 target languages (1.73% increase in average), and outperforms a baseline by a significant margin on the languages that are greatly different from the source one. For example, when transferring the English parser to Hindi and Latin, our approach outperforms the baseline by 15.3% and 6.7% respectively.
    @inproceedings{liu2020cross-lingual,
      author = {Liu, Lu and Zhou, Yi and Xu, Jianhan and Zheng, Xiaoqing and Chang, Kai-Wei and Huang, Xuanjing},
      title = {Cross-Lingual Dependency Parsing by POS-Guided Word Reordering},
      booktitle = {EMNLP-Finding},
      year = {2020}
    }
    
    Details
  7. Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages

    Wasi Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, and Nanyun Peng, in CoNLL, 2019.
    Full Text Poster Code Abstract BibTeX Details
    Cross-lingual transfer learning has become an important weapon to battle the unavailability of annotated resources for low-resource languages.  One of the fundamental techniques to transfer across languages is learning language-agnostic representations, in the form of word embeddings or contextual encodings. In this work, we propose to leverage unannotated sentences from auxiliary languages to help learning language-agnostic representations  Specifically, we explore adversarial training for learning contextual encoders that produce invariant representations across languages to facilitate cross-lingual transfer. We conduct experiments on cross-lingual dependency parsing where we train a dependency parser on a source language and transfer it to a wide range of target languages.  Experiments on 28 target languages demonstrate that adversarial training significantly improves the overall transfer performances under several different settings.  We conduct a careful analysis to evaluate the language-agnostic representations resulted from adversarial training.  
    @inproceedings{ahmad2019crosslingual,
      author = {Ahmad, Wasi and Zhang, Zhisong and Ma, Xuezhe and Chang, Kai-Wei and Peng, Nanyun},
      title = {  Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages},
      booktitle = {CoNLL},
      year = {2019}
    }
    
    Details
  8. Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing

    Tao Meng, Nanyun Peng, and Kai-Wei Chang, in EMNLP, 2019.
    Full Text Poster Code Abstract BibTeX Details
    Prior work on cross-lingual dependency parsing often focuses on capturing the commonalities between source and target languages and overlooks the potential of leveraging linguistic properties of the languages to facilitate the transfer. In this paper, we show that weak supervisions of linguistic knowledge for the target languages can improve a cross-lingual graph-based dependency parser substantially. Specifically, we explore several types of corpus linguistic statistics and compile them into corpus-wise constraints to guide the inference process during the test time. We adapt two techniques, Lagrangian relaxation and posterior regularization, to conduct inference with corpus-statistics constraints. Experiments show that the Lagrangian relaxation and posterior regularization inference improve the performances on 15 and 17 out of 19 target languages, respectively. The improvements are especially significant for target languages that have different word order features from the source language.
    @inproceedings{meng2019target,
      author = {Meng, Tao and Peng, Nanyun and Chang, Kai-Wei},
      title = {Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing},
      booktitle = {EMNLP},
      year = {2019}
    }
    
    Details
  9. On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing

    Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, and Nanyun Peng, in NAACL, 2019.
    Full Text Video Code Abstract BibTeX Details
    Different languages might have different wordorders. In this paper, we investigate cross-lingual transfer and posit that an order-agnostic model will perform better when trans-ferring to distant foreign languages. To test ourhypothesis, we train dependency parsers on anEnglish corpus and evaluate their transfer per-formance on 30 other languages. Specifically,we compare encoders and decoders based onRecurrent Neural Networks (RNNs) and mod-ified self-attentive architectures. The formerrelies on sequential information while the lat-ter is more flexible at modeling word order.Rigorous experiments and detailed analysisshows that RNN-based architectures transferwell to languages that are close to English,while self-attentive models have better overallcross-lingual transferability and perform espe-cially well on distant languages.
    @inproceedings{ahmad2019difficulties,
      author = {Ahmad, Wasi Uddin and Zhang, Zhisong and Ma, Xuezhe and Hovy, Eduard and Chang, Kai-Wei and Peng, Nanyun},
      title = {On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing},
      booktitle = {NAACL},
      year = {2019}
    }
    
    Details