Representation Learning for Resource-Constrained Keyphrase Generation
Di Wu, Wasi Uddin Ahmad, Sunipa Dev, and Kai-Wei Chang, in EMNLP-Findings, 2022.
Code | Download the full text
Abstract
State-of-the-art keyphrase generation methods generally depend on large annotated datasets, limiting their performance in domains with limited annotated data. To overcome this challenge, we design a data-oriented approach that first identifies salient information using unsupervised corpus-level statistics, and then learns a task-specific intermediate representation based on a pre-trained language model. We introduce salient span recovery and salient span prediction as denoising training objectives that condense the intra-article and inter-article knowledge essential for keyphrase generation. Through experiments on multiple keyphrase generation benchmarks, we show the effectiveness of the proposed approach for facilitating low-resource and zero-shot keyphrase generation. We further observe that the method especially benefits the generation of absent keyphrases, approaching the performance of models trained with large training sets.
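The two objectives described above can be illustrated with a minimal sketch. Here, TF-IDF stands in for the paper's unsupervised corpus-level salience statistics, and the example only builds a salient span recovery training pair (corrupted input, masked spans to reconstruct); the actual scoring and span selection in the paper may differ.

```python
import math
import re
from collections import Counter

MASK = "<mask>"

def tfidf_scores(docs):
    """Corpus-level statistics: per-document TF-IDF word scores.
    (A stand-in for the paper's unsupervised salience scoring.)"""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency per word
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append({w: tf[w] / len(toks) * math.log(n / df[w]) for w in tf})
    return tokenized, scores

def make_denoising_pair(tokens, scores, k=2):
    """Salient span recovery sketch: mask the k most salient tokens in the
    article and use the masked-out tokens as the reconstruction target."""
    ranked = sorted(set(tokens), key=lambda w: scores.get(w, 0.0), reverse=True)
    salient = set(ranked[:k])
    corrupted = [MASK if t in salient else t for t in tokens]
    target = [t for t in tokens if t in salient]
    return " ".join(corrupted), " ".join(target)

# Toy corpus; a real setup would use the target domain's unlabeled articles.
docs = [
    "keyphrase generation with pretrained language models",
    "denoising objectives for representation learning",
    "low resource keyphrase generation benchmarks",
]
tokenized, scores = tfidf_scores(docs)
src, tgt = make_denoising_pair(tokenized[0], scores[0], k=2)
```

A sequence-to-sequence language model would then be fine-tuned to map `src` to `tgt` before the final keyphrase generation training; salient span prediction analogously predicts the salient spans from the uncorrupted article.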
Bib Entry
@inproceedings{wu2022representation,
  title = {Representation Learning for Resource-Constrained Keyphrase Generation},
  author = {Wu, Di and Ahmad, Wasi Uddin and Dev, Sunipa and Chang, Kai-Wei},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2022},
  year = {2022}
}
Related Publications
- MetaKP: On-Demand Keyphrase Generation, EMNLP-Findings, 2024
- KPEval: Towards Fine-Grained Semantic-Based Keyphrase Evaluation, ACL-Findings, 2024
- On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation, LREC-COLING, 2024
- Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models, EMNLP, 2023
- Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention, ACL, 2021