TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction

Kuan-Hao Huang, I-Hung Hsu, Tanmay Parekh, Zhiyu Xie, Zixuan Zhang, Prem Natarajan, Kai-Wei Chang, Nanyun Peng, and Heng Ji, in ACL-Findings, 2024.

Bib Entry

@inproceedings{huang2024textee,
  title = {TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction},
  author = {Huang, Kuan-Hao and Hsu, I-Hung and Parekh, Tanmay and Xie, Zhiyu and Zhang, Zixuan and Natarajan, Prem and Chang, Kai-Wei and Peng, Nanyun and Ji, Heng},
  booktitle = {ACL-Findings},
  year = {2024},
  abstract = {Event extraction has gained considerable interest due to its wide-ranging applications. However, recent studies draw attention to evaluation issues, suggesting that reported scores may not accurately reflect the true performance. In this work, we identify and address evaluation challenges, including inconsistency due to varying data assumptions or preprocessing steps, the insufficiency of current evaluation frameworks that may introduce dataset or data split bias, and the low reproducibility of some previous approaches. To address these challenges, we present TextEE, a standardized, fair, and reproducible benchmark for event extraction. TextEE comprises standardized data preprocessing scripts and splits for 14 datasets spanning seven diverse domains and includes 14 recent methodologies, conducting a comprehensive benchmark reevaluation. We also evaluate five varied large language models on our TextEE benchmark and demonstrate how they struggle to achieve satisfactory performance. Inspired by our reevaluation results and findings, we discuss the role of event extraction in the current NLP era, as well as future challenges and insights derived from TextEE. We believe TextEE, the first standardized comprehensive benchmarking tool, will significantly facilitate future event extraction research.}
}

Related Publications

DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning, EMNLP, 2025
SNaRe: Domain-aware Data Generation for Low-Resource Event Detection, EMNLP, 2025
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory, ICLR, 2025
SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness, EMNLP, 2024
Event Detection from Social Media for Epidemic Prediction, NAACL, 2024
GENEVA: Pushing the Limit of Generalizability for Event Argument Extraction with 100+ Event Types, ACL, 2023
TAGPRIME: A Unified Framework for Relational Structure Extraction, ACL, 2023
Enhancing Unsupervised Semantic Parsing with Distributed Contextual Representations, ACL-Findings, 2023
DEGREE: A Data-Efficient Generative Event Extraction Model, NAACL, 2022
Intent Classification and Slot Filling for Privacy Policies, ACL, 2021