BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression
Yuankai Li, Jia-Chen Gu, Di Wu, Kai-Wei Chang, and Nanyun Peng, in NAACL-Findings, 2025.
Abstract
Retrieval-augmented generation (RAG) can supplement large language models (LLMs) by integrating external knowledge. However, as the number of retrieved documents increases, the input length to the LLM grows linearly, causing a dramatic increase in latency and a degradation in long-context understanding. This is particularly serious for multi-hop questions, which require a chain of reasoning across documents. To accelerate inference, reduce costs, and minimize distraction, this paper presents BRIEF (Bridging Retrieval and Inference through Evidence Fusion), a lightweight approach that performs query-aware multi-hop reasoning by compressing retrieved documents into highly dense textual summaries for integration into in-context RAG. To enable learning compression for multi-hop reasoning, we curate synthetic training data by extracting atomic propositions, each encapsulating a distinct factoid from the source documents, and composing them into synthetic summaries. Trained entirely on synthetic data built with open-source models, BRIEF generates more concise summaries and enables a range of LLMs to achieve strong open-domain question answering (QA) performance. For example, on HotpotQA, BRIEF doubles the compression rate relative to the state-of-the-art baseline while outperforming it by 3.00% EM and 4.16% F1 with Flan-UL2 as the reader model. It also produces more concise summaries than proprietary GPT-3.5 while delivering nearly identical QA performance.
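To make the pipeline concrete, the sketch below shows where a BRIEF-style compressor sits in in-context RAG: the retrieved documents are first fused into a single query-aware summary, and only that summary is passed to the reader. This is a minimal illustration, not the authors' released code; the compressor checkpoint name and the prompt templates are placeholder assumptions (the reader mirrors the paper's use of instruction-tuned readers such as Flan-UL2).

from transformers import pipeline

# Minimal sketch of BRIEF-style query-aware compression for in-context RAG.
# NOTE: "your-org/brief-compressor" is a hypothetical checkpoint name, and the
# prompt formats below are assumptions, not the paper's exact templates.
compressor = pipeline("text2text-generation", model="your-org/brief-compressor")
reader = pipeline("text2text-generation", model="google/flan-t5-large")

def answer(question: str, retrieved_docs: list[str]) -> str:
    # 1) Fuse all retrieved documents into one dense, query-aware summary,
    #    so the reader's input stays short no matter how many docs we retrieve.
    docs = "\n\n".join(retrieved_docs)
    summary = compressor(
        f"Question: {question}\n\nDocuments:\n{docs}\n\nSummary:",
        max_new_tokens=128,
    )[0]["generated_text"]
    # 2) Standard in-context RAG, but conditioned on the compressed summary
    #    rather than on the raw documents.
    return reader(
        f"Context: {summary}\n\nQuestion: {question}\n\nAnswer:",
        max_new_tokens=32,
    )[0]["generated_text"]

Because only the summary enters the reader's context, latency and cost scale with the summary length rather than with the number of retrieved documents.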
Bib Entry
@inproceedings{li2025brief,
  title     = {BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression},
  author    = {Li, Yuankai and Gu, Jia-Chen and Wu, Di and Chang, Kai-Wei and Peng, Nanyun},
  booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
  year      = {2025}
}
Related Publications
- Learning Structured Reasoning via Tractable Trajectory Control, ICML, 2026
- Training LLMs for Divide-and-Conquer Reasoning, ACL, 2026
- BRIEF-Pro: Universal Context Compression with Short-to-Long Synthesis for Fast and Accurate Multi-Hop Reasoning, ACL-Findings, 2026
- Beyond Facts: Benchmarking Distributional Reading Comprehension in Large Language Models, ACL-Findings, 2026
- MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations, ICLR, 2025
- Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation, ACL-Findings, 2025
- QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search, ICML, 2025
- DRS: Deep Question Reformulation With Structured Output, ACL-Findings, 2025
- V-ALPHASOCIAL: Benchmark and Self-Reflective Chain-of-Thought Generation for Visual Social Commonsense Reasoning, ACL-Findings, 2025
- VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning, CVPR, 2025
- QUDSELECT: Selective Decoding for Questions Under Discussion Parsing, EMNLP, 2024
- Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue, EMNLP, 2024
- LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning, EMNLP-Findings, 2024
- Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs, ACL, 2024
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data, ACL-Findings, 2024
- Can small language models help large language models reason better?: LM-guided chain-of-thought, LREC-COLING, 2024
- IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models, EMNLP-Findings, 2023