MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Knowledge Poisoning Attacks
Hyeonjeong Ha, Qiusi Zhan, Jeonghwan Kim, Dimitrios Bralios, Saikrishna Sanniboina, Nanyun Peng, Kai-Wei Chang, Daniel Kang, and Heng Ji, in ACL, 2026.
Code | Download the full text
Abstract
Multimodal large language models with Retrieval Augmented Generation (RAG) have significantly advanced tasks such as multimodal question answering by grounding responses in external text and images. This grounding improves factuality, reduces hallucination, and extends reasoning beyond parametric knowledge. However, this reliance on external knowledge poses a critical yet underexplored safety risk: knowledge poisoning attacks, in which adversaries deliberately inject adversarial multimodal content into external knowledge bases to steer the model toward generating incorrect or even harmful responses. To expose such vulnerabilities, we propose MM-PoisonRAG, the first framework to systematically design knowledge poisoning attacks on multimodal RAG. We introduce two complementary attack strategies: Localized Poisoning Attack (LPA), which implants targeted multimodal misinformation to manipulate specific queries, and Globalized Poisoning Attack (GPA), which inserts a single piece of adversarial knowledge to broadly disrupt reasoning and induce nonsensical responses across all queries. Comprehensive experiments across tasks, models, and access settings show that LPA achieves targeted manipulation with attack success rates of up to 56%, while GPA completely disrupts model generation, driving accuracy to 0%, with just a single adversarial knowledge injection. Our results reveal the fragility of multimodal RAG and highlight the urgent need for defenses against knowledge poisoning.
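For intuition only, the sketch below illustrates the localized poisoning idea under simplifying assumptions: a toy bag-of-words retriever over a hypothetical two-passage text knowledge base, rather than the paper's multimodal setting with learned retrievers. The injected passage is crafted to overlap heavily with one target query, so it becomes the top retrieval for that query while an unrelated query is unaffected; none of the passages, queries, or the scoring function come from the paper.

```python
# Minimal conceptual sketch, NOT the paper's method: a toy bag-of-words
# retriever over a tiny knowledge base, showing how one injected passage
# can dominate retrieval for a targeted query (the LPA idea) while leaving
# unrelated queries untouched. All passages and queries are illustrative.
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, kb: list[str]) -> str:
    """Return the single passage most similar to the query."""
    q = Counter(query.lower().split())
    return max(kb, key=lambda p: cosine(q, Counter(p.lower().split())))


# Clean (hypothetical) knowledge base.
kb = [
    "the eiffel tower is located in paris france",
    "the great wall of china stretches thousands of kilometers",
]

target_query = "where is the eiffel tower located"
other_query = "how long is the great wall of china"
print("target query, before poisoning: ", retrieve(target_query, kb))

# Localized poisoning: craft one false passage with heavy lexical overlap
# with the target query so it outranks the correct passage for that query.
kb.append("where is the eiffel tower located in berlin germany")

print("target query, after poisoning:  ", retrieve(target_query, kb))
print("unrelated query, after poisoning:", retrieve(other_query, kb))
```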
Bib Entry
@inproceedings{ha2026mmpoisonrag,
  title = {MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Knowledge Poisoning Attacks},
  author = {Ha, Hyeonjeong and Zhan, Qiusi and Kim, Jeonghwan and Bralios, Dimitrios and Sanniboina, Saikrishna and Peng, Nanyun and Chang, Kai-Wei and Kang, Daniel and Ji, Heng},
  booktitle = {ACL},
  year = {2026}
}
Related Publications
- SWAN: Semantic Watermarking with Abstract Meaning Representation, ACL, 2026
- Mitigating Over-Refusal in Aligned Large Language Models via Inference-Time Activation Energy, ACL, 2026
- ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System, ACL, 2026
- Open-Domain Safety Policy Construction, EACL-Findings, 2026
- Customize Multi-modal RAI Guardrails with Precedent-based Predictions, COLM, 2025
- X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents, COLM, 2025
- Vulnerability of LLMs to Vertically Aligned Text Manipulations, ACL, 2025
- Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models, CVPR, 2025
- Vulnerability of Large Language Models to Output Prefix Jailbreaks: Impact of Positions on Safety, NAACL-Findings, 2025
- SafeWorld: Geo-Diverse Safety Alignment, NeurIPS, 2024
- FLIRT: Feedback Loop In-context Red Teaming, EMNLP, 2024
- Data Advisor: Data Curation with Foresight for Safety Alignment of Large Language Models, EMNLP, 2024
- Prompt-Driven LLM Safeguarding via Directed Representation Optimization, ICML, 2024