Customize Multi-modal RAI Guardrails with Precedent-based predictions
Cheng-Fu Yang, Thanh Tran, Christos Christodoulopoulos, Weitong Ruan, Rahul Gupta, and Kai-Wei Chang, in COLM 2025.
Bib Entry
@inproceedings{yang2025customize,
title = {Customize Multi-modal RAI Guardrails with Precedent-based predictions},
author = {Yang, Cheng-Fu and Tran, Thanh and Christodoulopoulos, Christos and Ruan, Weitong and Gupta, Rahul and Chang, Kai-Wei},
booktitle = {COLM 2025},
year = {2025}
}
Related Publications
- Open-Domain Safety Policy Construction, EACL-Findings, 2026
- X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents, COLM, 2025
- Vulnerability of LLMs to Vertically Aligned Text Manipulations, ACL, 2025
- Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models, CVPR, 2025
- Vulnerability of Large Language Models to Output Prefix Jailbreaks: Impact of Positions on Safety, NAACL-Findings, 2025
- SafeWorld: Geo-Diverse Safety Alignment, NeurIPS, 2024
- FLIRT: Feedback Loop In-context Red Teaming, EMNLP, 2024
- Data Advisor: Data Curation with Foresight for Safety Alignment of Large Language Models, EMNLP, 2024
- Prompt-Driven LLM Safeguarding via Directed Representation Optimization, ICML, 2024