The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention
Yixin Wan, Di Wu, Haoran Wang, and Kai-Wei Chang, in EMNLP, 2024.
Download the full text
Abstract
Prompt-based "diversity interventions" are commonly adopted to improve the diversity of Text-to-Image (T2I) models in depicting individuals of various racial or gender groups. However, will this strategy result in nonfactual demographic distributions, especially when generating real historical figures? In this work, we propose DemOgraphic FActualIty Representation (DoFaiR), a benchmark to systematically quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models. DoFaiR consists of 756 meticulously fact-checked test instances to reveal the factuality tax of various diversity prompts through an automated evidence-supported evaluation pipeline. Experiments on DoFaiR unveil that diversity-oriented instructions increase the number of different gender and racial groups in DALLE-3’s generations at the cost of historically inaccurate demographic distributions. To resolve this issue, we propose Fact-Augmented Intervention (FAI), which instructs a Large Language Model (LLM) to reflect on verbalized or retrieved factual information about the gender and racial compositions of generation subjects in history, and incorporates this information into the generation context of T2I models. By orienting model generations toward the reflected historical truths, FAI significantly improves demographic factuality under diversity interventions while preserving diversity.
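The core idea of FAI, augmenting the T2I generation context with reflected demographic facts, can be illustrated as a simple prompt-composition step. The sketch below is not the paper's implementation: the function name, prompt wording, and fact format are all assumptions for illustration, and the actual FAI pipeline uses an LLM to verbalize or retrieve and reflect on the facts before they reach the T2I model.

```python
def fact_augmented_prompt(base_prompt: str, facts: list[str]) -> str:
    """Compose a fact-augmented T2I prompt (illustrative sketch only).

    `facts` stands in for the verbalized or retrieved statements about the
    historical gender/racial composition of the generation subjects that
    FAI's LLM reflection step would produce.
    """
    if not facts:
        # Without factual context, fall back to the unmodified prompt.
        return base_prompt
    # Prepend each fact so the T2I model conditions on historical context.
    fact_context = " ".join(f"Fact: {f}" for f in facts)
    return (
        f"{fact_context} Based on these historical facts, {base_prompt}. "
        "Depict a demographic composition consistent with the facts above."
    )


# Hypothetical usage with an invented example fact:
prompt = fact_augmented_prompt(
    "generate an image of the signers of the document",
    ["The signers were predominantly men of European descent."],
)
```

The design point this sketch captures is that FAI intervenes at the prompt level: factual context is injected before generation, so no retraining of the T2I model is needed.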
Bib Entry
@inproceedings{wan2024factuality,
title = {The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention},
author = {Wan, Yixin and Wu, Di and Wang, Haoran and Chang, Kai-Wei},
booktitle = {EMNLP},
year = {2024}
}
Related Publications
- Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases, EMNLP-Findings, 2025
- The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects, ACL, 2025
- JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images, NeurIPS (Datasets and Benchmarks Track), 2024
- MACAROON: Training Vision-Language Models To Be Your Engaged Partners, EMNLP-Findings, 2024
- Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond, EMNLP-Findings, 2023
- Resolving Ambiguities in Text-to-Image Generative Models, ACL, 2023
- UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding, ACL-Findings, 2023