AutoSUIT Bench - Automated Security UnIt Test Benchmark for LLM Coding
Samuel Osebe, Fan Yang, Junyi Li, Yue Gu, Yongxin Wang, Satyapriya Krishna, Kai-Wei Chang, Aram Galstyan, Rahul Gupta, and Weitong Ruan, in ACL-Findings, 2026.
Bib Entry
@inproceedings{osebe2026autosuit,
title = {AutoSUIT Bench - Automated Security UnIt Test Benchmark for LLM Coding},
author = {Osebe, Samuel and Yang, Fan and Li, Junyi and Gu, Yue and Wang, Yongxin and Krishna, Satyapriya and Chang, Kai-Wei and Galstyan, Aram and Gupta, Rahul and Ruan, Weitong},
booktitle = {ACL-Findings},
year = {2026}
}
Related Publications
- METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling, ACL, 2025
- MQT-LLaVA: Matryoshka Query Transformer for Large Vision-Language Models, NeurIPS, 2024
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation, NeurIPS (Datasets and Benchmarks Track), 2024
- VDebugger: Harnessing Execution Feedback for Debugging Visual Programs, EMNLP-Findings, 2024
- AVATAR: A Parallel Corpus for Java-Python Program Translation, ACL-Findings (short), 2023
- Retrieval Augmented Code Generation and Summarization, EMNLP-Findings, 2021
- Unified Pre-training for Program Understanding and Generation, NAACL, 2021