From Narrow Unlearning to Emergent Misalignment in LLMs
Erum Mushtaq, Anil Ramakrishna, Satyapriya Krishna, Sattvik Sahai, Prasoon Goyal, Kai-Wei Chang, Tao Zhang, and Rahul Gupta, in ACL, 2026.
Bib Entry
@inproceedings{mushtaq2026narrow,
title = {From Narrow Unlearning to Emergent Misalignment in LLMs},
author = {Mushtaq, Erum and Ramakrishna, Anil and Krishna, Satyapriya and Sahai, Sattvik and Goyal, Prasoon and Chang, Kai-Wei and Zhang, Tao and Gupta, Rahul},
booktitle = {ACL},
year = {2026}
}
Related Publications
- BLUR: A Bi-Level Optimization Approach for LLM Unlearning, EACL, 2026
- Not Every Token Needs Forgetting: Selective Unlearning to Limit Change in Utility in Large Language Model Unlearning, EMNLP-Finding, 2025
- LUME: LLM Unlearning with Multitask Evaluations, EMNLP-Finding, 2025
- Unlearning as Multi-task Optimization: A Normalized Gradient Difference Approach with an Adaptive Learning Rate, NAACL, 2025