On Localizing and Deleting Toxic Memories in Large Language Models
Anubrata Das, Manoj Kumar, Ninareh Mehrabi, Anil Ramakrishna, Anna Rumshisky, Kai-Wei Chang, Aram Galstyan, Morteza Ziyadi, and Rahul Gupta, in NAACL-Finding, 2025.
Bib Entry
@inproceedings{das2025localizing,
title = {On Localizing and Deleting Toxic Memories in Large Language Models},
author = {Das, Anubrata and Kumar, Manoj and Mehrabi, Ninareh and Ramakrishna, Anil and Rumshisky, Anna and Chang, Kai-Wei and Galstyan, Aram and Ziyadi, Morteza and Gupta, Rahul},
booktitle = {NAACL-Finding},
year = {2025}
}
Related Publications