On Localizing and Deleting Toxic Memories in Large Language Models

Anubrata Das, Manoj Kumar, Ninareh Mehrabi, Anil Ramakrishna, Anna Rumshisky, Kai-Wei Chang, Aram Galstyan, Morteza Ziyadi, and Rahul Gupta, in NAACL Findings, 2025.


Bib Entry

@inproceedings{das2025localizing,
  title = {On Localizing and Deleting Toxic Memories in Large Language Models},
  author = {Das, Anubrata and Kumar, Manoj and Mehrabi, Ninareh and Ramakrishna, Anil and Rumshisky, Anna and Chang, Kai-Wei and Galstyan, Aram and Ziyadi, Morteza and Gupta, Rahul},
  booktitle = {NAACL Findings},
  year = {2025}
}

Related Publications