On the Transferability of Adversarial Attacks against Neural Text Classifier

Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, and Kai-Wei Chang, in EMNLP, 2021.


Abstract

Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we present the first study to systematically investigate the transferability of adversarial examples for text classification models and explore how various factors, including network architecture, tokenization scheme, word embedding, and model capacity, affect the transferability of adversarial examples. Based on these studies, we propose a genetic algorithm to find an ensemble of models that can be used to induce adversarial examples to fool almost all existing models. Such adversarial examples reflect the defects of the learning process and the data bias in the training set. Finally, from these adversarial examples we derive word replacement rules that can be used for model diagnostics.
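The genetic algorithm mentioned above searches over subsets of models for an ensemble whose adversarial examples transfer broadly. The sketch below illustrates the general shape of such a search; the fitness function, hyperparameters, and all names here are illustrative stand-ins, not the paper's actual method.

```python
import random


def genetic_ensemble_search(num_models, fitness, ensemble_size=3,
                            pop_size=20, generations=30,
                            mutation_rate=0.2, seed=0):
    """Evolve a subset of model indices whose ensemble maximizes `fitness`.

    In the transferability setting, `fitness` would score how well
    adversarial examples crafted against the ensemble fool held-out
    models; here it is an arbitrary user-supplied callable. All
    hyperparameters are illustrative defaults, not the paper's.
    """
    rng = random.Random(seed)
    # Each individual is a fixed-size subset of model indices.
    pop = [rng.sample(range(num_models), ensemble_size)
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the top half of the population.
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            # Crossover: merge two parents, keep the first unique indices.
            a, b = rng.sample(elite, 2)
            child = list(dict.fromkeys(a + b))[:ensemble_size]
            # Mutation: occasionally swap in a model not already present.
            if rng.random() < mutation_rate:
                replacement = rng.randrange(num_models)
                if replacement not in child:
                    child[rng.randrange(ensemble_size)] = replacement
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


# Toy usage: with fitness = sum of indices, the search favors
# ensembles built from the highest-indexed "models".
best = genetic_ensemble_search(num_models=10, fitness=sum)
```

In practice the fitness evaluation is the expensive step (it requires crafting adversarial examples against each candidate ensemble), so the population and generation counts would be kept small.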


Bib Entry

@inproceedings{yuan2021on,
  title = {On the Transferability of Adversarial Attacks against Neural Text Classifier},
  author = {Yuan, Liping and Zheng, Xiaoqing and Zhou, Yi and Hsieh, Cho-Jui and Chang, Kai-Wei},
  presentation_id = {https://underline.io/events/192/posters/8223/poster/38067-on-the-transferability-of-adversarial-attacks-against-neural-text-classifier},
  booktitle = {EMNLP},
  year = {2021}
}

Related Publications

  1. VideoCon: Robust video-language alignment via contrast captions, CVPR, 2024
  2. CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning, ICCV, 2023
  3. Red Teaming Language Model Detectors with Language Models, TACL, 2023
  4. ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation, EMNLP, 2022
  5. Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers, EMNLP-Finding (short), 2022
  6. Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations, EMNLP-Finding (short), 2022
  7. Improving the Adversarial Robustness of NLP Models by Information Bottleneck, ACL-Finding, 2022
  8. Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution, EMNLP, 2021
  9. Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble, ACL, 2021
  10. Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation, NAACL, 2021
  11. Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs, NeurIPS, 2020
  12. On the Robustness of Language Encoders against Grammatical Errors, ACL, 2020
  13. Robustness Verification for Transformers, ICLR, 2020
  14. Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification, EMNLP, 2019
  15. Retrofitting Contextualized Word Embeddings with Paraphrases, EMNLP (short), 2019
  16. Generating Natural Language Adversarial Examples, EMNLP (short), 2018