On the Transferability of Adversarial Attacks against Neural Text Classifier

Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, and Kai-Wei Chang, in EMNLP, 2021.


Abstract

Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we present the first study to systematically investigate the transferability of adversarial examples for text classification models and explore how various factors, including network architecture, tokenization scheme, word embedding, and model capacity, affect the transferability of adversarial examples. Based on these studies, we propose a genetic algorithm to find an ensemble of models that can be used to induce adversarial examples to fool almost all existing models. Such adversarial examples reflect the defects of the learning process and the data bias in the training set. Finally, from these adversarial examples we derive word replacement rules that can be used for model diagnostics.
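The genetic algorithm mentioned above searches over subsets of models for an ensemble whose adversarial examples transfer broadly. The sketch below illustrates the general shape of such a search; the fitness function, hyperparameters, and all names here are illustrative stand-ins, not the paper's actual method.

```python
import random


def genetic_ensemble_search(num_models, fitness, ensemble_size=3,
                            pop_size=20, generations=30,
                            mutation_rate=0.2, seed=0):
    """Evolve a subset of model indices whose ensemble maximizes `fitness`.

    In the transferability setting, `fitness` would score how well
    adversarial examples crafted against the ensemble fool held-out
    models; here it is an arbitrary user-supplied callable. All
    hyperparameters are illustrative defaults, not the paper's.
    """
    rng = random.Random(seed)
    # Each individual is a fixed-size subset of model indices.
    pop = [rng.sample(range(num_models), ensemble_size)
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the top half of the population.
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            # Crossover: merge two parents, keep the first unique indices.
            a, b = rng.sample(elite, 2)
            child = list(dict.fromkeys(a + b))[:ensemble_size]
            # Mutation: occasionally swap in a model not already present.
            if rng.random() < mutation_rate:
                replacement = rng.randrange(num_models)
                if replacement not in child:
                    child[rng.randrange(ensemble_size)] = replacement
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


# Toy usage: with fitness = sum of indices, the search favors
# ensembles built from the highest-indexed "models".
best = genetic_ensemble_search(num_models=10, fitness=sum)
```

In practice the fitness evaluation is the expensive step (it requires crafting adversarial examples against each candidate ensemble), so the population and generation counts would be kept small.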


Bib Entry

@inproceedings{yuan2021on,
  title = {On the Transferability of Adversarial Attacks against Neural Text Classifier},
  author = {Yuan, Liping and Zheng, Xiaoqing and Zhou, Yi and Hsieh, Cho-Jui and Chang, Kai-Wei},
  presentation_id = {https://underline.io/events/192/posters/8223/poster/38067-on-the-transferability-of-adversarial-attacks-against-neural-text-classifier},
  booktitle = {EMNLP},
  year = {2021}
}

Related Publications

  1. VideoCon: Robust video-language alignment via contrast captions, CVPR, 2024
  2. CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning, ICCV, 2023
  3. Red Teaming Language Model Detectors with Language Models, TACL, 2023
  4. ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation, EMNLP, 2022
  5. Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers, EMNLP-Finding (short), 2022
  6. Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations, EMNLP-Finding (short), 2022
  7. Improving the Adversarial Robustness of NLP Models by Information Bottleneck, ACL-Finding, 2022
  8. Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution, EMNLP, 2021
  9. Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble, ACL, 2021
  10. Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation, NAACL, 2021
  11. Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs, NeurIPS, 2020
  12. On the Robustness of Language Encoders against Grammatical Errors, ACL, 2020
  13. Robustness Verification for Transformers, ICLR, 2020
  14. Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification, EMNLP, 2019
  15. Retrofitting Contextualized Word Embeddings with Paraphrases, EMNLP (short), 2019
  16. Generating Natural Language Adversarial Examples, EMNLP (short), 2018