Share this page:

Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Cenyuan Zhang, Xiang Zhou, Yixin Wan, Xiaoqing Zheng, Kai-Wei Chang, and Cho-Jui Hsieh, in ACL-Finding, 2022.

Download the full text


Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive, but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by using the information bottleneck theory. Through extensive experiments, we show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy, exceeding performances of all the previously reported defense methods while suffering almost no performance drop in clean accuracy on SST-2, AGNEWS and IMDB datasets.

Bib Entry

  title = {Improving the Adversarial Robustness of NLP Models by Information Bottleneck},
  author = {Zhang, Cenyuan and Zhou, Xiang and Wan, Yixin and Zheng, Xiaoqing and Chang, Kai-Wei and Hsieh, Cho-Jui},
  booktitle = {ACL-Finding},
  year = {2022}

Related Publications