Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, Kai-Wei Chang

EMNLP (short) 2018


Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the network to misclassify. In the image domain, these perturbations can often be made virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a population-based optimization algorithm to generate semantically and syntactically similar adversarial examples. We demonstrate via a human study that 94.3% of the generated examples are classified to the original label by human evaluators, and that the examples are perceptibly quite similar. We hope our findings encourage researchers to pursue improving the robustness of DNNs in the natural language domain. ### Bib entry > @inproceedings{ASEHSC18,
> author = {Moustafa Alzantot and Yash Sharma and Ahmed Elgohary and Bo-Jhang Ho and Mani Srivastava and Kai-Wei Chang}, > title = {Generating Natural Language Adversarial Examples},
> booktitle = {EMNLP (short)},
> year = {2018},
> }