Share this page:

CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning

Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, and Kai-Wei Chang, in ICCV, 2023.

Best Paper Award at ICLR Workshop, Oral at ICCV (195 out of 8088 submissions, top 2.5%)

Code

Download the full text


Abstract

Multimodal contrastive pretraining has been used to train multimodal representation models, such as CLIP, on large amounts of paired image-text data. However, previous studies have revealed that such models are vulnerable to backdoor attacks. Specifically, when trained on backdoored examples, CLIP learns spurious correlations between the embedded backdoor trigger and the target label, aligning their representations in the joint embedding space. Injecting even a small number of poisoned examples, such as 75 examples in 3 million pretraining data, can significantly manipulate the model’s behavior, making it difficult to detect or unlearn such correlations. To address this issue, we propose CleanCLIP, a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks by independently re-aligning the representations for individual modalities. We demonstrate that unsupervised finetuning using a combination of multimodal contrastive and unimodal self-supervised objectives for individual modalities can significantly reduce the impact of the backdoor attack. We show empirically that CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.


Bib Entry

@inproceedings{bansal2023cleanclip,
  author = {Bansal, Hritik and Singhi, Nishad and Yang, Yu and Yin, Fan and Grover, Aditya and Chang, Kai-Wei},
  title = {CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning},
  booktitle = {ICCV},
  year = {2023}
}

Related Publications

  1. VideoCon: Robust video-language alignment via contrast captions, CVPR, 2024
  2. Red Teaming Language Model Detectors with Language Models, TACL, 2023
  3. ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation, EMNLP, 2022
  4. Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers, EMNLP-Finding (short), 2022
  5. Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations, EMNLP-Finding (short), 2022
  6. Improving the Adversarial Robustness of NLP Models by Information Bottleneck, ACL-Finding, 2022
  7. Searching for an Effiective Defender: Benchmarking Defense against Adversarial Word Substitution, EMNLP, 2021
  8. On the Transferability of Adversarial Attacks against Neural Text Classifier, EMNLP, 2021
  9. Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble, ACL, 2021
  10. Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation, NAACL, 2021
  11. Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs, NeurIPS, 2020
  12. On the Robustness of Language Encoders against Grammatical Errors, ACL, 2020
  13. Robustness Verification for Transformers, ICLR, 2020
  14. Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification, EMNLP, 2019
  15. Retrofitting Contextualized Word Embeddings with Paraphrases, EMNLP (short), 2019
  16. Generating Natural Language Adversarial Examples, EMNLP (short), 2018