VideoCon: Robust video-language alignment via contrast captions

Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, and Aditya Grover, in CVPR, 2024.

Best paper at DPFM workshop at ICLR



Abstract

Despite being (pre)trained on a massive amount of data, state-of-the-art video-language alignment models are not robust to semantically plausible contrastive changes in video captions. Our work addresses this by identifying a broad spectrum of contrast misalignments, such as replacing entities or actions and flipping event order, which alignment models should be robust against. To this end, we introduce VideoCon, a video-language alignment dataset constructed by a large language model that generates plausible contrast video captions and explanations for the differences between the original and contrast captions. A generative video-language model is then finetuned with VideoCon to assess video-language entailment and generate explanations. Our VideoCon-based alignment model significantly outperforms current models, exhibiting a 12-point increase in AUC for the video-language alignment task on human-generated contrast captions. Our model also sets a new state of the art in zero-shot performance on temporally extensive video-language tasks such as text-to-video retrieval (SSv2-Temporal) and video question answering (ATP-Hard). Moreover, it shows superior performance on novel videos and on human-crafted captions and explanations.
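
As a rough illustration of the AUC figure quoted in the abstract, the sketch below computes ROC-AUC for a binary alignment task: each (video, caption) pair has an entailment score, and the label says whether the caption is the original (1) or a contrast caption (0). This is not the authors' evaluation code. The function name `roc_auc` and the scores are hypothetical, chosen only to show how the metric behaves.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (aligned, contrast) pairs ranked correctly.

    scores: alignment/entailment scores, higher means "more aligned".
    labels: 1 for an original (aligned) caption, 0 for a contrast caption.
    Ties between an aligned and a contrast score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one aligned and one contrast example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores, for illustration only (not results from the paper).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of the 9 aligned/contrast pairs are ranked correctly
```

An AUC of 0.5 corresponds to a model that cannot tell original captions from contrast captions, and 1.0 to one that always scores the original higher. A 12-point gain is a difference of 0.12 on this scale.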



Bib Entry

@inproceedings{bansal2023videocon,
  author = {Bansal, Hritik and Bitton, Yonatan and Szpektor, Idan and Chang, Kai-Wei and Grover, Aditya},
  title = {VideoCon: Robust video-language alignment via contrast captions},
  booktitle = {CVPR},
  year = {2024}
}

Related Publications

  1. CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning, ICCV, 2023
  2. Red Teaming Language Model Detectors with Language Models, TACL, 2023
  3. ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation, EMNLP, 2022
  4. Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers, EMNLP-Finding (short), 2022
  5. Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations, EMNLP-Finding (short), 2022
  6. Improving the Adversarial Robustness of NLP Models by Information Bottleneck, ACL-Finding, 2022
  7. Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution, EMNLP, 2021
  8. On the Transferability of Adversarial Attacks against Neural Text Classifier, EMNLP, 2021
  9. Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble, ACL, 2021
  10. Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation, NAACL, 2021
  11. Provable, Scalable and Automatic Perturbation Analysis on General Computational Graphs, NeurIPS, 2020
  12. On the Robustness of Language Encoders against Grammatical Errors, ACL, 2020
  13. Robustness Verification for Transformers, ICLR, 2020
  14. Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification, EMNLP, 2019
  15. Retrofitting Contextualized Word Embeddings with Paraphrases, EMNLP (short), 2019
  16. Generating Natural Language Adversarial Examples, EMNLP (short), 2018