Ctrl-R was selected as a spotlight at ICML 2026.
Research at a glance
Multimodal Foundation Models
We developed VisualBERT, one of the first vision-language models, along with follow-ups such as GLIP and DesCo, which recognize objects through detailed language descriptions. Our recent work extends this line to robust video-language alignment and multimodal reasoning; for example, our 7B multimodal reasoning model outperforms GPT-4o on a visual math benchmark.
AI Agents for Science
We build agentic systems that synthesize literature, reason over evidence, use tools, and support scientific discovery with human oversight. This includes compositional tool-use systems for scientific question answering, as well as benchmarks such as MathVista, which is widely used by frontier labs to evaluate visual mathematical reasoning.
Reasoning in LLMs
We develop methods that enable LLMs to adhere to specified constraints, and we study commonsense, mathematical, logical, and structured reasoning capabilities. Recent projects include controllable generation, open reasoning data recipes, and trajectory-control methods for stronger reasoning models.
Trustworthy AI
We published pioneering work on aligning AI systems with human values and safety, focusing on fairness in generative AI, robustness, unlearning, and safety in LLMs and multimodal systems. Recent work includes selective unlearning for removing sensitive content and agentic safety reasoning. I have also organized Trustworthy NLP workshops at ACL* venues since 2020.
Highlighted papers and projects