Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, and Jianfeng Gao, in NeurIPS, 2023.

Abstract

Large language models (LLMs) have achieved remarkable progress in solving various natural language processing tasks due to emergent reasoning abilities. However, LLMs have inherent limitations as they are incapable of accessing up-to-date information (stored on the Web or in task-specific knowledge bases), using external tools, and performing precise mathematical and logical reasoning. In this paper, we present Chameleon, an AI system that mitigates these limitations by augmenting LLMs with plug-and-play modules for compositional reasoning. Chameleon synthesizes programs by composing various tools (e.g., LLMs, off-the-shelf vision models, web search engines, Python functions, and heuristic-based modules) for accomplishing complex reasoning tasks. At the heart of Chameleon is an LLM-based planner that assembles a sequence of tools to execute to generate the final response. We showcase the effectiveness of Chameleon on two multi-modal knowledge-intensive reasoning tasks: ScienceQA and TabMWP. Chameleon, powered by GPT-4, achieves an 86.54% overall accuracy on ScienceQA, improving the best published few-shot result by 11.37%. On TabMWP, GPT-4-powered Chameleon improves the accuracy by 17.0%, lifting the state of the art to 98.78%. Our analysis also shows that the GPT-4-powered planner exhibits more consistent and rational tool selection via inferring potential constraints from instructions, compared to a ChatGPT-powered planner.
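
The plan-then-execute idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the module names, the shared-state dictionary, and the keyword rule standing in for the LLM-based planner are all hypothetical, chosen only to show how a planner might emit a sequence of tool names that is then executed in order.

```python
from typing import Callable, Dict, List

# Each module reads a shared state dict and returns an updated copy.
# (Hypothetical interface, assumed for illustration.)
State = Dict[str, str]
Module = Callable[[State], State]


def knowledge_retrieval(state: State) -> State:
    # Stand-in for a tool that fetches background knowledge (e.g. web search).
    return {**state, "knowledge": f"background facts about: {state['question']}"}


def solution_generator(state: State) -> State:
    # Stand-in for an LLM call that reasons over whatever context is available.
    context = state.get("knowledge", "no extra context")
    return {**state, "solution": f"reasoning over [{context}]"}


def answer_generator(state: State) -> State:
    # Stand-in for a heuristic module that extracts the final answer.
    return {**state, "answer": f"answer derived from {state['solution']}"}


# Plug-and-play registry: adding a tool means adding one entry here.
MODULES: Dict[str, Module] = {
    "knowledge_retrieval": knowledge_retrieval,
    "solution_generator": solution_generator,
    "answer_generator": answer_generator,
}


def plan(question: str) -> List[str]:
    """Hypothetical planner. The real system prompts an LLM to produce the
    program; a keyword rule is used here so the sketch runs offline."""
    program = ["solution_generator", "answer_generator"]
    if "why" in question.lower():
        program.insert(0, "knowledge_retrieval")
    return program


def run(question: str) -> State:
    # Execute the planned modules sequentially, threading the state through.
    state: State = {"question": question}
    for name in plan(question):
        state = MODULES[name](state)
    return state


if __name__ == "__main__":
    print(plan("Why is the sky blue?"))
    print(run("What is 2 + 3?")["answer"])
```

Keeping the modules behind a name-to-function registry is what makes the composition plug-and-play in this sketch: the planner only needs to know tool names, so tools can be added or swapped without touching the execution loop.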

Bib Entry

@inproceedings{lu2023chameleon,
  author = {Lu, Pan and Peng, Baolin and Cheng, Hao and Galley, Michel and Chang, Kai-Wei and Wu, Ying Nian and Zhu, Song-Chun and Gao, Jianfeng},
  title = {Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models},
  booktitle = {NeurIPS},
  keyword_extra = {AI-agent},
  year = {2023}
}

Related Publications

On the Paradox of Learning to Reason from Data, IJCAI, 2023
AVIS: Autonomous Visual Information Seeking with Large Language Models, NeurIPS, 2023
A Survey of Deep Learning for Mathematical Reasoning, ACL, 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step, ACL, 2023
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, ICLR, 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering, NeurIPS, 2022
Semantic Probabilistic Layers for Neuro-Symbolic Learning, NeurIPS, 2022
Neuro-Symbolic Entropy Regularization, UAI, 2022