Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
Yining Hong, Rui Sun, Bingxuan Li, Xingcheng Yao, Maxine Wu, Alexander Chien, Da Yin, Ying Nian Wu, Zhecan James Wang, and Kai-Wei Chang, in NeurIPS, 2025.
Spotlight (top 5% papers)
Code | Download the full text
Abstract
AI agents today are mostly siloed: they either retrieve and reason over vast amounts of digital information or interact with the physical world through embodied perception, planning, and action, but rarely both. This separation limits their ability to solve tasks requiring integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. This paper introduces Embodied Web Agents, a paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning. The authors develop Embodied Web Agents task environments, a unified simulation platform integrating realistic 3D indoor and outdoor environments with functional web interfaces. Building on this platform, they construct and release the Embodied Web Agents Benchmark, which encompasses cooking, navigation, shopping, tourism, and geolocation tasks requiring coordinated reasoning across physical and digital realms. Experiments reveal significant performance gaps between state-of-the-art AI systems and human capabilities, highlighting challenges and opportunities at the intersection of embodied cognition and web-scale knowledge.
Bib Entry
@inproceedings{hong2025embodied,
  title     = {Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence},
  author    = {Hong, Yining and Sun, Rui and Li, Bingxuan and Yao, Xingcheng and Wu, Maxine and Chien, Alexander and Yin, Da and Wu, Ying Nian and Wang, Zhecan James and Chang, Kai-Wei},
  booktitle = {NeurIPS},
  year      = {2025}
}