Welcome to my academic homepage. I am a Ph.D. candidate in Computer Science at the University of California, Los Angeles (UCLA). I did my undergraduate studies at Tsinghua University, also in CS.

I work on multimodal representation learning for visual reasoning and skill learning tasks. In particular, I'm interested in building and understanding inductive biases for learning representations from multimodal data, so that models can generalize zero-shot or few-shot (and systematically) in the real world. Some of my research keywords can be found below:

  • Multimodal learning: Large models, Vision and language, Visual reasoning, 3D vision
  • Representation learning: Zero-shot and few-shot learning, Energy-based generative model
  • Robot learning: (Inverse) Reinforcement learning and imitation, Robotics, Sensory fusion
I'm on the job market! I'm looking for a full-time position starting in summer 2023.

Email: xm [at] cs [dot] ucla [dot] edu / Google Scholar

News


Selected Publications


Preprint

Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
arXiv preprint / arXiv / Project / Code 
First-ever agent with hierarchical goal execution that obtains a diamond in Minecraft (and accomplishes 70+ tasks!)

Jiangyong Huang, William Yicheng Zhu, Baoxiong Jia, Zan Wang, Xiaojian Ma, Qing Li, Siyuan Huang
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
arXiv preprint / Paper / arXiv / Project / Code / Benchmark 

Sirui Xie, Xiaojian Ma, Peiyu Yu, Yixin Zhu, Ying Nian Wu and Song-Chun Zhu
HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving
arXiv preprint / Paper / Click Here to Play HALMA! / arXiv 


Conference

Shaofei Cai, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
CVPR 2023 / arXiv / Project / Code 

Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, Siyuan Huang
SQA3D: Situated Question Answering in 3D Scenes
ICLR 2023 / Paper / arXiv / Slides / Project / Code / Benchmark 

Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Song-Chun Zhu and Ying Nian Wu
Latent Diffusion Energy-Based Model for Interpretable Text Modeling
ICML 2022 / Paper / arXiv / Code 

Huaizu Jiang*, Xiaojian Ma*, Weili Nie, Zhiding Yu, Yuke Zhu, Anima Anandkumar
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
CVPR 2022 / Paper / Poster / Slides / Project / arXiv / Code / Bibtex 
Oral presentation

Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
ICLR 2022 / Paper / Poster / Slides / Project / OpenReview / arXiv / Code / Bibtex 

Peiyu Yu, Sirui Xie, Xiaojian Ma, Yixin Zhu, Ying Nian Wu and Song-Chun Zhu
Unsupervised Foreground Extraction via Deep Region Competition
NeurIPS 2021 / Paper / arXiv / Code 

Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong, Chuang Gan and Lei Li
Adversarial Option-Aware Hierarchical Imitation Learning
ICML 2021 / Paper / arXiv / Code 
Spotlight presentation

Hongzhuo Liang, Chuangchuang Zhou, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Fuchun Sun and Jianwei Zhang
Robust Robotic Pouring using Audition and Haptics
IROS 2020 / Paper / Project Page / arXiv / Code / Video 
Oral presentation

Shuang Li, Jiaxi Jiang, Philipp Ruppel, Hongzhuo Liang, Xiaojian Ma, Norman Hendrich, Fuchun Sun and Jianwei Zhang
A Mobile Robot Hand-Arm Teleoperation System by Vision and IMU
IROS 2020 / Paper / Project Page / arXiv / Code / Video 
Oral presentation

Mark Edmonds, Xiaojian Ma, Siyuan Qi, Yixin Zhu, Hongjing Lu and Song-Chun Zhu
Theory-based Causal Transfer: Integrating Instance-level Induction and Abstract-level Structure Learning
AAAI 2020 / Paper / Project Page / arXiv / Code 
Oral presentation

Xiaojian Ma*, Mingxuan Jing*, Wenbing Huang, Fuchun Sun, Bin Fang and Huaping Liu
Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance
AAAI 2020 / Paper / Project Page / arXiv / Code 
also in Structure & Priors in Reinforcement Learning Workshop @ ICLR 2019

Xiaojian Ma*, Chao Yang*, Wenbing Huang*, Fuchun Sun, Huaping Liu, Junzhou Huang and Chuang Gan
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
NeurIPS 2019 / Paper / Project Page / arXiv / Code 
Spotlight presentation

Hongzhuo Liang, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Fuchun Sun and Jianwei Zhang
Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring
IROS 2019 / Paper / Project Page / arXiv / Code / Video 

Xiaojian Ma*, Hongzhuo Liang*, Shuang Li, Michael Görner, Song Tang, Bin Fang, Fuchun Sun and Jianwei Zhang
PointNetGPD: Detecting Grasp Configurations from Point Sets
ICRA 2019 / Paper / Project Page / arXiv / Code / Video 

Xiaojian Ma*, Shuang Li*, Hongzhuo Liang, Michael Görner, Philipp Ruppel, Bin Fang, Fuchun Sun and Jianwei Zhang
Vision-based Teleoperation of Shadow Dexterous Hand using End-to-End Deep Neural Network
ICRA 2019 / Paper / Project Page / arXiv / Code / Video 

Xiaojian Ma*, Mingxuan Jing*, Wenbing Huang, Fuchun Sun and Huaping Liu
Task Transfer by Preference-Based Cost Learning
AAAI 2019 / Paper / Project Page / arXiv / Code 
Spotlight presentation

Experience


Professional Service



Contact


UCLA Center for Vision, Cognition, Learning and Autonomy
Slichter Hall
603 Charles E Young Dr E #3878
Los Angeles, CA 90024
xm [at] cs [dot] ucla [dot] edu
[Google Scholar]  |  [GitHub]  


© Xiaojian Ma 2022