Siyuan Qi

PhD Candidate @ UCLA CS

About Me

I am a second-year Ph.D. candidate in the Computer Science Department at the University of California, Los Angeles. I am currently doing computer vision research in the Center for Vision, Cognition, Learning, and Autonomy, advised by Professor Song-Chun Zhu.

My research interests include Computer Vision, Machine Learning, and Cognitive Science.

We who cut mere stones must always be envisioning cathedrals.

Quarry worker's creed


2017 Apr

VRLA Expo 2017

[Invited talk] I presented our work on "Examining Human Physical Judgments Across Virtual Gravity Fields" at VRLA 2017.

Invited Talk  Virtual Reality  Cognitive Science 

2016 Nov

IEEE Virtual Reality 2017

[Oral] The Martian: Examining Human Physical Judgments Across Virtual Gravity Fields.

Accepted to TVCG

[paper] [project] [demo]

Oral  Journal  Virtual Reality  Cognitive Science 


  • Human Activity Prediction

    This project aims to predict future human activities from partially observed RGB-D videos. Human activity prediction is generally difficult because of its non-Markovian nature and the rich context between humans and their environments. We use a stochastic grammar model to capture the compositional and hierarchical structure of events, integrating human actions, objects, and their affordances.

    Computer Vision  Robotics 
  • Indoor Scene Synthesis by Stochastic Grammar

    This project studies how to realistically synthesize indoor scene layouts using a stochastic grammar. We present a novel human-centric method to sample 3D room layouts and synthesize photo-realistic images using physics-based rendering. We use object affordance and human activity planning to model indoor scenes, which contain functional grouping relations and supporting relations between furniture and objects. An attributed spatial And-Or graph (S-AOG) is proposed to model indoor scenes. The S-AOG is a stochastic context-sensitive grammar in which the terminal nodes are object entities, including the room, furniture, and supported objects.

    Computer Vision  Computer Graphics 
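The idea of sampling a layout from a stochastic grammar can be illustrated with a minimal sketch. The toy grammar below is hypothetical and is not the S-AOG from the project: the symbol names and branching probabilities are invented for illustration, and the sketch is context-free, so it leaves out the attributes and contextual relations that the S-AOG models. And-nodes expand all of their children, Or-nodes choose one child at random, and terminals are object entities.

```python
import random

# Hypothetical toy grammar (not the project's S-AOG).
# "and" nodes expand every child; "or" nodes pick one child by probability.
GRAMMAR = {
    "scene": ("and", ["furniture_group", "furniture_group"]),
    "furniture_group": ("or", [("desk_set", 0.5), ("bed_set", 0.5)]),
    "desk_set": ("and", ["desk", "chair", "monitor"]),
    "bed_set": ("and", ["bed", "nightstand", "lamp"]),
}


def sample(symbol, rng):
    """Return the list of terminal objects produced by expanding `symbol`."""
    if symbol not in GRAMMAR:
        # Terminal node: an object entity.
        return [symbol]
    kind, children = GRAMMAR[symbol]
    if kind == "or":
        names = [name for name, _ in children]
        weights = [weight for _, weight in children]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        return sample(chosen, rng)
    # And-node: concatenate the expansions of all children.
    objects = []
    for child in children:
        objects.extend(sample(child, rng))
    return objects


layout = sample("scene", random.Random(0))
print(layout)
```

Each call yields one list of objects; a full pipeline would additionally need to sample object attributes such as positions and sizes, which this sketch does not attempt.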


  • Configurable, Photorealistic Image Rendering and Ground Truth Synthesis by Sampling Stochastic Grammars Representing Indoor Scenes

    Chenfanfu Jiang*, Yixin Zhu*, Siyuan Qi*, Siyuan Huang*, Jenny Lin, Xingwen Guo, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu.
    * equal contributors


    Computer Vision  Computer Graphics 

  • Abstract
    We propose the configurable rendering of massive quantities of photorealistic images with ground truth for the purposes of training, benchmarking, and diagnosing computer vision models. In contrast to the conventional (crowd-sourced) manual labeling of ground truth for a relatively modest number of RGB-D images captured by Kinect-like sensors, we devise a non-trivial configurable pipeline of algorithms capable of generating a potentially infinite variety of indoor scenes using a stochastic grammar, specifically, one represented by an attributed spatial And-Or graph. We employ physics-based rendering to synthesize photorealistic RGB images while automatically synthesizing detailed, per-pixel ground truth data, including visible surface depth and normal, object identity and material information, as well as illumination. Our pipeline is configurable inasmuch as it enables the precise customization and control of important attributes of the generated scenes. We demonstrate that our generated scenes achieve a performance similar to the NYU v2 Dataset on pre-trained deep learning models. By modifying pipeline components in a controllable manner, we furthermore provide diagnostics on common scene understanding tasks, e.g., depth and surface normal prediction and semantic segmentation.
  • [Oral] The Martian: Examining Human Physical Judgments Across Virtual Gravity Fields.

    Tian Ye*, Siyuan Qi*, James Kubricht, Yixin Zhu, Hongjing Lu, Song-Chun Zhu.
    * equal contributors

    IEEE VR 2017, Los Angeles, California
    Accepted to TVCG

    Oral  Journal  Virtual Reality  Cognitive Science 

  • Abstract
    This paper examines how humans adapt to novel physical situations with unknown gravitational acceleration in immersive virtual environments. We designed four virtual reality experiments with different tasks for participants to complete: strike a ball to hit a target, trigger a ball to hit a target, predict the landing location of a projectile, and estimate the flight duration of a projectile. The first two experiments compared human behavior in the virtual environment with real-world performance reported in the literature. The last two experiments aimed to test the human ability to adapt to novel gravity fields by measuring their performance in trajectory prediction and time estimation tasks. The experiment results show that: 1) based on brief observation of a projectile's initial trajectory, humans are accurate at predicting the landing location even under novel gravity fields, and 2) humans' time estimation in a familiar earth environment fluctuates around the ground truth flight duration, although the time estimation in unknown gravity fields indicates a bias toward earth's gravity.
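For intuition about the landing-location and flight-duration tasks, the ground truth under a given gravity follows from standard projectile kinematics. The sketch below assumes no air resistance and flat ground; the launch parameters and the lower gravity value are illustrative assumptions, not the settings used in the experiments.

```python
import math


def flight_time(vy, height, g):
    """Time for a projectile launched at `height` with upward speed `vy` to reach the ground.

    Solves height + vy * t - 0.5 * g * t**2 = 0 for the positive root.
    """
    return (vy + math.sqrt(vy * vy + 2.0 * g * height)) / g


def landing_distance(vx, vy, height, g):
    """Horizontal distance travelled before landing (constant horizontal speed `vx`)."""
    return vx * flight_time(vy, height, g)


EARTH_G = 9.81  # m/s^2
LOW_G = 3.71    # m/s^2, illustrative lower-gravity field (an assumption)

for g in (EARTH_G, LOW_G):
    t = flight_time(vy=3.0, height=1.5, g=g)
    x = landing_distance(vx=2.0, vy=3.0, height=1.5, g=g)
    print(f"g={g:.2f} m/s^2: flight time {t:.2f} s, landing distance {x:.2f} m")
```

With the same launch, a weaker gravity field gives a longer flight and a farther landing point, which is the kind of difference participants would have to adapt to.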


  • First Class Honors, Faculty of Engineering, University of Hong Kong
  • Undergraduate Research Fellowship, University of Hong Kong
  • Kingboard Scholarship, University of Hong Kong
    2010 & 2011 & 2012
  • Dean's Honors List, University of Hong Kong
    2010 & 2011
  • AI Challenge (Sponsored by Google), 2nd place among Chinese contestants, 74th worldwide
  • Student Ambassador, University of Hong Kong
  • University Entrance Scholarship, University of Hong Kong