Shan Yu | UCLA

About Me

I am a third-year Ph.D. student in Computer Science at UCLA, where I am fortunate to be advised by Prof. Harry Xu.

My research focuses on machine learning systems, where I design and build high-performance, cost-effective solutions by leveraging application semantics. My current work includes developing scalable and efficient systems for large language model (LLM) serving (Prism, ConServe) and video querying (VQPy).

Before my Ph.D., I spent four years at Intel as an AI frameworks engineer, building open-source big data and AI systems optimized for Intel CPUs in data centers.

Publications

Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving

Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Yangmin Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng

arXiv preprint, 2025

PDF Code

ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving

Yifan Qiao, Shu Anzai, Shan Yu, Haoran Ma, Yang Wang, Miryung Kim, Harry Xu

arXiv preprint, 2024

PDF Code

VQPy: An Object-Oriented Approach to Modern Video Analytics

Shan Yu, Zhenting Zhu, Yu Chen, Hanchen Xu, Pengzhan Zhao, Yang Wang, Arthi Padmanabhan, Hugo Latapie, Harry Xu

Conference on Machine Learning and Systems (MLSys), 2024

PDF Code

DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu

USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024

PDF Code Talk

Ethosight: A Reasoning-Guided Iterative Learning System for Nuanced Perception based on Joint-Embedding and Contextual Label Affinity

Hugo Latapie, Shan Yu, Patrick Hammer, Kristinn R. Thorisson, Vahagn Petrosyan, Brandon Kynoch, Alind Khare, Payman Behnam, Alexey Tumanov, Aksheit Saxena, Anish Aralikatti, Hanning Chen, Mohsen Imani, Mike Archbold, Tangrui Li, Pei Wang, Justin Hart

arXiv preprint, 2023

PDF

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

Jason (Jinquan) Dai, Ding Ding, Dongjie Shi, Shengsheng Huang, Jiao Wang, Xin Qiu, Kai Huang, Guoqiong Song, Yang Wang, Qiyuan Gong, Jiaming Song, Shan Yu, Le Zheng, Yina Chen, Junwei Deng, Ge Song

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2022

PDF Code

Services

Artifact Evaluation Committee

Talks

VQPy: An Object-Oriented Approach to Modern Video Analytics

Zhejiang University, Nanjing University, Chinese Academy of Sciences, 2023

What's on-going in Spark + AI Community

InfoQ Live Big Shot talks, 2021

Chronos: Scalable Time Series Analytics with AutoML

Intel IAGS China Technical Dev Series, 2021

Time Series Analytics using AutoML and Ray on Analytics Zoo

Intel Data Centric Conference and New Product Launch, 2020

Leveraging Distributed AutoML for Time Series Analytics

Alibaba EMR Apache Spark Chinese Technology Salon, Shanghai, 2019

Game Playing Using AI on Spark

O'Reilly Artificial Intelligence Conference, Beijing, 2019

Awards

MLSys Student Grant 2024
NSDI Diversity Grant 2024
UCLA Summer Mentored Research Fellowship 2023, 2024
Cisco You Amaze 2 Award 2023
Intel Division Recognition Award 2021, 2022
Graduate of Merit, Zhejiang University 2017-2018
Honor for Graduate, Zhejiang University 2016-2018
Academic Scholarship, Zhejiang University 2015-2016