Shan Yu | UCLA

About Me

I am a Ph.D. candidate in Computer Science at the UCLA Systems Lab, advised by Prof. Harry Xu.

My research lies in machine learning systems (MLSys), with a focus on building cost-efficient, low-latency systems for compound AI workloads, including large language model (LLM) serving (Prism, ConServe), multi-agent systems (Pythia), and video QA pipelines (VQPy).

Before joining UCLA, I was a Senior AI Frameworks Engineer at Intel, where I was a core developer of BigDL, an open-source framework for scalable big data analytics and AI, and shipped distributed AI pipelines deployed at production scale. I received my bachelor's and master's degrees from Zhejiang University.

Looking for interns! If you're interested in LLM infra and would like to help build the open-source kvcached project together, please send me an email and let's chat.

News

May 2026: Prism selected as a research highlight by Amazon.
May 2026: ConServe accepted to ICML 2026.
Apr 2026: Prism accepted to OSDI 2026. See you in Seattle!
Apr 2026: kvcached has been productionized by Red Hat for cost-efficient multi-LLM serving. Read the Red Hat blog.
Oct 2025: Released kvcached with a technical deep-dive blog.
Sept 2025: Received the 2025 Amazon AI Fellowship.
June 2025: Started a research internship with the Alibaba Cloud team in Bellevue.

Open Source

kvcached: elastic KV cache sharing for multi-LLM serving

An open-source library that brings elastic GPU memory sharing to LLM serving, enabling cost-efficient colocation of multiple LLMs with drop-in support for SGLang and vLLM.

Productionized by Red Hat · Co-led with Jiarong Xing and Yifan Qiao · Red Hat blog · Deep-dive blog

Publications

Pythia: Toward Predictability-Driven Agent-Native LLM Serving

Shan Yu, Junyi Shu, Yuanjiang Ni, Kun Qian, Xue Li, Yang Wang, Jinyuan Zhang, Ziyi Xu, Shuo Yang, Lingjun Zhu, Ennan Zhai, Qingda Lu, Jiarong Xing, Youyou Lu, Xin Jin, Xuanzhe Liu, Harry Xu

arXiv preprint, 2026

PDF

Prism: Cost-Efficient Multi-LLM Serving via GPU Memory Ballooning

Shan Yu, Yifan Qiao, Mingyuan Ma, Yangmin Li, Shuo Yang, Xinyuan Tong, Yang Wang, Zhiqiang Xie, Yuwei An, Shiyi Cao, Ke Bao, Deepak Vij, Xiaoning Ding, Yichen Wang, Qingda Lu, Zhong Wang, Gao Gao, Harry Xu, Junyi Shu, Jiarong Xing, Ying Sheng

20th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2026

PDF · Code

ConServe: Fine-Grained GPU Harvesting for LLM Online and Offline Co-Serving

Yifan Qiao, Shan Yu, Shu Anzai, Haoran Ma, Shuo Yang, Yang Wang, Miryung Kim, Yongji Wu, Yang Zhou, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, Harry Xu

43rd International Conference on Machine Learning (ICML), 2026

PDF · Code

VQPy: An Object-Oriented Approach to Modern Video Analytics

Shan Yu, Zhenting Zhu, Yu Chen, Hanchen Xu, Pengzhan Zhao, Yang Wang, Arthi Padmanabhan, Hugo Latapie, Harry Xu

Conference on Machine Learning and Systems (MLSys), 2024

PDF · Code

DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu

USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024

PDF · Code · Talk

Ethosight: A Reasoning-Guided Iterative Learning System for Nuanced Perception based on Joint-Embedding and Contextual Label Affinity

Hugo Latapie, Shan Yu, Patrick Hammer, Kristinn R. Thorisson, Vahagn Petrosyan, Brandon Kynoch, Alind Khare, Payman Behnam, Alexey Tumanov, Aksheit Saxena, Anish Aralikatti, Hanning Chen, Mohsen Imani, Mike Archbold, Tangrui Li, Pei Wang, Justin Hart

arXiv preprint, 2023

PDF

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

Jason (Jinquan) Dai, Ding Ding, Dongjie Shi, Shengsheng Huang, Jiao Wang, Xin Qiu, Kai Huang, Guoqiong Song, Yang Wang, Qiyuan Gong, Jiaming Song, Shan Yu, Le Zheng, Yina Chen, Junwei Deng, Ge Song

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2022

PDF · Code

Services

MLSys 2026, External Review Committee
ASPLOS 2026, External Review Committee
EuroSys 2026, Shadow Program Committee
OSDI 2024, Artifact Evaluation Committee
ATC 2024, Artifact Evaluation Committee

Talks

VQPy: An Object-Oriented Approach to Modern Video Analytics

Zhejiang University, Nanjing University, Chinese Academy of Sciences, 2023

What's On-going in the Spark + AI Community

InfoQ Live Big Shot Talks, 2021

Chronos: Scalable Time Series Analytics with AutoML

Intel IAGS China Technical Dev Series, 2021

Time Series Analytics using AutoML and Ray on Analytics Zoo

Intel Data Centric Conference and New Product Launch, 2020

Leveraging Distributed AutoML for Time Series Analytics

Alibaba EMR Apache Spark Chinese Technology Salon, Shanghai, 2019

Game Playing Using AI on Spark

O'Reilly Artificial Intelligence Conference, Beijing, 2019

Awards

Amazon AI PhD Fellowship, 2025
MLSys Student Grant, 2024
NSDI Diversity Grant, 2024
UCLA Summer Mentored Research Fellowship, 2023 and 2024
Cisco You Amaze 2 Award, 2023
Intel Division Recognition Award, 2021 and 2022
Graduate of Merit, Zhejiang University, 2017–2018
Honor for Graduate, Zhejiang University, 2016–2018
Academic Scholarship, Zhejiang University, 2015–2016