Shan Yu

Ph.D. Candidate, UCLA Systems Lab
shanyu (at) cs.ucla.edu

About Me

I am a Ph.D. candidate in Computer Science at the UCLA Systems Lab, advised by Prof. Harry Xu.

My research lies in machine learning systems (MLSys), with a focus on building cost-efficient, low-latency systems for compound AI workloads, including large language model (LLM) serving (Prism, ConServe), multi-agent systems (Pythia), and video QA pipelines (VQPy).

Before joining UCLA, I was a Senior AI Frameworks Engineer at Intel, where I was a core developer of BigDL, an open-source framework for scalable big data analytics and AI, and shipped distributed AI pipelines deployed at production scale. I received my bachelor's and master's degrees from Zhejiang University.

Looking for interns! If you're interested in LLM infra and would like to help build the open-source kvcached project together, please send me an email and let's chat.

News

Open Source

kvcached: elastic KV cache sharing for multi-LLM serving

An open-source library that brings elastic GPU memory sharing to LLM serving, enabling cost-efficient colocation of multiple LLMs with drop-in support for SGLang and vLLM.

Productionized by Red Hat · Co-led with Jiarong Xing and Yifan Qiao · Red Hat blog · Deep-dive blog

Publications

  1. Shan Yu, Junyi Shu, Yuanjiang Ni, Kun Qian, Xue Li, Yang Wang, Jinyuan Zhang, Ziyi Xu, Shuo Yang, Lingjun Zhu, Ennan Zhai, Qingda Lu, Jiarong Xing, Youyou Lu, Xin Jin, Xuanzhe Liu, Harry Xu
    arXiv preprint, 2026
  2. Shan Yu, Yifan Qiao, Mingyuan Ma, Yangmin Li, Shuo Yang, Xinyuan Tong, Yang Wang, Zhiqiang Xie, Yuwei An, Shiyi Cao, Ke Bao, Deepak Vij, Xiaoning Ding, Yichen Wang, Qingda Lu, Zhong Wang, Gao Gao, Harry Xu, Junyi Shu, Jiarong Xing, Ying Sheng
    20th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2026
  3. Yifan Qiao, Shan Yu, Shu Anzai, Haoran Ma, Shuo Yang, Yang Wang, Miryung Kim, Yongji Wu, Yang Zhou, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, Harry Xu
    43rd International Conference on Machine Learning (ICML), 2026
  4. Shan Yu, Zhenting Zhu, Yu Chen, Hanchen Xu, Pengzhan Zhao, Yang Wang, Arthi Padmanabhan, Hugo Latapie, Harry Xu
    Conference on Machine Learning and Systems (MLSys), 2024
  5. Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu
    USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024
  6. Hugo Latapie, Shan Yu, Patrick Hammer, Kristinn R. Thorisson, Vahagn Petrosyan, Brandon Kynoch, Alind Khare, Payman Behnam, Alexey Tumanov, Aksheit Saxena, Anish Aralikatti, Hanning Chen, Mohsen Imani, Mike Archbold, Tangrui Li, Pei Wang, Justin Hart
    arXiv preprint, 2023
  7. Jason (Jinquan) Dai, Ding Ding, Dongjie Shi, Shengsheng Huang, Jiao Wang, Xin Qiu, Kai Huang, Guoqiong Song, Yang Wang, Qiyuan Gong, Jiaming Song, Shan Yu, Le Zheng, Yina Chen, Junwei Deng, Ge Song
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2022

Services

Talks

VQPy: An Object-Oriented Approach to Modern Video Analytics

  • Zhejiang University, Nanjing University, Chinese Academy of Sciences, 2023

What's On-going in the Spark + AI Community

  • InfoQ Live Big Shot Talks, 2021

Chronos: Scalable Time Series Analytics with AutoML

  • Intel IAGS China Technical Dev Series, 2021

Time Series Analytics using AutoML and Ray on Analytics Zoo

  • Intel Data Centric Conference and New Product Launch, 2020

Leveraging Distributed AutoML for Time Series Analytics

  • Alibaba EMR Apache Spark Chinese Technology Salon, Shanghai, 2019

Game Playing Using AI on Spark

  • O'Reilly Artificial Intelligence Conference, Beijing, 2019

Awards