  • REP has been accepted to appear in NeurIPS 2025!

    "We introduce resource-efficient prompting (REP), which improves the computational and memory efficiency of prompt-based rehearsal-free methods while minimizing accuracy trade-offs."

  • CCL has been accepted to appear in EuroSys 2026!

    "a novel Carbon-footprint-aware Continuous Learning (CCL) scheme that minimizes carbon emissions during model retraining without sacrificing inference accuracy."

  • Garen has been accepted to appear in EuroSys 2026!

    "a system implementing a concept called atomic state reconciliation (ASR), which ensures atomicity and consistency of reconciliation to protect the cluster against state inconsistencies."

  • Taeyoon has won the Best Poster Award from KIISE for our work FusionFlow!

  • Xinyue Ma has been accepted to the internship program at Microsoft Research Redmond!

    She will join the RiSE group at MSR Redmond for a three-month internship starting in September.

  • “FusionFlow" is accepted to appear in VLDB 2024!

    "orchestrating data preprocessing tasks across CPUs and GPUs while minimizing interference with GPU-based model training"

Our Research

Our research goal is to advance the state of the art in emerging large-scale system platforms by making them more efficient, responsive, intelligent, and programmable. Our current research topics lie in the following areas and continue to evolve as new challenges emerge:

Efficient LLM Serving Systems: We design inference and serving systems that meet strict latency and cost targets at production scale.

Continual and On-Device Learning: We build methods and system support for continual adaptation on resource-constrained devices and edge environments.

Large-Scale Distributed Training: We develop techniques to train models efficiently across heterogeneous and geo-distributed clusters.

Fast and Scalable Big Data Analytics: We build data systems that accelerate iterative analytics and scalable ML data pipelines.


Selected Publications

ORBITFLOW: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration

Xinyue Ma*, Heelim Hong*, Jongseop Lee, Seoyeong Choy, Woo-Yeon Lee, Taegeon Um, Myeongjae Jeon

REP: Resource-Efficient Prompting for Rehearsal-Free Continual Learning

Sungho Jeon, Xinyue Ma, Kwang In Kim, Myeongjae Jeon

FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation

Taeyoon Kim, Chanho Park, Mansur Mukimbekov, Heelim Hong, Minseok Kim, Ze Jin, Changdae Kim, Ji-Yong Shin, Myeongjae Jeon

Cost-effective On-device Continual Learning over Memory Hierarchy with Miro

Xinyue Ma, Suyeon Jeong, Minjia Zhang, Di Wang, Jonghyun Choi, Myeongjae Jeon

CarM: Hierarchical Episodic Memory for Continual Learning

Soobee Lee, Minindu Weerakoon, Jonghyun Choi, Minjia Zhang, Di Wang, Myeongjae Jeon

Jarvis: Large-scale Server Monitoring with Adaptive Near-data Processing

Atul Sandur, Chanho Park, Stavros Volos, Gul Agha, Myeongjae Jeon

Zico: Efficient GPU Memory Sharing for Concurrent DNN Training

Gangmuk Lim, Jeongseob Ahn, Wencong Xiao, Youngjin Kwon, Myeongjae Jeon

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads

Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang