Xuanzhe Liu

WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Dec 10, 2025

TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving

Aug 24, 2025

FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation

May 30, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

May 19, 2025

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

Apr 03, 2025

Benchmarking Bias in Large Language Models during Role-Playing

Nov 01, 2024

Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU

Jul 08, 2024

RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

Apr 18, 2024

LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

Apr 15, 2024

Exploring the Impact of In-Browser Deep Learning Inference on Quality of User Experience and Performance

Feb 08, 2024