
Xuanzhe Liu

HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

Mar 18, 2026

ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning

Mar 13, 2026

DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models

Mar 09, 2026

WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Dec 10, 2025

TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving

Aug 24, 2025

FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation

May 30, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

May 19, 2025

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

Apr 03, 2025

Benchmarking Bias in Large Language Models during Role-Playing

Nov 01, 2024

Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU

Jul 08, 2024