Fan Lai

AutoScout: Structured Optimization for Automating ML System Configuration

Mar 12, 2026

SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving

Mar 09, 2026

tLoRA: Efficient Multi-LoRA Training with Elastic Shared Super-Models

Feb 06, 2026

PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference

Feb 03, 2026

Dora: QoE-Aware Hybrid Parallelism for Distributed Edge AI

Dec 09, 2025

Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models

Jun 11, 2025

Single-agent or Multi-agent Systems? Why Not Both?

May 23, 2025

Tempo: Application-aware LLM Serving with Mixed SLO Requirements

Apr 24, 2025

Circinus: Efficient Query Planner for Compound ML Serving

Apr 23, 2025

DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services

Feb 17, 2025