Luo Mai

RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse

Nov 05, 2025

HybridServe: Efficient Serving of Large AI Models with Confidence-Based Cascade Routing

May 18, 2025

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

May 16, 2025

MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching

Mar 12, 2025

WaferLLM: A Wafer-Scale LLM Inference System

Feb 06, 2025

MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems

Dec 10, 2024

PH-Dropout: Practical Epistemic Uncertainty Quantification for View Synthesis

Oct 11, 2024

Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding

Jul 12, 2024

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

Jan 25, 2024