Minyi Guo

Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding

Dec 29, 2025
Via arXiv

Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms

Nov 19, 2025
Via arXiv

MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts

Nov 18, 2025
Via arXiv

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution

Sep 11, 2025
Via arXiv

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

Aug 26, 2025
Via arXiv

Adacc: Adaptive Compression and Activation Checkpointing for LLM Memory Management

Aug 01, 2025
Via arXiv

Efficient Serving of LLM Applications with Probabilistic Demand Modeling

Jun 17, 2025
Via arXiv

Efficient Unified Caching for Accelerating Heterogeneous AI Workloads

Jun 14, 2025
Via arXiv

STREAMINGGS: Voxel-Based Streaming 3D Gaussian Splatting with Memory Optimization and Architectural Support

Jun 09, 2025
Via arXiv

Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers

Jun 06, 2025
Via arXiv