Minyi Guo

Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers

Jun 06, 2025
Via arXiv

An Efficient Private GPT Never Autoregressively Decodes

May 21, 2025
Via arXiv

LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval

May 21, 2025
Via arXiv

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism

May 20, 2025
Via arXiv

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference

May 19, 2025
Via arXiv

Saga: Capturing Multi-granularity Semantics from Massive Unlabelled IMU Data for User Perception

Apr 16, 2025
Via arXiv

Prism: Mining Task-aware Domains in Non-i.i.d. IMU Data for Flexible User Perception

Jan 03, 2025
Via arXiv

A Survey on Inference Optimization Techniques for Mixture of Experts Models

Dec 18, 2024
Via arXiv

ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression

Dec 04, 2024
Via arXiv

Nimbus: Secure and Efficient Two-Party Inference for Transformers

Nov 24, 2024
Via arXiv