
Yiwu Yao

RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention

Dec 30, 2025

TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space

Nov 15, 2025

DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization

Nov 06, 2025

Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers

Jun 06, 2025

RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy

May 27, 2025

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference

May 19, 2025

Dynamic Low-Rank Sparse Adaptation for Large Language Models

Feb 20, 2025

KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference

Feb 06, 2025

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads

Jul 22, 2024

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

Oct 17, 2023