Aomufei Yuan

Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging

Feb 12, 2026

KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference

Apr 14, 2025

FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference

Feb 19, 2025

Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing

Apr 25, 2024