Ding Tang

ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference

Mar 28, 2026
Via arXiv

Towards Efficient Pre-training: Exploring FP4 Precision in Large Language Models

Feb 17, 2025
Via arXiv