Simon Peter

Reducing the GPU Memory Bottleneck with Lossless Compression for ML -- Extended

May 29, 2026
Via arXiv

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

May 07, 2026
Via arXiv

POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference

Oct 23, 2024
Via arXiv