Juechu Dong

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

Dec 24, 2025
Via arXiv

Flex Attention: A Programming Model for Generating Optimized Attention Kernels

Dec 07, 2024
Via arXiv