
Tri Dao

Log-Linear Attention

Jun 05, 2025

Hardware-Efficient Attention for Fast Decoding

May 27, 2025

Long-Context State-Space Video World Models

May 26, 2025

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Apr 14, 2025

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Feb 27, 2025

Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping

Jan 11, 2025

Marconi: Prefix Caching for the Era of Hybrid LLMs

Nov 28, 2024

RedPajama: An Open Dataset for Training Large Language Models

Nov 19, 2024

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Aug 27, 2024

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

Jul 13, 2024