Tri Dao

Mamba-3: Improved Sequence Modeling using State Space Principles
Mar 16, 2026

M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
Mar 15, 2026

AI+HW 2035: Shaping the Next Decade
Mar 05, 2026

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
Mar 05, 2026

Speculative Speculative Decoding
Mar 03, 2026

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
Feb 06, 2026

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
Dec 16, 2025

Beat the Long Tail: Distribution-Aware Speculative Decoding for RL Training
Nov 17, 2025

Log-Linear Attention
Jun 05, 2025

Hardware-Efficient Attention for Fast Decoding
May 27, 2025