Tri Dao

SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving

Apr 21, 2026
Via arXiv

Introspective Diffusion Language Models

Apr 13, 2026
Via arXiv

Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution

Apr 09, 2026
Via arXiv

Mamba-3: Improved Sequence Modeling using State Space Principles

Mar 16, 2026
Via arXiv

M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

Mar 15, 2026
Via arXiv

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling

Mar 05, 2026
Via arXiv

AI+HW 2035: Shaping the Next Decade

Mar 05, 2026
Via arXiv

Speculative Speculative Decoding

Mar 03, 2026
Via arXiv

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System

Feb 06, 2026
Via arXiv

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

Dec 16, 2025
Via arXiv