Rya Sanovar

Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

May 17, 2024
Via arXiv