
Armin Gerami

FAST: Factorizable Attention for Speeding up Transformers

Feb 12, 2024

Armin Gerami, Monte Hoover, Pranav S. Dulepet, Ramani Duraiswami


Abstract: Motivated by the factorization inherent in the original fast multipole method and the improved fast Gauss transform, we introduce a factorizable form of attention that operates efficiently in high dimensions. This approach reduces the computational and memory complexity of the attention mechanism in transformers from $O(N^2)$ to $O(N)$. In contrast to previous attempts, our work presents a linearly scaled attention mechanism that maintains the full representation of the attention matrix without resorting to sparsification and incorporates the all-to-all relationship between tokens. We explore the properties of our new attention metric and conduct tests in various standard settings. Results indicate that our attention mechanism performs robustly and holds significant promise for diverse applications where self-attention is used.
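The abstract does not spell out the attention kernel, so the following is only an illustrative sketch of the general idea of factorizable attention, not the authors' implementation. It assumes a second-order Taylor polynomial of the exponential, f(s) = 1 + s + s^2/2, applied to dot products of unit-normalized queries and keys; the function names (`poly_kernel`, `feature_map`, `quadratic_attention`, `factorized_attention`) are hypothetical. Because f(q·k) can be written as an inner product φ(q)·φ(k) of finite feature vectors, the sums over keys are computed once and reused for every query. Cost then grows linearly in N, yet every query still attends to every key with a nonzero weight, with no sparsification.

```python
import numpy as np


def poly_kernel(s):
    # Assumed kernel: 2nd-order Taylor expansion of exp(s).
    # 1 + s + s^2/2 = ((s + 1)^2 + 1) / 2 > 0, so row normalization is safe.
    return 1.0 + s + 0.5 * s**2


def quadratic_attention(Q, K, V):
    # Reference version: builds the explicit N x N weight matrix,
    # O(N^2) time and memory.
    W = poly_kernel(Q @ K.T)
    return (W @ V) / W.sum(axis=1, keepdims=True)


def feature_map(X):
    # phi(x) = [1, x, vec(x x^T) / sqrt(2)], chosen so that
    # phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2 = poly_kernel(q.k).
    n, d = X.shape
    outer = np.einsum("ni,nj->nij", X, X).reshape(n, d * d)
    return np.concatenate([np.ones((n, 1)), X, outer / np.sqrt(2.0)], axis=1)


def factorized_attention(Q, K, V):
    # Factorized version: no N x N matrix is ever formed.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V       # (1 + d + d^2, d_v): keys/values summed once over all N
    z = phi_k.sum(axis=0)  # (1 + d + d^2,): shared normalizer term
    return (phi_q @ kv) / (phi_q @ z)[:, None]


# Usage: compare both versions on random unit-normalized queries and keys.
rng = np.random.default_rng(0)
N, d, d_v = 256, 8, 4
Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))
Q /= np.linalg.norm(Q, axis=1, keepdims=True)
K /= np.linalg.norm(K, axis=1, keepdims=True)
V = rng.standard_normal((N, d_v))

out_fast = factorized_attention(Q, K, V)
out_full = quadratic_attention(Q, K, V)
# The difference should be at floating-point rounding level.
print(np.max(np.abs(out_fast - out_full)))
```

In this sketch the factorized path costs roughly O(N d^2 d_v) instead of O(N^2 (d + d_v)), so it is linear in sequence length but quadratic in head dimension. It therefore pays off mainly when N is large relative to d^2.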
