Josh Alman

Every Bit Counts: A Theoretical Study of Precision-Expressivity Tradeoffs in Quantized Transformers

Feb 02, 2026

Poly-attention: a general scheme for higher-order self-attention

Feb 02, 2026

Two heads are better than one: simulating large transformers with small ones

Jun 13, 2025

Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse

May 22, 2025

Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform

May 17, 2025

Fundamental Limitations on Subquadratic Alternatives to Transformers

Oct 05, 2024

The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

Feb 07, 2024

How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

Oct 06, 2023

Fast Attention Requires Bounded Entries

Feb 26, 2023

Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing

Nov 25, 2022