Zhao Song

Fast Gradient Computation for RoPE Attention in Almost Linear Time

Dec 23, 2024

Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers

Dec 23, 2024

Grams: Gradient Descent with Adaptive Momentum Scaling

Dec 22, 2024

Numerical Pruning for Efficient Autoregressive Models

Dec 17, 2024

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Dec 17, 2024

The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity

Dec 09, 2024

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond

Dec 08, 2024

On Socially Fair Low-Rank Approximation and Column Subset Selection

Dec 08, 2024

On the Expressive Power of Modern Hopfield Networks

Dec 07, 2024

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Nov 25, 2024