Zhao Song

Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers
Dec 23, 2024

Grams: Gradient Descent with Adaptive Momentum Scaling
Dec 22, 2024

Numerical Pruning for Efficient Autoregressive Models
Dec 17, 2024

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Dec 17, 2024

The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity
Dec 09, 2024

On Socially Fair Low-Rank Approximation and Column Subset Selection
Dec 08, 2024

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond
Dec 08, 2024

On the Expressive Power of Modern Hopfield Networks
Dec 07, 2024

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Nov 25, 2024

Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training
Nov 25, 2024