Zhao Song

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Nov 25, 2024

Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training
Nov 25, 2024

Circuit Complexity Bounds for RoPE-based Transformer Architecture
Nov 12, 2024

On Differentially Private String Distances
Nov 08, 2024

Unlocking the Theory Behind Scaling 1-Bit Neural Networks
Nov 03, 2024

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
Oct 15, 2024

Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study
Oct 15, 2024

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Oct 15, 2024

HSR-Enhanced Sparse Attention Acceleration
Oct 14, 2024

Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes
Oct 12, 2024