Zhao Song

Unlocking the Theory Behind Scaling 1-Bit Neural Networks

Nov 03, 2024
Via arXiv

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Oct 15, 2024
Via arXiv

Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent

Oct 15, 2024
Via arXiv

Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study

Oct 15, 2024
Via arXiv

HSR-Enhanced Sparse Attention Acceleration

Oct 14, 2024
Via arXiv

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

Oct 12, 2024
Via arXiv

Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes

Oct 12, 2024
Via arXiv

Log-concave Sampling over a Convex Body with a Barrier: a Robust and Unified Dikin Walk

Oct 08, 2024
Via arXiv

Differentially Private Kernel Density Estimation

Sep 03, 2024
Via arXiv

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Aug 23, 2024
Via arXiv