Zhao Song

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Oct 26, 2023
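
Deja Vu's premise is contextual sparsity: for each input, only a small, input-dependent subset of attention heads and MLP neurons matters, so the rest can be skipped at inference time. Below is a minimal NumPy sketch of the MLP half of that idea; the paper trains small learned predictors to pick the active neurons, whereas this illustration simply keeps the k largest pre-activations, so treat it as the general idea rather than the paper's method.

```python
import numpy as np

def sparse_mlp(x, W1, b1, W2, b2, k):
    """Evaluate a 2-layer ReLU MLP using only k input-dependent neurons.

    Stand-in for a learned sparsity predictor: score neurons by their
    pre-activation and keep the k largest, skipping the rest entirely.
    """
    pre = W1 @ x + b1                         # pre-activations, shape (m,)
    idx = np.argpartition(pre, -k)[-k:]       # indices of the k largest
    h = np.maximum(pre[idx], 0.0)             # ReLU on selected neurons only
    return W2[:, idx] @ h + b2                # output from the sparse hidden layer

rng = np.random.default_rng(0)
d, m = 16, 256
x = rng.standard_normal(d)
W1, b1 = rng.standard_normal((m, d)), np.zeros(m)
W2, b2 = rng.standard_normal((d, m)), np.zeros(d)

dense = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2
sparse = sparse_mlp(x, W1, b1, W2, b2, k=32)
print(np.linalg.norm(dense - sparse) / np.linalg.norm(dense))  # approximation gap
```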

Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights
Oct 19, 2023

Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention
Oct 18, 2023
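
The entry above contrasts standard softmax attention with linear (kernelized) attention. A minimal NumPy sketch of both follows, using the common elu(x)+1 feature map for the linear variant; this is an illustration of the two mechanisms being compared, not the paper's construction. Softmax attention materializes an n x n score matrix (O(n^2 d) time), while the linear form reassociates the products to avoid it (O(n d^2) time).

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard softmax attention: O(n^2 d) via the n x n score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[1])
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ V

def linear_attention(Q, K, V):
    """Linear attention with feature map phi(x) = elu(x) + 1.

    Reassociating (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V) avoids
    the n x n matrix entirely: O(n d^2) time.
    """
    phi = lambda X: np.where(X > 0, X + 1.0, np.exp(X))  # elu(x) + 1 > 0
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                        # d x d summary, built in one pass
    Z = Qf @ Kf.sum(axis=0)              # per-query normalizer
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(softmax_attention(Q, K, V))
print(linear_attention(Q, K, V))  # same shape, different attention weighting
```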

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent
Oct 17, 2023
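
The classical baseline for an "automatic" learning-rate rule is a backtracking line search that shrinks the step until it yields sufficient decrease. The sketch below is the standard Armijo rule, shown only to ground the topic; it is not this paper's algorithm.

```python
import numpy as np

def armijo_step(f, grad_f, x, lr0=1.0, beta=0.5, c=1e-4):
    """Backtracking line search: shrink the step by beta until the Armijo
    condition f(x - lr*g) <= f(x) - c*lr*||g||^2 holds, then take the step."""
    g = grad_f(x)
    lr = lr0
    while f(x - lr * g) > f(x) - c * lr * (g @ g):
        lr *= beta
    return x - lr * g, lr

# Toy ill-conditioned quadratic: f(x) = 0.5 x^T A x, minimized at the origin.
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x

x = np.array([1.0, 1.0])
for _ in range(20):
    x, lr = armijo_step(f, grad_f, x)
print(x, lr)  # x approaches the minimizer without any hand-tuned schedule
```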

How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Oct 06, 2023
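
The Kronecker generalization replaces pairwise query-key scores with scores against pairs of keys, so a single query can attend to interactions between two positions at once. The heavily simplified NumPy sketch below is my own illustration of that third-order idea, with queries lifted to dimension d^2 and value pairs combined by Kronecker product; it is not the paper's exact formulation.

```python
import numpy as np

def kronecker_attention(Q, K, V):
    """Third-order attention: query i scores every ordered key pair (j, k).

    Scores are <q_i, k_j (x) k_k>, where (x) is the Kronecker product, so
    queries live in dimension d^2. Softmax runs over all n^2 pairs, and the
    attended values are v_j (x) v_k -- purely illustrative choices.
    """
    n, d = K.shape
    K2 = np.stack([np.kron(K[j], K[k]) for j in range(n) for k in range(n)])  # (n^2, d^2)
    V2 = np.stack([np.kron(V[j], V[k]) for j in range(n) for k in range(n)])
    S = Q @ K2.T                                   # (n, n^2) pair scores
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ V2                                  # (n, d^2)

rng = np.random.default_rng(0)
n, d = 4, 3
Q = rng.standard_normal((n, d * d))  # queries in the lifted d^2 space
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
print(kronecker_attention(Q, K, V).shape)  # (4, 9)
```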

Fine-tune Language Models to Approximate Unbiased In-context Learning
Oct 05, 2023

A Unified Scheme of ResNet and Softmax
Sep 23, 2023

Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?
Sep 14, 2023
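
The question in the title is whether kernel regression with the graph neural tangent kernel (GNTK) matches actually training the GNN. Given a precomputed GNTK Gram matrix, the kernel-regression side is a single linear solve; the sketch below assumes the kernel matrix is supplied (a toy inner-product kernel stands in for it here), since computing the GNTK itself is paper-specific.

```python
import numpy as np

def ntk_predict(K_train, K_test, y_train, reg=1e-6):
    """Kernel ridge regression with an NTK-style Gram matrix.

    f(x) = K(x, X_train) (K(X_train, X_train) + reg*I)^{-1} y_train;
    as reg -> 0 this is the kernel-regression predictor that NTK theory
    compares against the trained network.
    """
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + reg * np.eye(n), y_train)
    return K_test @ alpha

# Toy stand-in Gram matrices (a real GNTK would be built from graph structure).
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5))
Xt = rng.standard_normal((3, 5))
K_train = X @ X.T      # hypothetical kernel: plain inner products
K_test = Xt @ X.T
y = rng.standard_normal(10)
print(ntk_predict(K_train, K_test, y))
```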

A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Sep 14, 2023

Online Adaptive Mahalanobis Distance Estimation
Sep 02, 2023
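
The quantity being estimated here is the Mahalanobis distance d_Sigma(x, y) = sqrt((x - y)^T Sigma^{-1} (x - y)); the online/adaptive part of the paper concerns maintaining estimates against a stream of adaptively chosen queries. Below is a minimal sketch of the exact distance plus a Johnson-Lindenstrauss random-projection estimate of it, a standard sketching trick shown for illustration rather than the paper's data structure.

```python
import numpy as np

def mahalanobis(x, y, Sigma_inv):
    """Exact Mahalanobis distance sqrt((x - y)^T Sigma^{-1} (x - y))."""
    d = x - y
    return np.sqrt(d @ Sigma_inv @ d)

def sketched_mahalanobis(x, y, L, Pi):
    """Approximate the distance after a random projection.

    With Sigma^{-1} = L L^T (Cholesky factor), ||L^T (x - y)|| equals the
    exact distance; projecting with a k x d JL matrix Pi preserves it up
    to (1 +/- eps) with high probability while shrinking the dimension.
    """
    return np.linalg.norm(Pi @ (L.T @ (x - y)))

rng = np.random.default_rng(0)
d, k = 50, 20
A = rng.standard_normal((d, d))
Sigma_inv = A @ A.T + np.eye(d)          # a positive definite Sigma^{-1}
L = np.linalg.cholesky(Sigma_inv)        # Sigma^{-1} = L L^T
Pi = rng.standard_normal((k, d)) / np.sqrt(k)
x, y = rng.standard_normal(d), rng.standard_normal(d)
print(mahalanobis(x, y, Sigma_inv), sketched_mahalanobis(x, y, L, Pi))
```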