Zhao Song

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

Oct 17, 2023
Via arXiv

How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

Oct 06, 2023
Via arXiv

Fine-tune Language Models to Approximate Unbiased In-context Learning

Oct 05, 2023
Via arXiv

A Unified Scheme of ResNet and Softmax

Sep 23, 2023
Via arXiv

A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

Sep 14, 2023
Via arXiv

Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?

Sep 14, 2023
Via arXiv

Online Adaptive Mahalanobis Distance Estimation

Sep 02, 2023
Via arXiv

Solving Attention Kernel Regression Problem via Pre-conditioner

Aug 28, 2023
Via arXiv

How to Protect Copyright Data in Optimization of Large Language Models?

Aug 23, 2023
Via arXiv

GradientCoin: A Peer-to-Peer Decentralized Large Language Models

Aug 21, 2023
Via arXiv