Yaoyu Zhang

Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

May 24, 2024

A rationale from frequency perspective for grokking in training neural network

May 24, 2024

Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion

May 22, 2024

Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target

May 22, 2024

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

May 08, 2024

Structure and Gradient Dynamics Near Global Minima of Two-layer Neural Networks

Sep 01, 2023

Optimistic Estimate Uncovers the Potential of Nonlinear Models

Jul 18, 2023

Linear Stability Hypothesis and Rank Stratification for Nonlinear Models

Nov 21, 2022

Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks

May 26, 2022

Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width

May 24, 2022