Yaoyu Zhang

Adaptive Preconditioners Trigger Loss Spikes in Adam

Jun 05, 2025

Scalable Complexity Control Facilitates Reasoning Ability of LLMs

May 29, 2025

Uncovering Critical Sets of Deep Neural Networks via Sample-Independent Critical Lifting

May 19, 2025

Embedding principle of homogeneous neural network for classification problem

May 18, 2025

An overview of condensation phenomenon in deep learning

Apr 13, 2025

Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers

Jan 15, 2025

Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization

Jun 26, 2024

Geometry of Critical Sets and Existence of Saddle Branches for Two-layer Neural Networks

May 26, 2024

A rationale from frequency perspective for grokking in training neural network

May 24, 2024

Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

May 24, 2024