Zhanpeng Zhou

On Path to Multimodal Historical Reasoning: HistBench and HistAgent

May 26, 2025

On the Role of Label Noise in the Feature Learning Process

May 25, 2025

New Evidence of the Two-Phase Learning Dynamics of Neural Networks

May 20, 2025

On the Cone Effect in the Learning Dynamics

Mar 20, 2025

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

Feb 26, 2025

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training

Oct 14, 2024

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

Oct 07, 2024

Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm

Feb 06, 2024

Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory

Oct 10, 2023

Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity

Jul 17, 2023