Denny Wu

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

Mar 27, 2026

Learning to Recall with Transformers Beyond Orthogonal Embeddings

Mar 16, 2026

Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning

Feb 02, 2026

Understanding the Mechanisms of Fast Hyperparameter Transfer

Dec 28, 2025

From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers

Dec 21, 2025

Learning Compositional Functions with Transformers from Easy-to-Hard Data

May 29, 2025

Emergence and scaling laws in SGD learning of shallow neural networks

Apr 28, 2025

Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time

Apr 17, 2025

When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective

Mar 14, 2025

Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

Feb 02, 2025