Eshaan Nichani

Learning Compositional Functions with Transformers from Easy-to-Hard Data

May 29, 2025
Via arXiv

Emergence and scaling laws in SGD learning of shallow neural networks

Apr 28, 2025
Via arXiv

Understanding Factual Recall in Transformers via Associative Memories

Dec 09, 2024
Via arXiv

Learning Hierarchical Polynomials of Multiple Nonlinear Features with Three-Layer Networks

Nov 26, 2024
Via arXiv

How Transformers Learn Causal Structure with Gradient Descent

Feb 22, 2024
Via arXiv

Learning Hierarchical Polynomials with Three-Layer Neural Networks

Nov 23, 2023
Via arXiv

Fine-Tuning Language Models with Just Forward Passes

May 27, 2023
Via arXiv

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

May 18, 2023
Via arXiv

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

May 11, 2023
Via arXiv

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability

Sep 30, 2022
Via arXiv