Depen Morwani

Deconstructing What Makes a Good Optimizer for Language Models

Jul 10, 2024

A New Perspective on Shampoo's Preconditioner

Jun 25, 2024

Feature emergence via margin maximization: case studies in algebraic tasks

Nov 13, 2023

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

Jun 14, 2023

Feature-Learning Networks Are Consistent Across Widths At Realistic Scales

May 28, 2023

Simplicity Bias in 1-Hidden Layer Neural Networks

Feb 01, 2023

Using noise resilience for ranking generalization of deep neural networks

Dec 16, 2020

Inductive Bias of Gradient Descent for Exponentially Weight Normalized Smooth Homogeneous Neural Nets

Oct 24, 2020