Jeffrey Pennington

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
Aug 15, 2020

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
Jun 25, 2020

Exact posterior distributions of wide Bayesian neural networks
Jun 18, 2020

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks
Jan 16, 2020

Disentangling trainability and generalization in deep learning
Dec 30, 2019

A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning
Dec 02, 2019

A Mean Field Theory of Batch Normalization
Mar 05, 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Feb 18, 2019

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs
Jan 25, 2019

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes
Oct 11, 2018