Jascha Sohl-Dickstein

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

May 09, 2019
Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith

A RAD approach to deep mixture models

Mar 18, 2019
Laurent Dinh, Jascha Sohl-Dickstein, Razvan Pascanu, Hugo Larochelle

A Mean Field Theory of Batch Normalization

Mar 05, 2019
Greg Yang, Jeffrey Pennington, Vinay Rao, Jascha Sohl-Dickstein, Samuel S. Schoenholz

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Feb 18, 2019
Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Jascha Sohl-Dickstein, Jeffrey Pennington

Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit

Jan 12, 2019
Jascha Sohl-Dickstein, Kenji Kawaguchi

Measuring the Effects of Data Parallelism on Neural Network Training

Nov 21, 2018
Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl

Learned optimizers that outperform SGD on wall-clock and test loss

Oct 26, 2018
Luke Metz, Niru Maheswaranathan, Jeremy Nixon, C. Daniel Freeman, Jascha Sohl-Dickstein

Stochastic natural gradient descent draws posterior samples in function space

Oct 16, 2018
Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

Oct 11, 2018
Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein
