
Jeffrey Pennington

Training LLMs over Neurally Compressed Text

Apr 04, 2024
Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant

Via arXiv

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Dec 22, 2023
Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel

Via arXiv

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

Nov 15, 2023
C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, Peter J. Liu, Roman Novak, Yundi Qian, Noah Fiedel, Jascha Sohl-Dickstein

Via arXiv

Small-scale proxies for large-scale Transformer training instabilities

Sep 25, 2023
Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

Via arXiv

Second-order regression models exhibit progressive sharpening to the edge of stability

Oct 10, 2022
Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington

Via arXiv

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm

Jul 11, 2022
Lechao Xiao, Jeffrey Pennington

Via arXiv

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

Jun 15, 2022
Jiri Hron, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein

Via arXiv

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

Jun 15, 2022
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

Via arXiv

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

May 30, 2022
Lechao Xiao, Jeffrey Pennington

Via arXiv

Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties

May 14, 2022
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

Via arXiv