Nolan Miller

Uncovering mesa-optimization algorithms in Transformers

Sep 11, 2023
Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento

Training trajectories, mini-batch losses and the curious role of the learning rate

Jan 05, 2023
Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Nolan Miller

Decentralized Learning with Multi-Headed Distillation

Nov 28, 2022
Andrey Zhmoginov, Mark Sandler, Nolan Miller, Gus Kristiansen, Max Vladymyrov

Meta-Learning Bidirectional Update Rules

Apr 10, 2021
Mark Sandler, Max Vladymyrov, Andrey Zhmoginov, Nolan Miller, Andrew Jackson, Tom Madams, Blaise Agüera y Arcas
