
Max Vladymyrov


Linear Transformers are Versatile In-Context Learners

Feb 21, 2024
Max Vladymyrov, Johannes von Oswald, Mark Sandler, Rong Ge


Uncovering mesa-optimization algorithms in Transformers

Sep 11, 2023
Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento


Continual Few-Shot Learning Using HyperTransformers

Jan 12, 2023
Max Vladymyrov, Andrey Zhmoginov, Mark Sandler


Training trajectories, mini-batch losses and the curious role of the learning rate

Jan 05, 2023
Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Nolan Miller


Transformers learn in-context by gradient descent

Dec 15, 2022
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov


Decentralized Learning with Multi-Headed Distillation

Nov 28, 2022
Andrey Zhmoginov, Mark Sandler, Nolan Miller, Gus Kristiansen, Max Vladymyrov


Fine-tuning Image Transformers using Learnable Memory

Mar 30, 2022
Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Andrew Jackson


HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning

Jan 15, 2022
Andrey Zhmoginov, Mark Sandler, Max Vladymyrov


GradMax: Growing Neural Networks using Gradient Information

Jan 13, 2022
Utku Evci, Max Vladymyrov, Thomas Unterthiner, Bart van Merriënboer, Fabian Pedregosa
