Andrey Zhmoginov

Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones

Jun 06, 2025

Contextually Guided Transformers via Low-Rank Adaptation

Jun 06, 2025

How new data permeates LLM knowledge and how to dilute it

Apr 13, 2025

Long Context In-Context Compression by Getting to the Gist of Gisting

Apr 11, 2025

Learning and Unlearning of Fabricated Knowledge in Language Models

Oct 29, 2024

MELODI: Exploring Memory Compression for Long Contexts

Oct 04, 2024

Narrowing the Focus: Learned Optimizers for Pretrained Models

Aug 21, 2024

Continual Few-Shot Learning Using HyperTransformers

Jan 12, 2023

Training trajectories, mini-batch losses and the curious role of the learning rate

Jan 05, 2023

Transformers learn in-context by gradient descent

Dec 15, 2022