Ivan Anokhin

Thinker: Learning to Plan and Act

Jul 27, 2023
Stephen Chung, Ivan Anokhin, David Krueger

Figures 1–4 for Thinker: Learning to Plan and Act

We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions let agents plan by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for hand-crafted planning algorithms, as the agent learns to plan autonomously, and allows the agent's plans to be easily interpreted through visualization. We demonstrate the algorithm's effectiveness through experiments on the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm show that they have learned to plan effectively with the world model to select better actions. The algorithm's generality opens a new research direction on how a world model can be used in reinforcement learning and how planning can be seamlessly integrated into an agent's decision-making process.
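The core mechanism the abstract describes, wrapping the environment together with a world model and augmenting the action space with model-interaction actions, can be sketched roughly as below. The wrapper class, the `env_step`/`model_step` transition functions, and the imagination budget are all illustrative assumptions, not the paper's actual interface:

```python
class ThinkerStyleWrapper:
    """Hypothetical sketch of the Thinker idea (names are illustrative):
    the real environment and a learned world model sit behind one wrapper,
    and the agent chooses between 'imagination' steps that query the model
    and real steps that commit an action to the environment."""

    def __init__(self, env_step, model_step, max_imagination=4):
        self.env_step = env_step        # real environment transition fn
        self.model_step = model_step    # learned world-model transition fn
        self.max_imagination = max_imagination
        self.imagined = 0               # imagination steps used so far

    def step(self, action, imagine=False, state=None):
        """Return (next observation, whether the step was imagined)."""
        if imagine and self.imagined < self.max_imagination:
            # Roll the world model forward instead of the real environment;
            # the agent can condition its next choice on the prediction.
            self.imagined += 1
            return self.model_step(state, action), True
        # Commit the action to the real environment and reset the budget.
        self.imagined = 0
        return self.env_step(state, action), False
```

Because proposing a plan to the model and acting in the environment are both ordinary actions here, a standard policy-gradient learner can discover how to use the imagination steps on its own.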

* 37 pages 

Embedded Ensembles: Infinite Width Limit and Operating Regimes

Feb 24, 2022
Maksim Velikanov, Roman Kail, Ivan Anokhin, Roman Vashurin, Maxim Panov, Alexey Zaytsev, Dmitry Yarotsky

Figures 1–4 for Embedded Ensembles: Infinite Width Limit and Operating Regimes

A memory-efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); particular examples include BatchEnsembles and Monte Carlo dropout ensembles. In this paper we perform a systematic theoretical and empirical analysis of embedded ensembles with different numbers of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide-network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes, independent and collective, depending on the architecture and initialization strategy of the ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical predictions with a wide range of experiments on finite networks, and further study empirically various effects such as the transition between the two regimes, the scaling of ensemble performance with network width and number of models, and the dependence of performance on a number of architecture and hyperparameter choices.
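The weight-sharing idea can be sketched in the BatchEnsemble style mentioned above: all members share one reference weight matrix, and each member stores only cheap per-member rank-1 modulation vectors. The layer sizes and variable names below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

# Minimal embedded-ensemble layer sketch: member k's effective weight is
# W * outer(R[k], S[k]), so n_members models cost one shared matrix W plus
# O(d_in + d_out) extra parameters per member.
rng = np.random.default_rng(0)
d_in, d_out, n_members = 4, 3, 5

W = rng.standard_normal((d_in, d_out))      # shared reference weights
R = rng.standard_normal((n_members, d_in))  # per-member input scalers
S = rng.standard_normal((n_members, d_out)) # per-member output scalers

def member_forward(x, k):
    """Forward pass of ensemble member k on input x of shape (d_in,).
    Equivalent to x @ (W * np.outer(R[k], S[k])), but cheaper."""
    return ((x * R[k]) @ W) * S[k]

x = rng.standard_normal(d_in)
outputs = np.stack([member_forward(x, k) for k in range(n_members)])
ensemble_prediction = outputs.mean(axis=0)  # average over the members
```

The initialization of `R` and `S` is one of the knobs that, per the abstract, can push such an ensemble between the independent and collective regimes.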


Image Generators with Conditionally-Independent Pixel Synthesis

Nov 27, 2020
Ivan Anokhin, Kirill Demochkin, Taras Khakhulin, Gleb Sterkin, Victor Lempitsky, Denis Korzhenkov

Figures 1–4 for Image Generators with Conditionally-Independent Pixel Synthesis

Existing image generator networks rely heavily on spatial convolutions and, optionally, self-attention blocks to gradually synthesize images in a coarse-to-fine manner. Here, we present a new architecture for image generators, where the color value at each pixel is computed independently given the value of a random latent vector and the coordinate of that pixel. No spatial convolutions or similar operations that propagate information across pixels are involved during synthesis. We analyze the modeling capabilities of such generators when trained in an adversarial fashion, and observe that the new generators achieve generation quality similar to state-of-the-art convolutional generators. We also investigate several interesting properties unique to the new architecture.
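A toy sketch of per-pixel synthesis: each pixel's color is a function of its coordinate encoding and a shared latent vector only, with no operation propagating information between pixels. The tiny untrained MLP and the sinusoidal coordinate encoding are assumptions for illustration; the paper's generator is adversarially trained and far larger:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, hidden = 8, 16
W1 = rng.standard_normal((latent_dim + 4, hidden)) * 0.1  # MLP layer 1
W2 = rng.standard_normal((hidden, 3)) * 0.1               # RGB output layer

def pixel_color(x, y, z):
    """Compute one pixel's RGB independently from its (x, y) and latent z."""
    coords = np.array([np.sin(x), np.cos(x), np.sin(y), np.cos(y)])
    h = np.tanh(np.concatenate([coords, z]) @ W1)
    return h @ W2

z = rng.standard_normal(latent_dim)  # one latent shared by all pixels
height = width = 4                   # small image for the demo
image = np.stack([
    np.stack([pixel_color(x, y, z) for x in range(width)])
    for y in range(height)
])
```

Since every pixel is an independent call, the same generator can render at any resolution, or render only a crop, simply by changing which coordinates are queried.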


Low-loss connection of weight vectors: distribution-based approaches

Aug 03, 2020
Ivan Anokhin, Dmitry Yarotsky

Figures 1–4 for Low-loss connection of weight vectors: distribution-based approaches

Recent research shows that sublevel sets of the loss surfaces of overparameterized networks are connected, exactly or approximately. We describe and experimentally compare a panel of methods for connecting two low-loss points by a low-loss curve on this surface. Our methods vary in accuracy and complexity. Most are based on "macroscopic" distributional assumptions, and some are insensitive to the detailed properties of the points to be connected. Some methods require prior training of a "global connection model" that can then be applied to any pair of points. The accuracy of a method generally correlates with its complexity and its sensitivity to the endpoint details.
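The simplest baseline in this setting, evaluating the loss along the straight line between two weight vectors, can be sketched as follows. The toy quadratic loss stands in for a real network's loss; the paper's distribution-based methods construct curved paths instead:

```python
import numpy as np

def loss(theta):
    """Stand-in for a network's loss; a real setup would evaluate the
    network with weights theta on a dataset."""
    return float(np.sum(theta ** 2))

def path_losses(theta_a, theta_b, n_points=5):
    """Loss along the segment theta(t) = (1 - t) * theta_a + t * theta_b."""
    ts = np.linspace(0.0, 1.0, n_points)
    return [loss((1 - t) * theta_a + t * theta_b) for t in ts]

theta_a = np.array([1.0, 0.0])  # two low-loss endpoints (toy values)
theta_b = np.array([0.0, 1.0])
losses = path_losses(theta_a, theta_b)  # loss barrier along the segment
```

A large maximum of `losses` relative to the endpoint values indicates a loss barrier on the straight line, which is exactly what the curved, distribution-based connections aim to avoid.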

* accepted to ICML 2020 

High-Resolution Daytime Translation Without Domain Labels

Mar 23, 2020
Ivan Anokhin, Pavel Solovev, Denis Korzhenkov, Alexey Kharlamov, Taras Khakhulin, Alexey Silvestrov, Sergey Nikolenko, Victor Lempitsky, Gleb Sterkin

Figures 1–4 for High-Resolution Daytime Translation Without Domain Labels

Modeling daytime changes in high-resolution photographs, e.g., re-rendering the same scene under different illuminations typical of day, night, or dawn, is a challenging image manipulation task. We present the High-Resolution Daytime Translation (HiDT) model for this task. HiDT combines a generative image-to-image model with a new upsampling scheme that allows image translation to be applied at high resolution. The model demonstrates competitive results in terms of both commonly used GAN metrics and human evaluation. Importantly, this good performance comes as a result of training on a dataset of still landscape images with no daytime labels available. Our results are available at https://saic-mdal.github.io/HiDT/.

* accepted to CVPR 2020 