Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Aleksandr I. Panov

Gradual Optimization Learning for Conformational Energy Minimization

Nov 05, 2023

Artem Tsypin, Leonid Ugadiarov, Kuzma Khrabrov, Manvel Avetisian, Alexander Telepov, Egor Rumiantsev, Alexey Skrynnik, Aleksandr I. Panov, Dmitry Vetrov, Elena Tutubalina(+1 more)

Figure 1 for Gradual Optimization Learning for Conformational Energy Minimization

Figure 2 for Gradual Optimization Learning for Conformational Energy Minimization

Figure 3 for Gradual Optimization Learning for Conformational Energy Minimization

Figure 4 for Gradual Optimization Learning for Conformational Energy Minimization

Abstract:Molecular conformation optimization is crucial to computer-aided drug discovery and materials design. Traditional energy minimization techniques rely on iterative optimization methods that use molecular forces calculated by a physical simulator (oracle) as anti-gradients. However, this is a computationally expensive approach that requires many interactions with a physical simulator. One way to accelerate this procedure is to replace the physical simulator with a neural network. Despite recent progress in neural networks for molecular conformation energy prediction, such models are prone to distribution shift, leading to inaccurate energy minimization. We find that the quality of energy minimization with neural networks can be improved by providing optimization trajectories as additional training data. Still, it takes around $5 \times 10^5$ additional conformations to match the physical simulator's optimization quality. In this work, we present the Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks that significantly reduces the required additional data. The framework consists of an efficient data-collecting scheme and an external optimizer. The external optimizer utilizes gradients from the energy prediction model to generate optimization trajectories, and the data-collecting scheme selects additional training data to be processed by the physical simulator. Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules using $50$x less additional data.

* 17 pages, 5 figures

Via

Access Paper or Ask Questions

Graphical Object-Centric Actor-Critic

Oct 26, 2023

Leonid Ugadiarov, Aleksandr I. Panov

Figure 1 for Graphical Object-Centric Actor-Critic

Figure 2 for Graphical Object-Centric Actor-Critic

Figure 3 for Graphical Object-Centric Actor-Critic

Figure 4 for Graphical Object-Centric Actor-Critic

Abstract:There have recently been significant advances in the problem of unsupervised object-centric representation learning and its application to downstream tasks. The latest works support the argument that employing disentangled object representations in image-based object-centric reinforcement learning tasks facilitates policy learning. We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches to utilize these representations effectively. In our approach, we use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment. The proposed method fills a research gap in developing efficient object-centric world models for reinforcement learning settings that can be used for environments with discrete or continuous action spaces. Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm built upon transformer architecture and the state-of-the-art monolithic model-based algorithm.

Via

Access Paper or Ask Questions

Learning Successor Representations with Distributed Hebbian Temporal Memory

Oct 20, 2023

Evgenii Dzhivelikian, Petr Kuderov, Aleksandr I. Panov

Figure 1 for Learning Successor Representations with Distributed Hebbian Temporal Memory

Figure 2 for Learning Successor Representations with Distributed Hebbian Temporal Memory

Figure 3 for Learning Successor Representations with Distributed Hebbian Temporal Memory

Figure 4 for Learning Successor Representations with Distributed Hebbian Temporal Memory

Abstract:This paper presents a novel approach to address the challenge of online hidden representation learning for decision-making under uncertainty in non-stationary, partially observable environments. The proposed algorithm, Distributed Hebbian Temporal Memory (DHTM), is based on factor graph formalism and a multicomponent neuron model. DHTM aims to capture sequential data relationships and make cumulative predictions about future observations, forming Successor Representation (SR). Inspired by neurophysiological models of the neocortex, the algorithm utilizes distributed representations, sparse transition matrices, and local Hebbian-like learning rules to overcome the instability and slow learning process of traditional temporal memory algorithms like RNN and HMM. Experimental results demonstrate that DHTM outperforms classical LSTM and performs comparably to more advanced RNN-like algorithms, speeding up Temporal Difference learning for SR in changing environments. Additionally, we compare the SRs produced by DHTM to another biologically inspired HMM-like algorithm, CSCG. Our findings suggest that DHTM is a promising approach for addressing the challenges of online hidden representation learning in dynamic environments.

* 12 pages, 4 figures

Via

Access Paper or Ask Questions

Recurrent Memory Decision Transformer

Jul 05, 2023

Arkadii Bessonov, Alexey Staroverov, Huzhenyu Zhang, Alexey K. Kovalev, Dmitry Yudin, Aleksandr I. Panov

Figure 1 for Recurrent Memory Decision Transformer

Figure 2 for Recurrent Memory Decision Transformer

Figure 3 for Recurrent Memory Decision Transformer

Figure 4 for Recurrent Memory Decision Transformer

Abstract:Originally developed for natural language problems, transformer models have recently been widely used in offline reinforcement learning tasks. This is because the agent's history can be represented as a sequence, and the whole task can be reduced to the sequence modeling task. However, the quadratic complexity of the transformer operation limits the potential increase in context. Therefore, different versions of the memory mechanism are used to work with long sequences in a natural language. This paper proposes the Recurrent Memory Decision Transformer (RMDT), a model that uses a recurrent memory mechanism for reinforcement learning problems. We conduct thorough experiments on Atari games and MuJoCo control problems and show that our proposed model is significantly superior to its counterparts without the recurrent memory mechanism on Atari games. We also carefully study the effect of memory on the performance of the proposed model. These findings shed light on the potential of incorporating recurrent memory mechanisms to improve the performance of large-scale transformer models in offline reinforcement learning tasks. The Recurrent Memory Decision Transformer code is publicly available in the repository \url{https://anonymous.4open.science/r/RMDT-4FE4}.

Via

Access Paper or Ask Questions

Intrinsic Motivation in Model-based Reinforcement Learning: A Brief Review

Jan 24, 2023

Artem Latyshev, Aleksandr I. Panov

Abstract:The reinforcement learning research area contains a wide range of methods for solving the problems of intelligent agent control. Despite the progress that has been made, the task of creating a highly autonomous agent is still a significant challenge. One potential solution to this problem is intrinsic motivation, a concept derived from developmental psychology. This review considers the existing methods for determining intrinsic motivation based on the world model obtained by the agent. We propose a systematic approach to current research in this field, which consists of three categories of methods, distinguished by the way they utilize a world model in the agent's components: complementary intrinsic reward, exploration policy, and intrinsically motivated goals. The proposed unified framework describes the architecture of agents using a world model and intrinsic motivation to improve learning. The potential for developing new techniques in this area of research is also examined.

* 13 pages, 7 figures

Via

Access Paper or Ask Questions

HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images

Dec 30, 2022

Dmitry Yudin, Yaroslav Solomentsev, Ruslan Musaev, Aleksei Staroverov, Aleksandr I. Panov

Figure 1 for HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images

Figure 2 for HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images

Figure 3 for HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images

Figure 4 for HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images

Abstract:We present a novel dataset named as HPointLoc, specially designed for exploring capabilities of visual place recognition in indoor environment and loop detection in simultaneous localization and mapping. The loop detection sub-task is especially relevant when a robot with an on-board RGB-D camera can drive past the same place (``Point") at different angles. The dataset is based on the popular Habitat simulator, in which it is possible to generate photorealistic indoor scenes using both own sensor data and open datasets, such as Matterport3D. To study the main stages of solving the place recognition problem on the HPointLoc dataset, we proposed a new modular approach named as PNTR. It first performs an image retrieval with the Patch-NetVLAD method, then extracts keypoints and matches them using R2D2, LoFTR or SuperPoint with SuperGlue, and finally performs a camera pose optimization step with TEASER++. Such a solution to the place recognition problem has not been previously studied in existing publications. The PNTR approach has shown the best quality metrics on the HPointLoc dataset and has a high potential for real use in localization systems for unmanned vehicles. The proposed dataset and framework are publicly available: https://github.com/metra4ok/HPointLoc.

* Accepted for publishing in proceedings of the 29th International Conference on Neural Information Processing (ICONIP 2022)

Via

Access Paper or Ask Questions

POGEMA: Partially Observable Grid Environment for Multiple Agents

Jun 22, 2022

Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, Aleksandr I. Panov

Figure 1 for POGEMA: Partially Observable Grid Environment for Multiple Agents

Figure 2 for POGEMA: Partially Observable Grid Environment for Multiple Agents

Figure 3 for POGEMA: Partially Observable Grid Environment for Multiple Agents

Figure 4 for POGEMA: Partially Observable Grid Environment for Multiple Agents

Abstract:We introduce POGEMA (https://github.com/AIRI-Institute/pogema) a sandbox for challenging partially observable multi-agent pathfinding (PO-MAPF) problems . This is a grid-based environment that was specifically designed to be a flexible, tunable and scalable benchmark. It can be tailored to a variety of PO-MAPF, which can serve as an excellent testing ground for planning and learning methods, and their combination, which will allow us to move towards filling the gap between AI planning and learning.

* 7 pages, 7 figures

Via

Access Paper or Ask Questions

IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

May 31, 2022

Artem Zholus, Alexey Skrynnik, Shrestha Mohanty, Zoya Volovikova, Julia Kiseleva, Artur Szlam, Marc-Alexandre Coté, Aleksandr I. Panov

Figure 1 for IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

Figure 2 for IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

Figure 3 for IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

Abstract:We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language conditioned embodied agents in a scalable way. The environment features visual agent embodiment, interactive learning through collaboration, language conditioned RL, and combinatorically hard task (3d blocks building) space.

Via

Access Paper or Ask Questions

Multitask Adaptation by Retrospective Exploration with Learned World Models

Oct 25, 2021

Artem Zholus, Aleksandr I. Panov

Figure 1 for Multitask Adaptation by Retrospective Exploration with Learned World Models

Figure 2 for Multitask Adaptation by Retrospective Exploration with Learned World Models

Figure 3 for Multitask Adaptation by Retrospective Exploration with Learned World Models

Figure 4 for Multitask Adaptation by Retrospective Exploration with Learned World Models

Abstract:Model-based reinforcement learning (MBRL) allows solving complex tasks in a sample-efficient manner. However, no information is reused between the tasks. In this work, we propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from continuously growing task-agnostic storage. The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage. We show that such retrospective exploration can accelerate the learning process of the MBRL agent by better informing learned dynamics and prompting agent with exploratory trajectories. We test the performance of our approach on several domains from the DeepMind control suite, from Metaworld multitask benchmark, and from our bespoke environment implemented with a robotic NVIDIA Isaac simulator to test the ability of the model to act in a photorealistic, ray-traced environment.

Via

Access Paper or Ask Questions

Long-Term Exploration in Persistent MDPs

Sep 21, 2021

Leonid Ugadiarov, Alexey Skrynnik, Aleksandr I. Panov

Figure 1 for Long-Term Exploration in Persistent MDPs

Figure 2 for Long-Term Exploration in Persistent MDPs

Figure 3 for Long-Term Exploration in Persistent MDPs

Figure 4 for Long-Term Exploration in Persistent MDPs

Abstract:Exploration is an essential part of reinforcement learning, which restricts the quality of learned policy. Hard-exploration environments are defined by huge state space and sparse rewards. In such conditions, an exhaustive exploration of the environment is often impossible, and the successful training of an agent requires a lot of interaction steps. In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process, in which agents during training can roll back to visited states. We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge. At all used levels of the game, our agent outperforms or shows comparable results with state-of-the-art curiosity methods with knowledge-based intrinsic motivation: ICM and RND. An implementation of RbExplore can be found at https://github.com/cds-mipt/RbExplore.

* This is a preprint of the paper accepted to MICAI 2021. It contains 13 pages and 6 figures

Via

Access Paper or Ask Questions