Stas Tiomkin

Multi-Resolution Diffusion for Privacy-Sensitive Recommender Systems

Nov 21, 2023
Derek Lilienthal, Paul Mello, Magdalini Eirinaki, Stas Tiomkin

While recommender systems have become an integral component of the Web experience, their heavy reliance on user data raises privacy and security concerns. Substituting user data with synthetic data can address these concerns, but accurately replicating these real-world datasets has been a notoriously challenging problem. Recent advancements in generative AI have demonstrated the impressive capabilities of diffusion models in generating realistic data across various domains. In this work, we introduce a Score-based Diffusion Recommendation Module (SDRM), which captures the intricate patterns of real-world datasets required for training highly accurate recommender systems. SDRM allows for the generation of synthetic data that can replace existing datasets to preserve user privacy, or augment existing datasets to address excessive data sparsity. Our method outperforms competing baselines, including generative adversarial networks, variational autoencoders, and recently proposed diffusion models, in synthesizing various datasets to replace or augment the original data, with an average improvement of 4.30% in Recall@$k$ and 4.65% in NDCG@$k$.

* 10 pages, 3 figures 
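
As a rough illustration of the idea, a denoising score-matching objective over dense user-item rating vectors might look as follows. This is a minimal sketch only; the network width, noise schedule, and conditioning are assumptions, not the SDRM architecture described in the paper.

import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    # Small MLP that predicts the score of noised user-item rating vectors,
    # conditioned on the noise level sigma.
    def __init__(self, n_items, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_items + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, n_items),
        )

    def forward(self, x, sigma):
        return self.net(torch.cat([x, sigma[:, None]], dim=-1))

def train_step(model, opt, ratings, sigmas):
    # ratings: (batch, n_items) dense user vectors; sigmas: (L,) noise schedule.
    idx = torch.randint(0, len(sigmas), (ratings.shape[0],))
    sigma = sigmas[idx]
    noise = torch.randn_like(ratings) * sigma[:, None]
    target = -noise / sigma[:, None] ** 2          # true score of the noised sample
    pred = model(ratings + noise, sigma)
    loss = ((pred - target) ** 2 * sigma[:, None] ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()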

Dimensionality Reduction of Dynamics on Lie Manifolds via Structure-Aware Canonical Correlation Analysis

Nov 17, 2023
Wooyoung Chung, Daniel Polani, Stas Tiomkin

Incorporating prior knowledge into a data-driven modeling problem can drastically improve performance, reliability, and generalization outside of the training sample. The stronger the structural properties, the more effective these improvements become. Manifolds are a powerful nonlinear generalization of Euclidean space for modeling finite-dimensional data. Imposing group structure on such constrained systems strengthens these structural properties further, turning the manifolds into Lie manifolds; their range of applications is very wide and includes the important case of robotic tasks. Canonical Correlation Analysis (CCA) constructs a hierarchical sequence of maximally correlated directions between two paired data sets in Euclidean space. We present a method that generalizes this concept to Lie manifolds and demonstrate its efficacy through the substantial improvements it achieves in making structure-consistent predictions about changes in the state of a robotic hand.
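
For intuition, a bare-bones version of this pipeline might map group-valued data into the Lie algebra and run ordinary CCA there. The sketch below is illustrative only; the synthetic data, the choice of SO(3), and plain scikit-learn CCA are stand-ins for the structure-aware construction in the paper.

import numpy as np
from scipy.linalg import expm, logm
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

def skew(w):
    # Map a 3-vector to the corresponding skew-symmetric matrix in so(3).
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def so3_log_coords(R):
    # Coordinates of log(R) in the Lie algebra so(3).
    L = np.real(logm(R))
    return np.array([L[2, 1], L[0, 2], L[1, 0]])

# Two paired streams of rotations, e.g. orientations of two linked hand joints.
base = [rng.normal(scale=0.3, size=3) for _ in range(200)]
X = [expm(skew(w)) for w in base]
Y = [expm(skew(0.8 * w + rng.normal(scale=0.05, size=3))) for w in base]

Xa = np.stack([so3_log_coords(R) for R in X])
Ya = np.stack([so3_log_coords(R) for R in Y])

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(Xa, Ya)   # maximally correlated directions in the algebra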


Controllability-Constrained Deep Network Models for Enhanced Control of Dynamical Systems

Nov 11, 2023
Suruchi Sharma, Volodymyr Makarenko, Gautam Kumar, Stas Tiomkin

Controlling a dynamical system without knowledge of its dynamics is an important and challenging task. Modern machine learning approaches, such as deep neural networks (DNNs), allow a dynamics model to be estimated from control inputs and the corresponding state observations, and such data-driven models are often used to derive model-based controllers. However, in general there is no guarantee that a model represented by a DNN is controllable in the formal control-theoretical sense, which is crucial for the design of effective controllers. This often precludes the use of DNN-estimated models in applications where formal controllability guarantees are required. In this proof-of-concept work, we propose a control-theoretical method that explicitly endows models estimated from data with controllability. This is achieved by augmenting the model-estimation objective with a controllability constraint that penalizes models with a low degree of controllability. As a result, models estimated under the proposed constraint admit more efficient controllers, are interpretable in terms of control-theoretical quantities, and have lower long-term prediction error. The proposed method provides new insight into the connection between DNN-based estimation of unknown dynamics and control-theoretical guarantees on the properties of the solution. We demonstrate the superiority of the proposed method on two standard classical control systems with state observations given by low-resolution, high-dimensional images.
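
A toy version of such a constraint, for a learned linear latent model x' = Ax + Bu, could penalize a small smallest singular value of the controllability matrix. This is a sketch under assumed forms of the model and penalty; the paper's constraint and weighting may differ.

import torch

def controllability_matrix(A, B):
    # [B, AB, A^2 B, ..., A^{n-1} B] stacked column-wise.
    n = A.shape[0]
    blocks, M = [], B
    for _ in range(n):
        blocks.append(M)
        M = A @ M
    return torch.cat(blocks, dim=1)

def constrained_loss(A, B, x, u, x_next, lam=1e-2):
    # One-step prediction error plus a penalty on low controllability.
    pred = x @ A.T + u @ B.T
    fit = ((pred - x_next) ** 2).mean()
    sigma_min = torch.linalg.svdvals(controllability_matrix(A, B))[-1]
    return fit + lam * torch.relu(1.0 - sigma_min)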


Bounding the Optimal Value Function in Compositional Reinforcement Learning

Mar 05, 2023
Jacob Adamczyk, Volodymyr Makarenko, Argenis Arriojas, Stas Tiomkin, Rahul V. Kulkarni

In the field of reinforcement learning (RL), agents are often tasked with solving a variety of problems differing only in their reward functions. In order to quickly obtain solutions to unseen problems with new reward functions, a popular approach involves functional composition of previously solved tasks. However, previous work using such functional composition has primarily focused on specific instances of composition functions whose limiting assumptions allow for exact zero-shot composition. Our work unifies these examples and provides a more general framework for compositionality in both standard and entropy-regularized RL. We find that, for a broad class of functions, the optimal solution for the composite task of interest can be related to the known primitive task solutions. Specifically, we present double-sided inequalities relating the optimal composite value function to the value functions for the primitive tasks. We also show that the regret of using a zero-shot policy can be bounded for this class of functions. The derived bounds can be used to develop clipping approaches for reducing uncertainty during training, allowing agents to quickly adapt to new tasks.
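
One way such bounds can be used in practice is to clip bootstrapped targets for the composite task into the interval the bounds define. The snippet below is a sketch; how the lower and upper arrays are obtained from the primitive solutions follows the inequalities in the paper.

import numpy as np

def clipped_td_target(r, q_next_max, lower, upper, gamma=0.99):
    # Standard bootstrap target, projected into [lower, upper] so that early,
    # uncertain estimates cannot leave the region guaranteed by the bounds.
    return np.clip(r + gamma * q_next_max, lower, upper)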


Compositionality and Bounds for Optimal Value Functions in Reinforcement Learning

Feb 19, 2023
Jacob Adamczyk, Stas Tiomkin, Rahul Kulkarni

An agent's ability to reuse solutions to previously solved problems is critical for learning new tasks efficiently. Recent research using composition of value functions in reinforcement learning has shown that agents can utilize solutions of primitive tasks to obtain solutions for exponentially many new tasks. However, previous work has relied on restrictive assumptions on the dynamics, the method of composition, and the structure of reward functions. Here we consider the case of general composition functions without any restrictions on the structure of reward functions, applicable to both deterministic and stochastic dynamics. For this general setup, we provide bounds on the corresponding optimal value functions and characterize the value of corresponding policies. The theoretical results derived lead to improvements in training for both entropy-regularized and standard reinforcement learning, which we validate with numerical simulations.
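
As a concrete and heavily simplified instance, tabular soft Q-functions for primitive tasks can be combined zero-shot, with bounds of the kind derived here controlling how far the combination can be from the true optimum. The log-sum-exp form below is one common choice of composition, not the general setting treated in the paper.

import numpy as np

def compose_soft_q(q_list, beta=1.0):
    # q_list: tabular soft Q-functions of shape (|S|, |A|) for k primitive tasks.
    Q = np.stack(q_list)
    return np.log(np.exp(beta * Q).mean(axis=0)) / beta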


Intrinsic Motivation in Dynamical Control Systems

Dec 29, 2022
Stas Tiomkin, Ilya Nemenman, Daniel Polani, Naftali Tishby

Biological systems often choose actions without an explicit reward signal, a phenomenon known as intrinsic motivation. The computational principles underlying this behavior remain poorly understood. In this study, we investigate an information-theoretic approach to intrinsic motivation, based on maximizing an agent's empowerment (the mutual information between its past actions and future states). We show that this approach generalizes previous attempts to formalize intrinsic motivation, and we provide a computationally efficient algorithm for computing the necessary quantities. We test our approach on several benchmark control problems, and we explain its success in guiding intrinsically motivated behaviors by relating our information-theoretic control function to fundamental properties of the dynamical system representing the combined agent-environment system. This opens the door for designing practical artificial, intrinsically motivated controllers and for linking animal behaviors to their dynamical properties.
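
For a small discrete system, one-step empowerment is simply the capacity of the channel from actions to next states and can be computed with Blahut-Arimoto. This is a sketch; the paper's algorithm targets continuous dynamics and multi-step action sequences.

import numpy as np

def empowerment_1step(P, iters=500):
    # P: (n_actions, n_states) transition probabilities p(s' | s, a) at a fixed state s.
    def kl_rows(P, q):
        # Row-wise KL divergence, with the convention 0 * log 0 = 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.where(P > 0, P * np.log(P / q), 0.0)
        return t.sum(axis=1)

    w = np.full(P.shape[0], 1.0 / P.shape[0])     # distribution over actions
    for _ in range(iters):
        d = kl_rows(P, w @ P)
        w = w * np.exp(d)
        w /= w.sum()
    return float(w @ kl_rows(P, w @ P))           # channel capacity in nats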


Utilizing Prior Solutions for Reward Shaping and Composition in Entropy-Regularized Reinforcement Learning

Dec 02, 2022
Jacob Adamczyk, Argenis Arriojas, Stas Tiomkin, Rahul V. Kulkarni

In reinforcement learning (RL), the ability to utilize prior knowledge from previously solved tasks can allow agents to quickly solve new problems. In some cases, these new problems may be approximately solved by composing the solutions of previously solved primitive tasks (task composition). Otherwise, prior knowledge can be used to adjust the reward function for a new problem, in a way that leaves the optimal policy unchanged but enables quicker learning (reward shaping). In this work, we develop a general framework for reward shaping and task composition in entropy-regularized RL. To do so, we derive an exact relation connecting the optimal soft value functions for two entropy-regularized RL problems with different reward functions and dynamics. We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL. We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL. We validate these theoretical contributions with experiments showing that reward shaping and task composition lead to faster learning in various settings.

* Conference paper accepted in the Technical track for AAAI-2023 
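
A familiar special case helps fix ideas: using a previously solved task's value function as a shaping potential for the new task, which leaves the optimal policy unchanged. This is only a sketch; the paper's exact relation for entropy-regularized RL is more general and also covers changed dynamics.

def shaped_reward(r, s, s_next, V_prior, gamma=0.99):
    # r: reward of the new task; V_prior: tabular value function of a previously
    # solved task, used as a potential. Potential-based terms preserve optimality.
    return r + gamma * V_prior[s_next] - V_prior[s]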

Multi-Objective Policy Gradients with Topological Constraints

Sep 15, 2022
Kyle Hollins Wray, Stas Tiomkin, Mykel J. Kochenderfer, Pieter Abbeel

Multi-objective optimization models that encode ordered, sequential constraints provide a way to model a variety of challenging problems, including encoding preferences, modeling a curriculum, and enforcing measures of safety. A recently developed theory of topological Markov decision processes (TMDPs) captures this range of problems for the case of discrete states and actions. In this work, we extend TMDPs to continuous spaces and unknown transition dynamics by formulating, proving, and implementing the policy gradient theorem for TMDPs. This theoretical result enables the creation of TMDP learning algorithms that use function approximators and can generalize existing deep reinforcement learning (DRL) approaches. Specifically, we present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm. We demonstrate it on a real-world multi-objective navigation problem with an arbitrary ordering of objectives, both in simulation and on a real robot.
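
To make the flavor of the approach concrete, a PPO-style surrogate for ordered objectives might treat higher-priority objectives as soft constraints on lower-priority ones. The sketch below uses assumed penalty and slack terms; it is not the TMDP policy gradient derived in the paper.

import torch

def ordered_ppo_loss(ratios, advantages, slacks, clip=0.2, penalty=10.0):
    # ratios: (batch,) new/old policy probability ratios.
    # advantages: list of (batch,) advantage estimates, highest priority first.
    losses = []
    for adv in advantages:
        unclipped = ratios * adv
        clipped = torch.clamp(ratios, 1 - clip, 1 + clip) * adv
        losses.append(-torch.min(unclipped, clipped).mean())
    total = losses[-1]                       # lowest-priority objective
    for loss_i, slack in zip(losses[:-1], slacks):
        # Penalize degrading a higher-priority objective beyond its allowed slack.
        total = total + penalty * torch.relu(loss_i - slack)
    return total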


Closed-Form Analytical Results for Maximum Entropy Reinforcement Learning

Jun 07, 2021
Argenis Arriojas, Stas Tiomkin, Rahul V. Kulkarni

We introduce a mapping between Maximum Entropy Reinforcement Learning (MaxEnt RL) and Markovian processes conditioned on rare events. In the long time limit, this mapping allows us to derive analytical expressions for the optimal policy, dynamics and initial state distributions for the general case of stochastic dynamics in MaxEnt RL. We find that soft-$\mathcal{Q}$ functions in MaxEnt RL can be obtained from the Perron-Frobenius eigenvalue and the corresponding left eigenvector of a regular, non-negative matrix derived from the underlying Markov Decision Process (MDP). The results derived lead to novel algorithms for model-based and model-free MaxEnt RL, which we validate by numerical simulations. The mapping established in this work opens further avenues for the application of novel analytical and computational approaches to problems in MaxEnt RL. We make our code available at: https://github.com/argearriojas/maxent-rl-mdp-scripts
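
The computational core is a dominant-eigenpair problem, and a left power iteration suffices for small matrices. In the sketch below, the construction of the non-negative matrix M from the MDP's rewards and dynamics is the paper's contribution and is assumed given.

import numpy as np

def perron_frobenius_left(M, iters=1000, tol=1e-12):
    # Power iteration for the dominant eigenvalue and left eigenvector of a
    # non-negative matrix M.
    u = np.ones(M.shape[0]) / M.shape[0]
    lam = 1.0
    for _ in range(iters):
        v = u @ M
        lam = v.sum()                # equals the eigenvalue once u has converged
        v /= lam
        if np.abs(v - u).max() < tol:
            u = v
            break
        u = v
    return lam, u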
