Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dmitrii Dobriborsci

Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Sep 15, 2024

Pavel Osinenko, Grigory Yaremenko, Roman Zashchitin, Anton Bolychev, Sinan Ibrahim, Dmitrii Dobriborsci

Figure 1 for Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Figure 2 for Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Figure 3 for Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Abstract:This work presents and showcases a novel reinforcement learning agent called Critic As Lyapunov Function (CALF) which is model-free and ensures online environment, in other words, dynamical system stabilization. Online means that in each learning episode, the said environment is stabilized. This, as demonstrated in a case study with a mobile robot simulator, greatly improves the overall learning performance. The base actor-critic scheme of CALF is analogous to SARSA. The latter did not show any success in reaching the target in our studies. However, a modified version thereof, called SARSA-m here, did succeed in some learning scenarios. Still, CALF greatly outperformed the said approach. CALF was also demonstrated to improve a nominal stabilizer provided to it. In summary, the presented agent may be considered a viable approach to fusing classical control with reinforcement learning. Its concurrent approaches are mostly either offline or model-based, like, for instance, those that fuse model-predictive control into the agent.

* IEEE Conference on Decision and Control. Accepted for publication in proceedings of the conference

Via

Access Paper or Ask Questions

A generalized stacked reinforcement learning method for sampled systems

Aug 23, 2021

Pavel Osinenko, Dmitrii Dobriborsci

Figure 1 for A generalized stacked reinforcement learning method for sampled systems

Figure 2 for A generalized stacked reinforcement learning method for sampled systems

Figure 3 for A generalized stacked reinforcement learning method for sampled systems

Figure 4 for A generalized stacked reinforcement learning method for sampled systems

Abstract:A common setting of reinforcement learning (RL) is a Markov decision process (MDP) in which the environment is a stochastic discrete-time dynamical system. Whereas MDPs are suitable in such applications as video-games or puzzles, physical systems are time-continuous. Continuous methods of RL are known, but they have their limitations, such as, e.g., collapse of Q-learning. A general variant of RL is of digital format, where updates of the value and policy are performed at discrete moments in time. The agent-environment loop then amounts to a sampled system, whereby sample-and-hold is a specific case. In this paper, we propose and benchmark two RL methods suitable for sampled systems. Specifically, we hybridize model-predictive control (MPC) with critics learning the Q- and value function. Optimality is analyzed and performance comparison is done in an experimental case study with a mobile robot.

Via

Access Paper or Ask Questions

An experimental study of two predictive reinforcement learning methods and comparison with model-predictive control

Aug 23, 2021

Dmitrii Dobriborsci, Pavel Osinenko

Figure 1 for An experimental study of two predictive reinforcement learning methods and comparison with model-predictive control

Figure 2 for An experimental study of two predictive reinforcement learning methods and comparison with model-predictive control

Figure 3 for An experimental study of two predictive reinforcement learning methods and comparison with model-predictive control

Figure 4 for An experimental study of two predictive reinforcement learning methods and comparison with model-predictive control

Abstract:Reinforcement learning (RL) has been successfully used in various simulations and computer games. Industry-related applications, such as autonomous mobile robot motion control, are somewhat challenging for RL up to date though. This paper presents an experimental evaluation of predictive RL controllers for optimal mobile robot motion control. As a baseline for comparison, model-predictive control (MPC) is used. Two RL methods are tested: a roll-out Q-learning, which may be considered as MPC with terminal cost being a Q-function approximation, and a so-called stacked Q-learning, which in turn is like MPC with the running cost substituted for a Q-function approximation. The experimental foundation is a mobile robot with a differential drive (Robotis Turtlebot3). Experimental results showed that both RL methods beat the baseline in terms of the accumulated cost, whereas the stacked variant performed best. Provided the series of previous works on stacked Q-learning, this particular study supports the idea that MPC with a running cost adaptation inspired by Q-learning possesses potential of performance boost while retaining the nice properties of MPC.

Via

Access Paper or Ask Questions