Ayman El-Badawy

Dyna-Style Reinforcement Learning Modeling and Control of Non-linear Dynamics

Dec 24, 2025

Karim Abdelsalam, Zeyad Gamal, Ayman El-Badawy

Abstract: Controlling systems with complex, nonlinear dynamics is a significant challenge, particularly when the control must be both efficient and robust. In this paper, we propose a Dyna-style reinforcement learning control framework that integrates Sparse Identification of Nonlinear Dynamics (SINDy) with Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning. SINDy is used to identify a data-driven model of the system, capturing its key dynamics without requiring an explicit physical model. The identified model generates synthetic rollouts that are periodically injected into the reinforcement learning replay buffer during training on the real environment, enabling efficient policy learning from limited data. This hybrid approach mitigates the sample inefficiency of traditional model-free reinforcement learning methods while ensuring accurate control of nonlinear systems. To demonstrate the effectiveness of the framework, we apply it to a bi-rotor system as a case study and evaluate its performance in stabilization and trajectory tracking. The results show that our SINDy-TD3 approach achieves superior accuracy and robustness compared to direct reinforcement learning techniques, highlighting the potential of combining data-driven modeling with reinforcement learning for complex dynamical systems.
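The injection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the scalar plant `real_step`, the fixed coefficients standing in for a SINDy fit, the noisy proportional policy, and the names `inject_every` and `horizon` are all assumptions made for the example, and the TD3 update itself is left as a comment.

```python
import random
from collections import deque

DT = 0.05  # integration step of the toy system (assumed)

def real_step(state, action):
    """Toy stand-in for the real plant: damped scalar system with a cubic term."""
    return state + DT * (-state - 0.5 * state**3 + action)

def model_step(state, action, coeffs):
    """Identified model: sparse combination of the library terms [x, x^3, u]."""
    dx = coeffs[0] * state + coeffs[1] * state**3 + coeffs[2] * action
    return state + DT * dx

def reward(state, action):
    """Quadratic regulation cost, chosen only for illustration."""
    return -(state**2) - 0.01 * action**2

def synthetic_rollouts(start_states, policy, coeffs, horizon):
    """Roll the identified model forward from real states and collect transitions."""
    transitions = []
    for s in start_states:
        for _ in range(horizon):
            a = policy(s)
            s_next = model_step(s, a, coeffs)
            transitions.append((s, a, reward(s, a), s_next))
            s = s_next
    return transitions

def train(num_steps=200, inject_every=50, horizon=5, n_starts=4, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=10_000)
    # Hypothetical exploration policy: clipped proportional feedback plus noise.
    policy = lambda s: max(-1.0, min(1.0, -s + rng.gauss(0.0, 0.1)))
    # Placeholder for the coefficients a SINDy regression would return.
    coeffs = (-1.0, -0.5, 1.0)
    state, n_real, n_synth = 1.0, 0, 0
    for t in range(1, num_steps + 1):
        action = policy(state)
        next_state = real_step(state, action)
        buffer.append((state, action, reward(state, action), next_state))
        n_real += 1
        state = next_state
        if t % inject_every == 0:
            # Periodically branch short model rollouts from states seen in real data.
            starts = [tr[0] for tr in rng.sample(list(buffer), n_starts)]
            synth = synthetic_rollouts(starts, policy, coeffs, horizon)
            buffer.extend(synth)
            n_synth += len(synth)
        # A TD3 update on a minibatch sampled from `buffer` would go here.
    return n_real, n_synth, len(buffer)
```

With the default arguments, the buffer receives 200 real transitions plus 4 injections of 4 × 5 synthetic transitions each, so the off-policy learner trains on a mix of both.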

Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism

Dec 20, 2025

Youssef Mahran, Zeyad Gamal, Ayman El-Badawy

Abstract: This paper explores the impact of dynamic entropy tuning in Reinforcement Learning (RL) algorithms that train a stochastic policy, and compares their performance against algorithms that train a deterministic policy. Stochastic policies optimize a probability distribution over actions to maximize rewards, while deterministic policies select a single action per state. The paper examines the effect of training a stochastic policy with either static or dynamic entropy and then executing deterministic actions to control the quadcopter, and compares it against training a deterministic policy and executing deterministic actions. The Soft Actor-Critic (SAC) algorithm was chosen as the stochastic algorithm and the Twin Delayed Deep Deterministic Policy Gradient (TD3) as the deterministic algorithm. The training and simulation results show that dynamic entropy tuning has a positive effect on controlling the quadcopter by preventing catastrophic forgetting and improving exploration efficiency.
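For readers unfamiliar with dynamic entropy tuning, the sketch below shows one gradient step on the SAC temperature in the form many implementations use, where the loss is `-log_alpha * (mean(log_prob) + target_entropy)`. It is an illustration under assumptions, not the paper's code: the function name, the learning rate, and the target entropy value are chosen for the example (a target of minus the action dimension is a common heuristic).

```python
import math

def update_temperature(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on the temperature loss
    J(log_alpha) = -log_alpha * (mean(log_probs) + target_entropy)."""
    mean_log_prob = sum(log_probs) / len(log_probs)
    grad = -(mean_log_prob + target_entropy)  # dJ / d(log_alpha)
    return log_alpha - lr * grad

# Policy entropy (-mean log-prob = -4) is below the target (-2),
# so the temperature rises and the entropy bonus gets more weight.
log_alpha = update_temperature(0.0, [3.0, 5.0], target_entropy=-2.0, lr=0.1)
alpha = math.exp(log_alpha)
```

Static entropy corresponds to skipping this update and keeping `alpha` fixed for the whole training run.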

* 2024 IEEE 34th International Conference on Computer Theory and Applications (ICCTA)
* This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore

Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)

Dec 20, 2025

Youssef Mahran, Zeyad Gamal, Ayman El-Badawy

Abstract: This paper proposes a new Reinforcement Learning (RL) based control architecture for quadrotors. Whereas the literature focuses on controlling the four rotors' RPMs directly, this paper aims to control the quadrotor's thrust vector. The RL agent computes the percentage of overall thrust along the quadrotor's z-axis along with the desired roll ($φ$) and pitch ($θ$) angles. The agent then sends these control signals, together with the quadrotor's current yaw angle ($ψ$), to an attitude PID controller, which maps them to motor RPMs. The Soft Actor-Critic algorithm, a model-free off-policy stochastic RL algorithm, was used to train the RL agents. Training results show that the proposed thrust vector controller trains faster than conventional RPM controllers. Simulation results show smoother and more accurate path-following for the proposed thrust vector controller.
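The interface between the agent and the attitude controller might look roughly like the sketch below. Everything specific here is an assumption for illustration: the normalized action range of [-1, 1], the 30 degree tilt limit, the names `AttitudeSetpoint` and `action_to_setpoint`, and the textbook PID form. The mapping from PID outputs to motor RPMs is omitted because the abstract does not describe it.

```python
import math
from dataclasses import dataclass

@dataclass
class AttitudeSetpoint:
    thrust_fraction: float  # share of total thrust along the body z-axis, 0..1
    roll: float             # desired roll angle in radians
    pitch: float            # desired pitch angle in radians
    yaw: float              # current yaw angle, passed through unchanged

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def action_to_setpoint(action, current_yaw, max_tilt=math.radians(30)):
    """Map a normalized agent action (thrust, roll, pitch) to an attitude setpoint."""
    thrust_raw, roll_raw, pitch_raw = (clip(a, -1.0, 1.0) for a in action)
    return AttitudeSetpoint(
        thrust_fraction=0.5 * (thrust_raw + 1.0),  # [-1, 1] -> [0, 1]
        roll=roll_raw * max_tilt,
        pitch=pitch_raw * max_tilt,
        yaw=current_yaw,
    )

class PID:
    """Textbook PID loop; one instance per attitude axis."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: the agent asks for half thrust and maximum positive roll.
sp = action_to_setpoint((0.0, 1.0, -2.0), current_yaw=0.3)
roll_pid = PID(kp=2.0, ki=0.0, kd=0.0)
roll_cmd = roll_pid.step(sp.roll - 0.0, dt=0.01)  # measured roll assumed to be 0
```

Compared with commanding four RPMs directly, the agent here outputs three quantities with a direct physical meaning, and the inner PID loop handles the fast attitude dynamics.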

* 2024 IEEE 6th Novel Intelligent and Leading Emerging Sciences Conference (NILES)
* This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore

Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)

Dec 15, 2025

Zeyad Gamal, Youssef Mahran, Ayman El-Badawy

Abstract: This paper proposes a reinforcement learning (RL) framework for controlling and stabilizing the Twin Rotor Aerodynamic System (TRAS) at specific pitch and azimuth angles and for tracking a given trajectory. The complex dynamics and non-linear characteristics of the TRAS make it challenging to control using traditional control algorithms. However, recent developments in RL have attracted interest due to their potential applications in the control of multirotors. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was used in this paper to train the RL agent. TD3 suits environments with continuous state and action spaces, such as the TRAS, and does not require a model of the system. The simulation results illustrate the effectiveness of the RL control method. Next, external wind disturbances were applied to test the controller's effectiveness compared to conventional PID controllers. Lastly, experiments on a laboratory setup were carried out to confirm the controller's effectiveness in real-world applications.
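As background on TD3, the sketch below computes its critic target for a single transition with a scalar action: the target action is smoothed with clipped noise, and the smaller of the two target critics is used. The callables `target_actor`, `target_q1`, and `target_q2` are hypothetical stand-ins for the target networks, and the noise and clipping values are commonly used defaults rather than settings reported in this paper.

```python
import random

def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=random):
    """Clipped double-Q target with target policy smoothing."""
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(-act_limit, min(act_limit, target_actor(next_state) + noise))
    # Clipped double Q-learning: take the more pessimistic of the two critics.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    # No bootstrapping past a terminal transition (done = 1.0).
    return reward + gamma * (1.0 - done) * q_min

# Toy stand-ins for the target networks.
actor = lambda s: 0.5
q1 = lambda s, a: 1.0 + a
q2 = lambda s, a: 2.0

y = td3_target(1.0, 0.0, 0.0, actor, q1, q2, gamma=0.9, noise_std=0.0)
```

The "delayed" part of TD3, updating the actor and the target networks less often than the critics, sits in the training loop and is not shown here.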

* 2024 28th IEEE International Conference on System Theory, Control and Computing (ICSTCC)
* This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore
