Abstract:Ensuring safe exploration in high-dimensional systems with unknown dynamics remains a significant challenge. Existing safe reinforcement learning methods often provide safety guarantees only in expectation, which can still lead to safety violations. Control-theoretic approaches, in contrast, offer hard constraint-based safety guarantees but typically assume access to known system dynamics or require accurate estimation of control-affine models. In this paper, we propose a safe reinforcement learning framework that learns a probabilistic control-affine dynamics model in an offline setting. The learned model is leveraged to explicitly construct control barrier functions (CBFs) that incorporate model uncertainty to provide conservative safety constraints. These CBF constraints are enforced through an online constraint-based action correction mechanism, enabling safe exploration without overly restricting task performance. Empirical evaluations on nonlinear, complex continuous-control benchmarks demonstrate that our approach achieves returns comparable to those of existing baselines while significantly reducing safety violations.




Abstract:Drones are becoming versatile in a myriad of applications. This has led to the use of drones for spying and intruding into the restricted or private air spaces. Such foul use of drone technology is dangerous for the safety and security of many critical infrastructures. In addition, due to the varied low-cost design and agility of the drones, it is a challenging task to identify and track them using the conventional radar systems. In this paper, we propose a reinforcement learning based approach for identifying and tracking any intruder drone using a chaser drone. Our proposed solution uses computer vision techniques interleaved with the policy learning framework of reinforcement learning to learn a control policy for chasing the intruder drone. The whole system has been implemented using ROS and Gazebo along with the Ardupilot based flight controller. The results show that the reinforcement learning based policy converges to identify and track the intruder drone. Further, the learnt policy is robust with respect to the change in speed or orientation of the intruder drone.