Safety is a fundamental requirement of control systems. Control Barrier Functions (CBFs) are proposed to ensure the safety of the control system by constructing safety filters or synthesizing control inputs. However, the safety guarantee and performance of safe controllers rely on the construction of valid CBFs. Inspired by universal approximatability, CBFs are represented by neural networks, known as neural CBFs (NCBFs). This paper presents an algorithm for synthesizing formally verified continuous-time neural Control Barrier Functions in stochastic environments in a single step. The proposed training process ensures efficacy across the entire state space with only a finite number of data points by constructing a sample-based learning framework for Stochastic Neural CBFs (SNCBFs). Our methodology eliminates the need for post hoc verification by enforcing Lipschitz bounds on the neural network, its Jacobian, and Hessian terms. We demonstrate the effectiveness of our approach through case studies on the inverted pendulum system and obstacle avoidance in autonomous driving, showcasing larger safe regions compared to baseline methods.
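The idea of covering a whole state space with finitely many samples plus a Lipschitz bound can be illustrated in one dimension. The sketch below is not the paper's algorithm: the function `certified_on_interval`, the scalar test function `g`, and the uniform sampling are all assumptions chosen for illustration. It only shows why a margin of (Lipschitz constant) x (covering radius) at each sample is enough to certify positivity everywhere.

```python
import math

def certified_on_interval(g, lipschitz, lo, hi, n_samples):
    """Return True if g(x) > 0 is certified for every x in [lo, hi].

    Hypothetical helper: g must be `lipschitz`-Lipschitz on [lo, hi].
    """
    # Cell centres of a uniform partition: every x in [lo, hi] lies
    # within `radius` of at least one sample.
    width = (hi - lo) / n_samples
    radius = width / 2.0
    samples = [lo + (i + 0.5) * width for i in range(n_samples)]
    # If g(x_i) > L * radius, then for any x with |x - x_i| <= radius:
    #   g(x) >= g(x_i) - L * |x - x_i| > 0.
    return all(g(x) > lipschitz * radius for x in samples)

def g(x):
    # Illustrative condition: Lipschitz constant 1, minimum value 0.5.
    return 1.5 + math.sin(x)

coarse = certified_on_interval(g, 1.0, 0.0, 2.0 * math.pi, 2)
fine = certified_on_interval(g, 1.0, 0.0, 2.0 * math.pi, 64)
```

With only two samples the required margin is too large and the check fails, even though `g` is positive everywhere; with 64 samples the same condition is certified. A tighter Lipschitz bound or a denser sample set both reduce the margin that has to be enforced.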
This work presents a unified approach for collision avoidance using Collision-Cone Control Barrier Functions (CBFs) in both ground (UGV) and aerial (UAV) unmanned vehicles. We propose a novel CBF formulation, inspired by collision cones, that ensures safety by constraining the relative velocity between the vehicle and the obstacle to always point away from the vehicle. The efficacy of this approach is demonstrated through simulations and hardware implementations on the TurtleBot, Stoch-Jeep, and Crazyflie 2.1 quadrotor, showing that it avoids collisions with dynamic obstacles in both ground and aerial settings. The real-time controller is developed using CBF Quadratic Programs (CBF-QPs). Comparative analysis with state-of-the-art CBFs highlights the less conservative nature of the proposed approach. Overall, this research contributes a novel control formulation that guarantees collision avoidance in unmanned vehicles by modifying the control inputs from existing path-planning controllers.
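As a rough illustration of the collision-cone idea, the sketch below evaluates one candidate function that is non-negative exactly when the relative velocity lies outside the cone of directions leading to a collision. The abstract does not state the formula; the expression used here (inner product of relative position and velocity plus a cone term) and the name `collision_cone_h` are assumptions based on the geometry of a circular or spherical obstacle of a given radius.

```python
import math

def collision_cone_h(p_robot, v_robot, p_obs, v_obs, radius):
    """Candidate collision-cone barrier value (assumed form, not from the abstract).

    h = <p_rel, v_rel> + ||v_rel|| * sqrt(||p_rel||^2 - radius^2),
    with p_rel, v_rel the obstacle's position and velocity relative to the robot.
    h >= 0 means the relative velocity points outside the collision cone.
    """
    p_rel = [po - pr for po, pr in zip(p_obs, p_robot)]
    v_rel = [vo - vr for vo, vr in zip(v_obs, v_robot)]
    dist_sq = sum(c * c for c in p_rel)
    if dist_sq <= radius ** 2:
        raise ValueError("robot is already inside the obstacle's safety radius")
    dot = sum(p * v for p, v in zip(p_rel, v_rel))
    speed = math.sqrt(sum(c * c for c in v_rel))
    return dot + speed * math.sqrt(dist_sq - radius ** 2)

# Robot at the origin, static obstacle 5 m ahead with a 1 m safety radius.
h_head_on = collision_cone_h([0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [0.0, 0.0], 1.0)
h_retreat = collision_cone_h([0.0, 0.0], [-1.0, 0.0], [5.0, 0.0], [0.0, 0.0], 1.0)
```

Driving straight at the obstacle gives a negative value, while driving away gives a positive one. The same function works unchanged for 3D positions and velocities, which is one reason a cone-based constraint transfers between ground and aerial vehicles.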
Reinforcement Learning (RL) has progressed from simple control tasks to complex real-world challenges with large state spaces. While RL excels in these tasks, training time remains a limitation. Reward shaping is a popular solution, but existing methods often rely on value functions, which face scalability issues. This paper presents a novel safety-oriented reward-shaping framework inspired by barrier functions, offering simplicity and ease of implementation across various environments and tasks. To evaluate the effectiveness of the proposed reward formulations, we conduct simulation experiments on CartPole, Ant, and Humanoid environments, along with real-world deployment on the Unitree Go1 quadruped robot. Our results demonstrate that our method leads to 1.4-2.8 times faster convergence and as low as 50-60% actuation effort compared to the vanilla reward. In a sim-to-real experiment with the Go1 robot, we demonstrated better control and dynamics of the bot with our reward framework.
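One way to picture a barrier-inspired shaping term is to reward the agent according to how well a barrier-style condition is satisfied along a transition. The sketch below is an assumption, not the paper's reward formulation: the function name, the finite-difference estimate of the barrier's rate of change, and the parameters `gamma` and `weight` are all hypothetical.

```python
def barrier_shaped_reward(task_reward, h_prev, h_next, dt, gamma=1.0, weight=1.0):
    """Add a barrier-style bonus to a task reward (illustrative form only).

    h_prev, h_next: values of a safety function h(s) before and after the step,
                    with h(s) >= 0 on the desired set of states.
    The bonus h_dot + gamma * h is positive when the barrier condition holds.
    """
    h_dot = (h_next - h_prev) / dt  # finite-difference estimate of dh/dt
    return task_reward + weight * (h_dot + gamma * h_prev)

def pole_barrier(theta, theta_max=0.2):
    # Hypothetical safety function for a pole angle: positive near upright.
    return theta_max ** 2 - theta ** 2

# Same starting angle, one step towards upright and one step away from it.
r_improving = barrier_shaped_reward(1.0, pole_barrier(0.10), pole_barrier(0.05), dt=0.02)
r_worsening = barrier_shaped_reward(1.0, pole_barrier(0.10), pole_barrier(0.15), dt=0.02)
```

Both transitions receive the same task reward of 1.0, but the shaped reward is higher for the step that moves back towards the interior of the safe set. A term like this needs only the safety function and two consecutive states, not a learned value function.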
This paper introduces the Stoch BiRo, a cost-effective bipedal robot designed with a modular mechanical structure having point feet to navigate uneven and unfamiliar terrains. The robot employs proprioceptive actuation in abduction, hips, and knees, leveraging a Raspberry Pi4 for control. Overcoming computational limitations, a Learning-based Linear Policy controller manages balance and locomotion with only 3 degrees of freedom (DoF) per leg, distinct from the typical 5DoF in bipedal systems. Integrated within a modular control architecture, these controllers enable autonomous handling of unforeseen terrain disturbances without external sensors or prior environment knowledge. The robot's policies are trained and simulated using MuJoCo, transferring learned behaviors to the Stoch BiRo hardware for initial walking validations. This work highlights the Stoch BiRo's adaptability and cost-effectiveness in mechanical design, control strategies, and autonomous navigation, promising diverse applications in real-world robotics scenarios.
In fields such as mining, search and rescue, and archaeological exploration, ensuring real-time, collision-free navigation of robots in confined, cluttered environments is imperative. Although established path planning algorithms are valuable, they often face challenges in convergence rates and in handling dynamic infeasibilities. Alternative techniques like collision cones struggle to accurately represent complex obstacle geometries. This paper introduces a novel category of control barrier functions, known as Polygonal Cone Control Barrier Function (PolyC2BF), which addresses overestimation and computational complexity issues. The proposed PolyC2BF, formulated as a Quadratic Programming (QP) problem, proves effective in facilitating collision-free movement of multiple robots in complex environments. The efficacy of this approach is further demonstrated through PyBullet simulations of a quadruped (unicycle model) and a Crazyflie 2.1 (quadrotor model) in cluttered environments.
Autonomy advances have enabled robots in diverse environments and close human interaction, necessitating controllers with formal safety guarantees. This paper introduces an experimental platform designed for the validation and demonstration of a novel class of Control Barrier Functions (CBFs) tailored for Unmanned Ground Vehicles (UGVs) to proactively prevent collisions with kinematic obstacles by integrating the concept of collision cones. While existing CBF formulations excel with static obstacles, extensions to torque/acceleration-controlled unicycle and bicycle models have seen limited success. Conventional CBF applications in nonholonomic UGV models have demonstrated control conservatism, particularly in scenarios where steering/thrust control was deemed infeasible. Drawing inspiration from collision cones in path planning, we present a pioneering CBF formulation ensuring theoretical safety guarantees for both unicycle and bicycle models. The core premise revolves around aligning the obstacle's velocity away from the vehicle, establishing a constraint to perpetually avoid vectors directed towards it. This control methodology is rigorously validated through simulations and experimental verification on the Copernicus mobile robot (Unicycle Model) and FOCAS-Car (Bicycle Model).
Legged robots exhibit significant potential across diverse applications, including but not limited to hazardous environment search and rescue missions and the exploration of unexplored regions both on Earth and in outer space. However, the successful navigation of these robots in dynamic environments heavily hinges on the implementation of efficient collision avoidance techniques. In this research paper, we employ Collision Cone Control Barrier Functions (C3BF) to ensure the secure movement of legged robots within environments featuring a wide array of static and dynamic obstacles. We introduce the Quadratic Program (QP) formulation of C3BF, referred to as C3BF-QP, which serves as a protective filter layer atop a reference controller to ensure the robots' safety during operation. The effectiveness of this approach is illustrated through simulations conducted on PyBullet.
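A QP-based safety filter of this kind can be sketched for the special case of a single barrier constraint, where the quadratic program has a closed-form solution and no solver is needed. The function name `cbf_qp_filter`, the single-integrator example, and the circular obstacle below are assumptions for illustration; a legged-robot implementation would use its own model and typically several constraints.

```python
def cbf_qp_filter(u_ref, lf_h, lg_h, h, alpha=1.0):
    """Solve  min ||u - u_ref||^2  s.t.  lf_h + lg_h . u + alpha * h >= 0.

    With one affine constraint the minimiser is the projection of u_ref
    onto the half-space defined by the constraint.
    """
    slack = lf_h + sum(a * u for a, u in zip(lg_h, u_ref)) + alpha * h
    if slack >= 0.0:
        return list(u_ref)  # reference input is already safe
    norm_sq = sum(a * a for a in lg_h)
    if norm_sq == 0.0:
        raise ValueError("input cannot influence the constraint (L_g h = 0)")
    scale = -slack / norm_sq
    return [u + scale * a for u, a in zip(u_ref, lg_h)]

# Hypothetical example: single integrator x_dot = u, unit-radius obstacle at the
# origin, h(x) = ||x||^2 - 1, so L_f h = 0 and L_g h = 2x.
x = [2.0, 0.0]
h = x[0] ** 2 + x[1] ** 2 - 1.0
lg_h = [2.0 * x[0], 2.0 * x[1]]

u_towards = cbf_qp_filter([-2.0, 0.0], 0.0, lg_h, h)  # reference drives at the obstacle
u_away = cbf_qp_filter([1.0, 0.0], 0.0, lg_h, h)      # reference drives away
```

The filter slows the approaching command to `[-0.75, 0.0]`, which sits exactly on the constraint boundary, and leaves the retreating command untouched. This minimal-intervention behaviour is what lets the filter sit on top of an arbitrary reference controller.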
The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) Algorithm. We first show asymptotic convergence analysis using the ODE-based method. Subsequently, we provide a finite time analysis of the resulting stochastic approximation scheme with linear function approximator and obtain an $\epsilon$-optimal stationary policy with a sample complexity of $\Omega(\epsilon^{-2.5})$. We compare the average reward performance of our proposed ARO-DDPG algorithm and observe better empirical performance compared to state-of-the-art on-policy average reward actor-critic algorithms over MuJoCo-based environments.
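To make the average-reward criterion concrete, the sketch below shows the standard tabular differential temporal-difference update, in which the running estimate of the average reward replaces discounting. It is not ARO-DDPG: there is no actor, no function approximation, and no off-policy correction, and the two-state chain, step sizes, and function name are assumptions for illustration.

```python
def differential_td_step(value, avg_reward, s, r, s_next, lr_value=0.1, lr_avg=0.01):
    """One tabular differential TD(0) update under the average-reward criterion.

    The TD error subtracts the average-reward estimate instead of discounting:
        delta = r - avg_reward + V(s_next) - V(s)
    """
    delta = r - avg_reward + value[s_next] - value[s]
    value[s] += lr_value * delta
    avg_reward += lr_avg * delta
    return avg_reward, delta

# Hypothetical two-state chain that alternates 0 -> 1 -> 0 -> ...
# with reward 0 when leaving state 0 and reward 2 when leaving state 1.
value = [0.0, 0.0]
avg_reward = 0.0
state = 0
for _ in range(2000):
    next_state = 1 - state
    reward = 0.0 if state == 0 else 2.0
    avg_reward, _ = differential_td_step(value, avg_reward, state, reward, next_state)
    state = next_state
```

On this chain the long-run average reward is 1, and the estimate settles there while the differential values settle at a difference of 1 between the two states. Only the difference is determined, since adding a constant to every value leaves the TD error unchanged.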
Unmanned aerial vehicles (UAVs), specifically quadrotors, have revolutionized various industries with their maneuverability and versatility, but their safe operation in dynamic environments relies heavily on effective collision avoidance techniques. This paper introduces a novel technique for safely navigating a quadrotor along a desired route while avoiding kinematic obstacles. The proposed approach employs control barrier functions and utilizes collision cones to ensure that the quadrotor's velocity and the obstacle's velocity always point away from each other. In particular, we propose a new constraint formulation that ensures that the relative velocity between the quadrotor and the obstacle always avoids a cone of vectors that may lead to a collision. By showing that the proposed constraint is a valid control barrier function (CBF) for quadrotors, we are able to leverage its real-time implementation via Quadratic Programs (QPs), called CBF-QPs. We validate the effectiveness of the proposed CBF-QPs by demonstrating collision avoidance with moving obstacles under multiple scenarios in the PyBullet simulator. Furthermore, we compare the proposed approach with CBF-QPs from the literature, especially the well-known higher-order CBF-QPs (HO-CBF-QPs), and show that the latter are more conservative than the proposed approach. This comparison is also shown in detail in simulation.
Evolution Strategy (ES) is a powerful black-box optimization technique based on the idea of natural evolution. In each of its iterations, a key step entails ranking candidate solutions based on some fitness score. For an ES method in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies. This is presently done via on-policy approaches: each policy's score is estimated by interacting several times with the environment using that policy. This leads to a lot of wasteful interactions since, once the ranking is done, only the data associated with the top-ranked policies is used for subsequent learning. To improve sample efficiency, we propose a novel off-policy alternative for ranking, based on a local approximation for the fitness function. We demonstrate our idea in the context of a state-of-the-art ES method called the Augmented Random Search (ARS). Simulations in MuJoCo tasks show that, compared to the original ARS, our off-policy variant has similar running times for reaching reward thresholds but needs only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should be extendable to other ES methods as well.
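The ranking step that the off-policy variant targets can be seen in a sketch of the original, on-policy ARS update, written here from the published description of ARS rather than from this paper. The function name and the toy numbers are assumptions; in practice each return would come from one or more environment rollouts with the perturbed policy, which is the cost the proposed off-policy ranking aims to reduce.

```python
import statistics

def ars_update(theta, directions, returns_plus, returns_minus, top_b, step_size):
    """One ARS-style parameter update from already-collected rollout returns.

    returns_plus[k] / returns_minus[k]: returns of the policies perturbed by
    +directions[k] and -directions[k] (the exploration scale is applied upstream).
    """
    # Rank directions by the better of their two returns and keep the top b.
    order = sorted(range(len(directions)),
                   key=lambda k: max(returns_plus[k], returns_minus[k]),
                   reverse=True)[:top_b]
    used = [returns_plus[k] for k in order] + [returns_minus[k] for k in order]
    sigma = statistics.pstdev(used) or 1.0  # fall back to 1.0 if all returns are equal
    new_theta = list(theta)
    for k in order:
        gain = returns_plus[k] - returns_minus[k]
        for i, d in enumerate(directions[k]):
            new_theta[i] += step_size / (len(order) * sigma) * gain * d
    return new_theta, order

# Toy data: three sampled directions, keep the best two.
theta = [0.0, 0.0]
directions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_theta, kept = ars_update(theta, directions,
                             returns_plus=[3.0, 1.0, 0.5],
                             returns_minus=[1.0, 1.2, 0.4],
                             top_b=2, step_size=0.1)
```

The third direction is evaluated but never used in the update, so the rollouts spent on it contribute nothing beyond the ranking itself. That discarded data is the inefficiency the abstract describes.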