Sicun Gao

Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey

Jul 12, 2024

Milan Ganai, Sicun Gao, Sylvia Herbert

Abstract: Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was limited to verifying low-dimensional dynamical systems, because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. To address this limitation, recent methods compute the reachability value function simultaneously with learning control policies, scaling HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even the reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. In this survey paper, we review recent developments in the field of HJ reachability estimation in reinforcement learning that provide a foundational basis for further research into reliability in high-dimensional systems.
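
To make the idea of a reachability value function concrete, here is a minimal sketch, not taken from the survey, of one commonly used discounted safety Bellman backup on a toy 1-D grid. The dynamics, the safety margin `margin`, the drift region, and all names are hypothetical choices made for illustration only; states with a positive converged value are those from which some action sequence keeps the system safe.

```python
# Toy discounted safety value iteration (illustrative assumption, not the survey's code).
N_STATES = 11          # grid cells 0..10
ACTIONS = (-1, 0, 1)   # move left, stay, move right
GAMMA = 0.99           # discount factor that makes the backup a contraction


def margin(x):
    """Hypothetical safety margin: positive when safe, negative when unsafe (x >= 9)."""
    return 8.5 - x


def step(x, a):
    """Hypothetical dynamics: a strong rightward drift acts on cells x >= 6."""
    drift = 2 if x >= 6 else 0
    return min(max(x + drift + a, 0), N_STATES - 1)


def safety_value_iteration(n_iters=2000):
    """Iterate V(x) = (1 - g) h(x) + g * min(h(x), max_a V(f(x, a)))."""
    h = [margin(x) for x in range(N_STATES)]
    v = list(h)
    for _ in range(n_iters):
        best_next = [max(v[step(x, a)] for a in ACTIONS) for x in range(N_STATES)]
        v = [(1 - GAMMA) * h[x] + GAMMA * min(h[x], best_next[x])
             for x in range(N_STATES)]
    return v


values = safety_value_iteration()
# Cells 6..8 are safe right now, but the drift makes failure unavoidable from them.
safe_states = [x for x in range(N_STATES) if values[x] > 0]
print(safe_states)
```

Tabular iteration like this is exactly what stops scaling as the state dimension grows; the methods the abstract refers to replace the table with a learned function approximator.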

Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

Jun 26, 2024

Chung-En Sun, Sicun Gao, Tsui-Wei Weng

Abstract: Robustness remains a paramount concern in deep reinforcement learning (DRL), with randomized smoothing emerging as a key technique for enhancing this attribute. However, a notable gap exists in the performance of current smoothed DRL agents, often characterized by significantly low clean rewards and weak robustness. In response to this challenge, our study introduces innovative algorithms aimed at training effective smoothed robust DRL agents. We propose S-DQN and S-PPO, novel approaches that demonstrate remarkable improvements in clean rewards, empirical robustness, and robustness guarantee across standard RL benchmarks. Notably, our S-DQN and S-PPO agents not only significantly outperform existing smoothed agents by an average factor of $2.16\times$ under the strongest attack, but also surpass previous robustly-trained agents by an average factor of $2.13\times$. This represents a significant leap forward in the field. Furthermore, we introduce Smoothed Attack, which is $1.89\times$ more effective in decreasing the rewards of smoothed agents than existing adversarial attacks.

* Published in ICML 2024

Activation-Descent Regularization for Input Optimization of ReLU Networks

Jun 01, 2024

Hongzhan Yu, Sicun Gao

Abstract: We present a new approach for input optimization of ReLU networks that explicitly takes into account the effect of changes in activation patterns. We analyze local optimization steps in both the input space and the space of activation patterns to propose methods with superior local descent properties. To accomplish this, we convert the discrete space of activation patterns into differentiable representations and propose regularization terms that improve each descent step. Our experiments demonstrate the effectiveness of the proposed input-optimization methods for improving the state-of-the-art in various areas, such as adversarial learning, generative modeling, and reinforcement learning.

* ICML'24 Proceedings

Mollification Effects of Policy Gradient Methods

May 28, 2024

Tao Wang, Sylvia Herbert, Sicun Gao

Abstract: Policy gradient methods have enabled deep reinforcement learning (RL) to approach challenging continuous control problems, even when the underlying systems involve highly nonlinear dynamics that generate complex non-smooth optimization landscapes. We develop a rigorous framework for understanding how policy gradient methods mollify non-smooth optimization landscapes to enable effective policy search, as well as the downside of it: while making the objective function smoother and easier to optimize, the stochastic objective deviates further from the original problem. We demonstrate the equivalence between policy gradient methods and solving backward heat equations. Following the ill-posedness of backward heat equations from PDE theory, we present a fundamental challenge to the use of policy gradient under stochasticity. Moreover, we make the connection between this limitation and the uncertainty principle in harmonic analysis to understand the effects of exploration with stochastic policies in RL. We also provide experimental results to illustrate both the positive and negative aspects of mollification effects in practice.

* 19 pages, 41 figures
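
As background for the heat-equation connection mentioned in the abstract, the following is a standard fact about Gaussian smoothing, stated here in generic notation that is assumed for illustration ($J$ for an objective, $\theta$ for its argument, $t$ for a smoothing scale). It is not the paper's specific construction.

```latex
% Gaussian smoothing of an objective J at scale t (variance 2t per coordinate):
u(\theta, t) \;=\; \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, I)}
  \bigl[\, J\bigl(\theta + \sqrt{2t}\,\varepsilon\bigr) \bigr]

% u solves the (forward) heat equation with J as initial data:
\partial_t u(\theta, t) \;=\; \Delta_\theta u(\theta, t),
\qquad u(\theta, 0) \;=\; J(\theta)

% Recovering J from the smoothed u(\cdot, t) means running this equation
% backward in time, which is the classically ill-posed backward heat problem.
```

In this reading, more smoothing gives an easier landscape, but the unsmoothed objective becomes harder to recover from it.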

Understanding the Difficulty of Solving Cauchy Problems with PINNs

May 04, 2024

Tao Wang, Bo Zhao, Sicun Gao, Rose Yu

Abstract: Physics-Informed Neural Networks (PINNs) have gained popularity in scientific computing in recent years. However, they often fail to achieve the same level of accuracy as classical methods in solving differential equations. In this paper, we identify two sources of this issue in the case of Cauchy problems: the use of $L^2$ residuals as objective functions and the approximation gap of neural networks. We show that minimizing the sum of the $L^2$ residual and the initial condition error is not sufficient to guarantee the true solution, as this loss function does not capture the underlying dynamics. Additionally, neural networks are not capable of capturing singularities in the solutions due to the non-compactness of their image sets. This, in turn, influences the existence of global minima and the regularity of the network. We demonstrate that when the global minimum does not exist, machine precision becomes the predominant source of achievable error in practice. We also present numerical experiments in support of our theoretical claims.

* 13 pages and 18 figures

Extremum-Seeking Action Selection for Accelerating Policy Optimization

Apr 02, 2024

Ya-Chien Chang, Sicun Gao

Abstract: Reinforcement learning for control over continuous spaces typically uses high-entropy stochastic policies, such as Gaussian distributions, for local exploration and for estimating the policy gradient to optimize performance. Many robotic control problems deal with complex unstable dynamics, where applying actions that are off the feasible control manifolds can quickly lead to undesirable divergence. In such cases, most samples taken from the ambient action space generate low-value trajectories that hardly contribute to policy improvement, resulting in slow or failed learning. We propose to improve action selection in this model-free RL setting by introducing additional adaptive control steps based on Extremum-Seeking Control (ESC). On each action sampled from stochastic policies, we apply sinusoidal perturbations and query for estimated Q-values as the response signal. Based on ESC, we then dynamically improve the sampled actions to be closer to nearby optima before applying them to the environment. Our methods can be easily added to standard policy optimization to improve learning efficiency, which we demonstrate in various control learning environments.
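
The perturb-and-query loop described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the critic `q_value` is a hypothetical concave toy function of a scalar action, the washout (high-pass) filter of classical ESC is replaced by subtracting the unperturbed Q-value, and the amplitude, gain, and period are arbitrary.

```python
import math


def q_value(action, optimum=0.7):
    """Hypothetical critic: a concave toy Q-function of a scalar action."""
    return -(action - optimum) ** 2


def esc_refine(action, q_fn, amplitude=0.05, gain=1.0, period=10, n_steps=400):
    """Nudge a sampled action toward a nearby Q-optimum with sinusoidal dither."""
    estimate = action
    for t in range(n_steps):
        dither = math.sin(2.0 * math.pi * t / period)
        # Response signal: change in Q caused by the perturbation
        # (subtracting the unperturbed value stands in for a washout filter).
        response = q_fn(estimate + amplitude * dither) - q_fn(estimate)
        # Demodulate with the same sinusoid and integrate; on average this
        # moves the estimate along the local gradient of Q.
        estimate += gain * response * dither
    return estimate


sampled_action = 0.0  # stand-in for an action drawn from a stochastic policy
refined_action = esc_refine(sampled_action, q_value)
print(sampled_action, refined_action)
```

Note that the refinement only queries Q-values and never needs an explicit gradient of the critic, which is what makes this style of update usable with a black-box value estimate.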

Efficient Motion Planning for Manipulators with Control Barrier Function-Induced Neural Controller

Apr 01, 2024

Mingxin Yu, Chenning Yu, M-Mahdi Naddaf-Sh, Devesh Upadhyay, Sicun Gao, Chuchu Fan

Abstract: Sampling-based motion planning methods for manipulators in crowded environments often suffer from expensive collision checking and high sampling complexity, which make them difficult to use in real time. To address this issue, we propose a new generalizable control barrier function (CBF)-based steering controller to reduce the number of samples needed in the sampling-based motion planner RRT. Our method combines the strengths of CBFs for real-time collision-avoidance control and RRT for long-horizon motion planning by using a CBF-induced neural controller (CBF-INC) to generate control signals that steer the system towards configurations sampled by RRT. CBF-INC is learned as a neural network and has two variants that handle different inputs: state (signed distance) input and point-cloud input from LiDAR. In the latter case, we also study two different settings: fully and partially observed environmental information. Compared to manually crafted CBFs, which suffer from over-approximating robot geometry, CBF-INC can balance safety and goal-reaching better without being over-conservative. Given state-based input, our CBF-induced neural controller-enhanced RRT (CBF-INC-RRT) can increase the success rate by 14% while reducing the number of nodes explored by 30%, compared with vanilla RRT on hard test cases. Given LiDAR input, where vanilla RRT is not directly applicable, we demonstrate that our CBF-INC-RRT can improve the success rate by 10% compared with planning with other steering controllers. Our project page with supplementary material is at https://mit-realm.github.io/CBF-INC-RRT-website/.

* Accepted by IEEE International Conference on Robotics and Automation (ICRA2024)

Sample-and-Bound for Non-Convex Optimization

Jan 13, 2024

Yaoguang Zhai, Zhizhen Qin, Sicun Gao

Abstract: Standard approaches for global optimization of non-convex functions, such as branch-and-bound, maintain partition trees to systematically prune the domain. The tree size grows exponentially in the number of dimensions. We propose new sampling-based methods for non-convex optimization that adapt Monte Carlo Tree Search (MCTS) to improve efficiency. Instead of the standard use of visitation counts in Upper Confidence Bounds, we utilize numerical overapproximations of the objective as an uncertainty metric, and also take into account sampled estimates of first-order and second-order information. The Monte Carlo tree in our approach avoids the usual fixed combinatorial patterns in growing the tree, and aggressively zooms into the promising regions, while still balancing exploration and exploitation. We evaluate the proposed algorithms on high-dimensional non-convex optimization benchmarks against competitive baselines and analyze the effects of the hyperparameters.

* Published at AAAI 2024. Code is available at https://github.com/aaucsd/MCIR

Fractal Landscapes in Policy Optimization

Oct 24, 2023

Tao Wang, Sylvia Herbert, Sicun Gao

Abstract: Policy gradient lies at the core of deep reinforcement learning (RL) in continuous domains. Despite much success, it is often observed in practice that RL training with policy gradient can fail for many reasons, even on standard control problems with known solutions. We propose a framework for understanding one inherent limitation of the policy gradient approach: the optimization landscape in the policy space can be extremely non-smooth or fractal for certain classes of MDPs, such that there does not exist a gradient to be estimated in the first place. We draw on techniques from chaos theory and non-smooth analysis, and analyze the maximal Lyapunov exponents and Hölder exponents of the policy optimization objectives. Moreover, we develop a practical method that can estimate the local smoothness of the objective function from samples to identify when the training process has encountered fractal landscapes. We show experiments to illustrate how some failure cases of policy optimization can be explained by such fractal landscapes.

* 18 pages and 28 figures
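
For readers unfamiliar with maximal Lyapunov exponents, the sketch below estimates one for the logistic map, a textbook 1-D system used here purely as a stand-in (it does not appear in the paper, and the parameter values are illustrative assumptions). A positive exponent means nearby trajectories separate exponentially, which is the kind of sensitivity the abstract associates with non-smooth objectives.

```python
import math


def logistic_step(x, r):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def lyapunov_exponent(r, x0=0.2, n_burn=1000, n_steps=20000):
    """Average log|f'(x)| along a trajectory after discarding a transient."""
    x = x0
    for _ in range(n_burn):
        x = logistic_step(x, r)
    total = 0.0
    for _ in range(n_steps):
        derivative = abs(r * (1.0 - 2.0 * x))
        # Guard against log(0) if the orbit lands exactly on x = 0.5.
        total += math.log(max(derivative, 1e-12))
        x = logistic_step(x, r)
    return total / n_steps


chaotic = lyapunov_exponent(3.9)   # chaotic regime: exponent expected to be positive
periodic = lyapunov_exponent(3.2)  # stable 2-cycle: exponent expected to be negative
print(chaotic, periodic)
```

The same averaging idea extends to higher-dimensional dynamics by tracking the growth of a tangent vector under the Jacobian instead of a scalar derivative.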

Iterative Reachability Estimation for Safe Reinforcement Learning

Sep 24, 2023

Milan Ganai, Zheng Gong, Chenning Yu, Sylvia Herbert, Sicun Gao

Abstract: Ensuring safety is important for the practical deployment of reinforcement learning (RL). Various challenges must be addressed, such as handling stochasticity in the environments, providing rigorous guarantees of persistent state-wise safety satisfaction, and avoiding overly conservative behaviors that sacrifice performance. We propose a new framework, Reachability Estimation for Safe Policy Optimization (RESPO), for safety-constrained RL in general stochastic settings. In the feasible set where there exist violation-free policies, we optimize for rewards while maintaining persistent safety. Outside this feasible set, our optimization produces the safest behavior by guaranteeing entrance into the feasible set whenever possible with the least cumulative discounted violations. We introduce a class of algorithms using our novel reachability estimation function to optimize in our proposed framework and in similar frameworks such as those concurrently handling multiple hard and soft constraints. We theoretically establish that our algorithms almost surely converge to locally optimal policies of our safe optimization framework. We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo, and show the benefits in improving both reward performance and safety compared with state-of-the-art baselines.

* Accepted in NeurIPS 2023
