Interpretability in machine learning is critical for the safe deployment of learned policies across legally-regulated and safety-critical domains. While gradient-based approaches in reinforcement learning have achieved tremendous success in learning policies for continuous control problems such as robotics and autonomous driving, the lack of interpretability is a fundamental barrier to adoption. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, reinforcement learning approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning policies that parity or outperform baselines by up to 33% in autonomous driving scenarios while achieving a 300x-600x reduction in the number of parameters against deep learning baselines. We prove that ICCTs can serve as universal function approximators and display analytically that ICCTs can be verified in linear time. Furthermore, we deploy ICCTs in two realistic driving domains, based on interstate Highway-94 and 280 in the US. Finally, we verify ICCT's utility with end-users and find that ICCTs are rated easier to simulate, quicker to validate, and more interpretable than neural networks.
In the domain of autonomous driving, the Learning from Demonstration (LfD) paradigm has exhibited notable efficacy in addressing sequential decision-making problems. However, consistently achieving safety in varying traffic contexts, especially in safety-critical scenarios, poses a significant challenge due to the long-tailed and unforeseen scenarios absent from offline datasets. In this paper, we introduce the saFety-aware strUctured Scenario representatION (FUSION), a pioneering methodology conceived to facilitate the learning of an adaptive end-to-end driving policy by leveraging structured scenario information. FUSION capitalizes on the causal relationships between decomposed reward, cost, state, and action space, constructing a framework for structured sequential reasoning under dynamic traffic environments. We conduct rigorous evaluations in two typical real-world settings of distribution shift in autonomous vehicles, demonstrating the good balance between safety cost and utility reward of FUSION compared to contemporary state-of-the-art safety-aware LfD baselines. Empirical evidence under diverse driving scenarios attests that FUSION significantly enhances the safety and generalizability of autonomous driving agents, even in the face of challenging and unseen environments. Furthermore, our ablation studies reveal noticeable improvements in the integration of causal representation into the safe offline RL problem.
Tool use is a hallmark of advanced intelligence, exemplified in both animal behavior and robotic capabilities. This paper investigates the feasibility of imbuing robots with the ability to creatively use tools in tasks that involve implicit physical constraints and long-term planning. Leveraging Large Language Models (LLMs), we develop RoboTool, a system that accepts natural language instructions and outputs executable code for controlling robots in both simulated and real-world environments. RoboTool incorporates four pivotal components: (i) an "Analyzer" that interprets natural language to discern key task-related concepts, (ii) a "Planner" that generates comprehensive strategies based on the language input and key concepts, (iii) a "Calculator" that computes parameters for each skill, and (iv) a "Coder" that translates these plans into executable Python code. Our results show that RoboTool can not only comprehend explicit or implicit physical constraints and environmental factors but also demonstrate creative tool use. Unlike traditional Task and Motion Planning (TAMP) methods that rely on explicit optimization, our LLM-based system offers a more flexible, efficient, and user-friendly solution for complex robotics tasks. Through extensive experiments, we validate that RoboTool is proficient in handling tasks that would otherwise be infeasible without the creative use of tools, thereby expanding the capabilities of robotic systems. Demos are available on our project page: https://creative-robotool.github.io/.
Snake robots have showcased remarkable compliance and adaptability in their interaction with environments, mirroring the traits of their natural counterparts. While their hyper-redundant and high-dimensional characteristics add to this adaptability, they also pose great challenges to robot control. Instead of perceiving the hyper-redundancy and flexibility of snake robots as mere challenges, there lies an unexplored potential in leveraging these traits to enhance robustness and generalizability at the control policy level. We seek to develop a control policy that effectively breaks down the high dimensionality of snake robots while harnessing their redundancy. In this work, we consider the snake robot as a modular robot and formulate the control of the snake robot as a cooperative Multi-Agent Reinforcement Learning (MARL) problem. Each segment of the snake robot functions as an individual agent. Specifically, we incorporate a self-attention mechanism to enhance the cooperative behavior between agents. A high-level imagination policy is proposed to provide additional rewards to guide the low-level control policy. We validate the proposed method COMPOSER with five snake robot tasks, including goal reaching, wall climbing, shape formation, tube crossing, and block pushing. COMPOSER achieves the highest success rate across all tasks when compared to a centralized baseline and four modular policy baselines. Additionally, we show enhanced robustness against module corruption and significantly superior zero-shot generalizability in our proposed method. The videos of this work are available on our project page: https://sites.google.com/view/composer-snake/.
Accurate depth estimation under out-of-distribution (OoD) scenarios, such as adverse weather conditions, sensor failure, and noise contamination, is desirable for safety-critical applications. Existing depth estimation systems, however, suffer inevitably from real-world corruptions and perturbations and are struggled to provide reliable depth predictions under such cases. In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation. This challenge was developed based on the newly established KITTI-C and NYUDepth2-C benchmarks. We hosted two stand-alone tracks, with an emphasis on robust self-supervised and robust fully-supervised depth estimation, respectively. Out of more than two hundred participants, nine unique and top-performing solutions have appeared, with novel designs ranging from the following aspects: spatial- and frequency-domain augmentations, masked image modeling, image restoration and super-resolution, adversarial training, diffusion-based noise suppression, vision-language pre-training, learned model ensembling, and hierarchical feature enhancement. Extensive experimental analyses along with insightful observations are drawn to better understand the rationale behind each design. We hope this challenge could lay a solid foundation for future research on robust and reliable depth estimation and beyond. The datasets, competition toolkit, workshop recordings, and source code from the winning teams are publicly available on the challenge website.
In this work, we first formulate the problem of goal-conditioned robotic water scooping with reinforcement learning. This task is challenging due to the complex dynamics of fluid and multi-modal goal-reaching. The policy is required to achieve both position goals and water amount goals, which leads to a large convoluted goal state space. To address these challenges, we introduce Goal Sampling Adaptation for Scooping (GOATS), a curriculum reinforcement learning method that can learn an effective and generalizable policy for robot scooping tasks. Specifically, we use a goal-factorized reward formulation and interpolate position goal distributions and amount goal distributions to create curriculum through the learning process. As a result, our proposed method can outperform the baselines in simulation and achieves 5.46% and 8.71% amount errors on bowl scooping and bucket scooping tasks, respectively, under 1000 variations of initial water states in the tank and a large goal state space. Besides being effective in simulation environments, our method can efficiently generalize to noisy real-robot water-scooping scenarios with different physical configurations and unseen settings, demonstrating superior efficacy and generalizability. The videos of this work are available on our project page: https://sites.google.com/view/goatscooping.
Motion forecasting in highly interactive scenarios is a challenging problem in autonomous driving. In such scenarios, we need to accurately predict the joint behavior of interacting agents to ensure the safe and efficient navigation of autonomous vehicles. Recently, goal-conditioned methods have gained increasing attention due to their advantage in performance and their ability to capture the multimodality in trajectory distribution. In this work, we study the joint trajectory prediction problem with the goal-conditioned framework. In particular, we introduce a conditional-variational-autoencoder-based (CVAE) model to explicitly encode different interaction modes into the latent space. However, we discover that the vanilla model suffers from posterior collapse and cannot induce an informative latent space as desired. To address these issues, we propose a novel approach to avoid KL vanishing and induce an interpretable interactive latent space with pseudo labels. The pseudo labels allow us to incorporate arbitrary domain knowledge on interaction. We motivate the proposed method using an illustrative toy example. In addition, we validate our framework on the Waymo Open Motion Dataset with both quantitative and qualitative evaluations.
Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for continuous control problems. While the performance of these approaches warrants real-world adoption in domains, such as in autonomous driving and robotics, these policies lack interpretability, limiting deployability in safety-critical and legally-regulated domains. Such domains require interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that parity or outperform baselines by up to 33$\%$ in autonomous driving scenarios while achieving a $300$x-$600$x reduction in the number of policy parameters against deep learning baselines.
Humans can leverage hierarchical structures to split a task into sub-tasks and solve problems efficiently. Both imitation and reinforcement learning or a combination of them with hierarchical structures have been proven to be an efficient way for robots to learn complex tasks with sparse rewards. However, in the previous work of hierarchical imitation and reinforcement learning, the tested environments are in relatively simple 2D games, and the action spaces are discrete. Furthermore, many imitation learning works focusing on improving the policies learned from the expert polices that are hard-coded or trained by reinforcement learning algorithms, rather than human experts. In the scenarios of human-robot interaction, humans can be required to provide demonstrations to teach the robot, so it is crucial to improve the learning efficiency to reduce expert efforts, and know human's perception about the learning/training process. In this project, we explored different imitation learning algorithms and designed active learning algorithms upon the hierarchical imitation and reinforcement learning framework we have developed. We performed an experiment where five participants were asked to guide a randomly initialized agent to a random goal in a maze. Our experimental results showed that using DAgger and reward-based active learning method can achieve better performance while saving more human efforts physically and mentally during the training process.
The gesture-determined-dynamic function (GDDF) offers an effective way to handle the control problems of humanoid robots. Specifically, GDDF is utilized to constrain the movements of dual arms of humanoid robots and steer specific gestures to conduct demanding tasks under certain conditions. However, there is still a deficiency in this scheme. Through experiments, we found that the joints of the dual arms, which can be regarded as the redundant manipulators, could exceed their limits slightly at the joint angle level. The performance straightly depends on the parameters designed beforehand for the GDDF, which causes a lack of adaptability to the practical applications of this method. In this paper, a modified scheme of GDDF with consideration of margins (MGDDF) is proposed. This MGDDF scheme is based on quadratic programming (QP) framework, which is widely applied to solving the redundancy resolution problems of robot arms. Moreover, three margins are introduced in the proposed MGDDF scheme to avoid joint limits. With consideration of these margins, the joints of manipulators of the humanoid robots will not exceed their limits, and the potential damages which might be caused by exceeding limits will be completely avoided. Computer simulations conducted on MATLAB further verify the feasibility and superiority of the proposed MGDDF scheme.