
Yisong Yue


Joint-Space Multi-Robot Motion Planning with Learned Decentralized Heuristics

Nov 21, 2023
Fengze Xie, Marcus Dominguez-Kuhne, Benjamin Riviere, Jialin Song, Wolfgang Hönig, Soon-Jo Chung, Yisong Yue

In this paper, we present a method of multi-robot motion planning that biases centralized, sampling-based tree search with decentralized, data-driven steer and distance heuristics. Over a range of robot and obstacle densities, we evaluate plain Rapidly-exploring Random Trees (RRT) and variants of our method for double integrator dynamics. We show that whereas plain RRT fails in every instance to plan for 4 robots, our method can plan for up to 16 robots, corresponding to searching through a very large 65-dimensional space, which validates the effectiveness of data-driven heuristics at combating exponential search space growth. We also find that the heuristic information is complementary; using both heuristics produces search trees with lower failure rates, fewer nodes, and lower path costs than using either heuristic in isolation. These results illustrate the effective decomposition of high-dimensional joint-space motion planning problems into local problems.
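
The biasing idea can be pictured with a minimal RRT skeleton in which the distance and steer functions are injected as arguments: swapping the Euclidean stand-ins below for learned, decentralized heuristics is the mechanism the abstract describes. The 2D state space, step size, and bounds are illustrative assumptions, not the paper's double-integrator setup.

```python
import math
import random

def rrt(start, goal, steer, distance, n_iters=4000, goal_tol=0.5, seed=0):
    """Minimal RRT skeleton with pluggable steer/distance heuristics.

    `distance(a, b)` ranks candidate tree nodes; `steer(a, b)` proposes a
    new state from `a` toward `b`.  Learned versions of these two functions
    are what would bias the search in the paper's setting.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(n_iters):
        sample = (rng.uniform(0, 10), rng.uniform(0, 10))
        nearest = min(range(len(nodes)), key=lambda i: distance(nodes[i], sample))
        new = steer(nodes[nearest], sample)
        parent[len(nodes)] = nearest
        nodes.append(new)
        if distance(new, goal) < goal_tol:
            # Walk the parent pointers back to recover the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

# Euclidean heuristics stand in for the learned decentralized ones.
def euclid(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def straight_line_steer(a, b, step=0.5):
    d = euclid(a, b)
    if d < step:
        return b
    return (a[0] + step * (b[0] - a[0]) / d, a[1] + step * (b[1] - a[1]) / d)

path = rrt((1.0, 1.0), (9.0, 9.0), straight_line_steer, euclid)
```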


Hierarchical Meta-learning-based Adaptive Controller

Nov 21, 2023
Fengze Xie, Guanya Shi, Michael O'Connell, Yisong Yue, Soon-Jo Chung

We study how to design learning-based adaptive controllers that enable fast and accurate online adaptation in changing environments. In these settings, learning is typically done during an initial (offline) design phase, where the vehicle is exposed to different environmental conditions and disturbances (e.g., a drone exposed to different winds) to collect training data. Our work is motivated by the observation that real-world disturbances fall into two categories: 1) those that can be directly monitored or controlled during training, which we call "manageable", and 2) those that cannot be directly measured or controlled (e.g., nominal model mismatch, air plate effects, and unpredictable wind), which we call "latent". Imprecise modeling of these effects can result in degraded control performance, particularly when latent disturbances continuously vary. This paper presents the Hierarchical Meta-learning-based Adaptive Controller (HMAC) to learn and adapt to such multi-source disturbances. Within HMAC, we develop two techniques: 1) Hierarchical Iterative Learning, which jointly trains representations to capture the various sources of disturbances, and 2) Smoothed Streaming Meta-Learning, which learns to capture the evolving structure of latent disturbances over time (in addition to standard meta-learning on the manageable disturbances). Experimental results demonstrate that HMAC exhibits more precise and rapid adaptation to multi-source disturbances than other adaptive controllers.
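
HMAC's hierarchical and streaming components are beyond a snippet, but the adaptive-control pattern it builds on, a fixed feature basis whose linear coefficients are adapted online against a time-varying disturbance, can be sketched. The leaky least-mean-squares update, learning rate, and synthetic linear disturbance below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def online_adaptation(features, targets, lam=0.01, lr=0.1):
    """Adapt linear coefficients a_t on a fixed feature basis so that
    phi(x_t) @ a_t tracks a disturbance signal d_t.

    This is the common 'learned basis + online linear adaptation' pattern
    from the adaptive-control literature; the leakage term lam damps the
    coefficients toward zero for robustness.
    """
    a = np.zeros(features.shape[1])
    errors = []
    for phi, d in zip(features, targets):
        err = d - phi @ a
        errors.append(abs(err))
        # Gradient step on squared error, plus L2 leakage.
        a = a + lr * (err * phi - lam * a)
    return a, errors

# Synthetic check: the disturbance really is linear in the features,
# so the online estimate should converge to the true coefficients.
rng = np.random.default_rng(0)
phi = rng.normal(size=(200, 3))
true_a = np.array([1.0, -2.0, 0.5])
d = phi @ true_a
a_hat, errs = online_adaptation(phi, d)
```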

* Submitted to ICRA 2024 

Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation

Jul 25, 2023
Fengxue Zhang, Jialin Song, James Bowden, Alexander Ladd, Yisong Yue, Thomas A. Desautels, Yuxin Chen

We study Bayesian optimization (BO) in high-dimensional and non-stationary scenarios. Existing algorithms for such scenarios typically require extensive hyperparameter tuning, which limits their practical effectiveness. We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest (ROI) as a superlevel-set of a nonparametric probabilistic model such as a Gaussian process (GP). Our approach is easy to tune, and is able to focus on a local region of the optimization space that can be tackled by existing BO methods. The key idea is to use two probabilistic models: a coarse GP to identify the ROI, and a localized GP for optimization within the ROI. We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO without ROI filtering. We demonstrate empirically the effectiveness of BALLET on both synthetic and real-world optimization tasks.
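
The ROI-filtering idea can be sketched with an exact GP posterior: keep the candidates whose upper confidence bound exceeds the best lower confidence bound over the candidate set, a high-confidence superlevel set that retains the maximizer. The RBF kernel, confidence width, and one-dimensional test function below are assumptions for illustration, not BALLET's actual components.

```python
import numpy as np

def gp_posterior(X, y, Xq, ls=0.5, noise=1e-6):
    """Exact GP posterior mean/std with an RBF kernel (unit signal variance)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xq, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - (v**2).sum(0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def roi_filter(X, y, candidates, beta=2.0):
    """Keep candidates whose UCB exceeds the best LCB over the candidate
    set: a high-confidence superlevel set containing the maximizer."""
    mu, sd = gp_posterior(X, y, candidates)
    ucb, lcb = mu + beta * sd, mu - beta * sd
    return candidates[ucb >= lcb.max()]

# Toy 1D objective with maximum at 0.7; the ROI should keep 0.7 and
# discard clearly suboptimal regions such as the one near 0.1.
X = np.array([[0.1], [0.4], [0.6], [0.9]])
y = -(X[:, 0] - 0.7) ** 2
grid = np.linspace(0, 1, 101)[:, None]
roi = roi_filter(X, y, grid)
```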


Automatic Gradient Descent: Deep Learning without Hyperparameters

Apr 11, 2023
Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue

The architecture of a deep neural network is defined explicitly in terms of the number of layers, the width of each layer and the general network topology. Existing optimisation frameworks neglect this information in favour of implicit architectural information (e.g. second-order methods) or architecture-agnostic distance functions (e.g. mirror descent). Meanwhile, the most popular optimiser in practice, Adam, is based on heuristics. This paper builds a new framework for deriving optimisation algorithms that explicitly leverage neural architecture. The theory extends mirror descent to non-convex composite objective functions: the idea is to transform a Bregman divergence to account for the non-linear structure of neural architecture. Working through the details for deep fully-connected networks yields automatic gradient descent: a first-order optimiser without any hyperparameters. Automatic gradient descent trains both fully-connected and convolutional networks out-of-the-box and at ImageNet scale. A PyTorch implementation is available at https://github.com/jxbz/agd and also in Appendix B. Overall, the paper supplies a rigorous theoretical foundation for a next generation of architecture-dependent optimisers that work automatically and without hyperparameters.
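
The AGD update itself is derived in the paper (and released at the link above); the mirror-descent mechanism it generalizes can be illustrated with the classic negative-entropy mirror map, i.e. exponentiated gradient on the probability simplex, where the Bregman divergence is the KL divergence. The linear objective and step size here are toy assumptions, unrelated to AGD's architecture-aware transform.

```python
import numpy as np

def mirror_descent_simplex(grad, x0, lr=0.1, steps=200):
    """Mirror descent with the negative-entropy mirror map.

    Each step solves argmin_x lr*<g, x> + KL(x || x_t) over the simplex,
    which has the closed form of a multiplicative (exponentiated-gradient)
    update followed by renormalization.
    """
    x = x0.copy()
    for _ in range(steps):
        x = x * np.exp(-lr * grad(x))
        x /= x.sum()
    return x

# Minimize <c, x> over the probability simplex; the optimum puts all
# mass on the smallest-cost coordinate (index 1 here).
c = np.array([3.0, 1.0, 2.0])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```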


Conformal Generative Modeling on Triangulated Surfaces

Mar 17, 2023
Victor Dorobantu, Charlotte Borcherds, Yisong Yue

We propose conformal generative modeling, a framework for generative modeling on 2D surfaces approximated by discrete triangle meshes. Our approach leverages advances in discrete conformal geometry to develop a map from a source triangle mesh to a target triangle mesh of a simple manifold such as a sphere. After accounting for errors due to the mesh discretization, we can use any generative modeling approach developed for simple manifolds as a plug-and-play subroutine. We demonstrate our framework on multiple complicated manifolds and multiple generative modeling subroutines, where we show that our approach can learn good estimates of distributions on meshes from samples, and can also learn simultaneously from multiple distinct meshes of the same underlying manifold.
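
The paper's conformal maps are discrete, computed between triangle meshes; the plug-and-play principle, sample on a simple domain and push the samples through a conformal map, can still be illustrated with the continuous inverse stereographic projection onto the unit sphere. The planar Gaussian sampler below is an arbitrary stand-in for any generative model on the simple domain.

```python
import numpy as np

def stereographic_to_sphere(uv):
    """Map plane points (u, v) onto the unit sphere via inverse
    stereographic projection, a conformal (angle-preserving) map.

    (u, v) -> (2u, 2v, u^2 + v^2 - 1) / (u^2 + v^2 + 1)
    covers the sphere minus the north pole.
    """
    u, v = uv[:, 0], uv[:, 1]
    s = u**2 + v**2
    return np.stack([2 * u, 2 * v, s - 1], axis=1) / (s + 1)[:, None]

# Any planar sampler works as the plug-and-play subroutine here.
rng = np.random.default_rng(0)
samples_plane = rng.normal(size=(1000, 2))
samples_sphere = stereographic_to_sphere(samples_plane)
```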

* 13 pages, 6 figures 

Eventual Discounting Temporal Logic Counterfactual Experience Replay

Mar 03, 2023
Cameron Voloshin, Abhinav Verma, Yisong Yue

Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, the standard RL framework can be too myopic to find policies that maximally satisfy the LTL specification. This paper makes two contributions. First, we develop a new value-function-based proxy, using a technique we call eventual discounting, under which one can find policies that satisfy the LTL specification with the highest achievable probability. Second, we develop a new experience replay method for generating off-policy data from on-policy rollouts via counterfactual reasoning on different ways of satisfying the LTL specification. Our experiments, conducted in both discrete and continuous state-action spaces, confirm the effectiveness of our counterfactual experience replay approach.
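
The counterfactual replay idea resembles hindsight-style relabeling: one on-policy trajectory is replayed from every automaton state, producing off-policy transitions over product states. The toy DFA, labels, and progress-based reward below are illustrative assumptions; the paper's eventual-discounting proxy is not reproduced here.

```python
def relabel(trajectory, automaton_states, delta, accepting):
    """Replay one (obs, action, next_obs, label) trajectory from every
    automaton state, generating off-policy product-state transitions."""
    data = []
    for q0 in automaton_states:
        q = q0
        for obs, action, next_obs, label in trajectory:
            q_next = delta[(q, label)]
            # Reward 1 only when the automaton first enters accepting.
            r = 1.0 if (q_next in accepting and q not in accepting) else 0.0
            data.append(((obs, q), action, (next_obs, q_next), r))
            q = q_next
    return data

# DFA for "eventually a": state 1 is accepting and absorbing.
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 1}
traj = [(0, 'right', 1, 'b'), (1, 'right', 2, 'a')]
data = relabel(traj, [0, 1], delta, accepting={1})
```

One environment trajectory of length 2 yields 4 training transitions, one replay per automaton start state.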


BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos

Dec 14, 2022
Jennifer J. Sun, Pierre Karashchuk, Amil Dravid, Serim Ryou, Sonia Fereidooni, John Tuthill, Aggelos Katsaggelos, Bingni W. Brunton, Georgia Gkioxari, Ann Kennedy, Yisong Yue, Pietro Perona

Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. Our method uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct spatiotemporal differences across multiple views, in addition to joint length constraints on a learned 3D skeleton of the subject. In this way, we discover keypoints without requiring manual supervision in videos of humans and rats, demonstrating the potential of 3D keypoint discovery for studying behavior.
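
The learned volumetric heatmaps are beyond a snippet, but the multi-view geometry the method builds on can be illustrated with standard linear (DLT) triangulation of a single 3D point from two calibrated views. The camera matrices and test point below are assumptions for illustration, not the paper's rig.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3D point from its projections
    x1, x2 under 3x4 camera matrices P1, P2, via the SVD null vector of
    the stacked cross-product constraints."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: one at the origin, one translated along the baseline.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 5.0])
x1, x2 = project(P1, X_true), project(P2, X_true)
X_hat = triangulate(P1, P2, x1, x2)
```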


FI-ODE: Certified and Robust Forward Invariance in Neural ODEs

Oct 30, 2022
Yujia Huang, Ivan Dario Jimenez Rodriguez, Huan Zhang, Yuanyuan Shi, Yisong Yue

We study how to certifiably enforce forward invariance properties in neural ODEs. Forward invariance implies that the hidden states of the ODE will stay in a "good" region, and a robust version would hold even under adversarial perturbations to the input. Such properties can be used to certify desirable behaviors such as adversarial robustness (the hidden states stay in the region that generates accurate classification even under input perturbations) and safety in continuous control (the system never leaves some safe set). We develop a general approach using tools from non-linear control theory and sampling-based verification. Our approach empirically produces the strongest adversarial robustness guarantees compared to prior work on certifiably robust ODE-based models (including implicit-depth models).
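
A sampling-based invariance check can be sketched directly: on sampled boundary points of a sublevel set of a scalar function V, require the vector field not to increase V. The linear system and quadratic V below are toy stand-ins for the paper's neural ODEs, certificates, and covering argument.

```python
import numpy as np

def check_forward_invariance(f, V, grad_V, boundary_samples, margin=0.0):
    """Sampling-based check that {x : V(x) <= c} is forward invariant:
    at each sampled boundary point, the field must push V downhill,
    i.e. <grad V(x), f(x)> <= -margin."""
    return all(float(grad_V(x) @ f(x)) <= -margin for x in boundary_samples)

# Toy example: stable linear system f(x) = -x with V(x) = ||x||^2,
# so the unit disk should be (strictly) forward invariant.
f = lambda x: -x
V = lambda x: float(x @ x)
grad_V = lambda x: 2 * x
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
boundary = np.stack([np.cos(theta), np.sin(theta)], axis=1)
ok = check_forward_invariance(f, V, grad_V, boundary, margin=0.1)
```

For the unstable field f(x) = x the same check fails, as expected.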


Neurosymbolic Programming for Science

Oct 10, 2022
Jennifer J. Sun, Megan Tjandrasuwita, Atharva Sehgal, Armando Solar-Lezama, Swarat Chaudhuri, Yisong Yue, Omar Costilla-Reyes

Neurosymbolic Programming (NP) techniques have the potential to accelerate scientific discovery across fields. These models combine neural and symbolic components to learn complex patterns and representations from data, using high-level concepts or known constraints. As a result, NP techniques can interface with symbolic domain knowledge from scientists, such as prior knowledge and experimental context, to produce interpretable outputs. Here, we identify opportunities and challenges between current NP models and scientific workflows, with real-world examples from behavior analysis in science. We define concrete next steps to move the NP for science field forward, to enable its use broadly for workflows across the natural and social sciences.

* Neural Information Processing Systems 2022 

POLAR: Preference Optimization and Learning Algorithms for Robotics

Aug 08, 2022
Maegan Tucker, Kejun Li, Yisong Yue, Aaron D. Ames

Parameter tuning for robotic systems is a time-consuming and challenging task that often relies on the domain expertise of the human operator. Moreover, existing learning methods are not well suited for parameter tuning for several reasons, including: the absence of a clear numerical metric for "good robotic behavior"; limited data due to the reliance on real-world experimental data; and the large search space of parameter combinations. In this work, we present an open-source MATLAB Preference Optimization and Learning Algorithms for Robotics (POLAR) toolbox for systematically exploring high-dimensional parameter spaces using human-in-the-loop preference-based learning. The aim of this toolbox is to systematically and efficiently accomplish one of two objectives: 1) to optimize robotic behaviors for human operator preference; 2) to learn the operator's underlying preference landscape to better understand the relationship between adjustable parameters and operator preference. The POLAR toolbox achieves these objectives using only subjective feedback mechanisms (pairwise preferences, coactive feedback, and ordinal labels) to infer a Bayesian posterior over the underlying reward function dictating the user's preferences. We demonstrate the performance of the toolbox in simulation and present various applications of human-in-the-loop preference-based learning.
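
A minimal flavor of inferring a reward function from pairwise preferences is MAP estimation of discrete utilities under a Bradley-Terry likelihood with a Gaussian prior. The preference data and optimization settings below are illustrative assumptions; POLAR itself is a MATLAB toolbox with Gaussian-process machinery, ordinal labels, and coactive feedback, none of which is modeled here.

```python
import numpy as np

def preference_map(n_items, prefs, prior_var=1.0, lr=0.1, steps=500):
    """MAP utilities under a Bradley-Terry pairwise-preference model with a
    Gaussian prior: P(i preferred over j) = sigmoid(u_i - u_j).

    The log posterior is concave, so plain gradient ascent converges.
    """
    u = np.zeros(n_items)
    for _ in range(steps):
        g = -u / prior_var  # gradient of the Gaussian log-prior
        for i, j in prefs:  # each pair records that i was preferred to j
            p = 1.0 / (1.0 + np.exp(-(u[i] - u[j])))
            g[i] += 1.0 - p
            g[j] -= 1.0 - p
        u += lr * g
    return u

# Three candidate parameter settings; the operator consistently prefers
# setting 0 over 1 and setting 1 over 2.
prefs = [(0, 1), (0, 1), (1, 2), (0, 2)]
u = preference_map(3, prefs)
```

The recovered utilities rank the settings in the order implied by the feedback.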

* 8 page main text, 2 page appendix, 5 figures 