Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Abhishek Gupta

BCG Henderson Institute, Montreal AI Ethics Institute, Boston Consulting Group

SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

Feb 01, 2024

Jianlan Luo, Zheyuan Hu, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine

Abstract:In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that the particular implementation details of these algorithms are often just as important (if not more so) for performance as the choice of algorithm. We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods. To address this challenge, we developed a carefully implemented library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment, a high-quality controller for a widely-adopted robot, and a number of challenging example tasks. We provide this library as a resource for the community, describe its design choices, and present experimental results. Perhaps surprisingly, we find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation between 25 to 50 minutes of training per policy on average, improving over state-of-the-art results reported for similar tasks in the literature. These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors. We hope that these promising results and our high-quality open-source implementation will provide a tool for the robotics community to facilitate further developments in robotic RL. Our code, documentation, and videos can be found at https://serl-robot.github.io/

* ICRA 2024

Via

Access Paper or Ask Questions

Multiform Evolution for High-Dimensional Problems with Low Effective Dimensionality

Dec 30, 2023

Yaqing Hou, Mingyang Sun, Abhishek Gupta, Yaochu Jin, Haiyin Piao, Hongwei Ge, Qiang Zhang

Abstract:In this paper, we scale evolutionary algorithms to high-dimensional optimization problems that deceptively possess a low effective dimensionality (certain dimensions do not significantly affect the objective function). To this end, an instantiation of the multiform optimization paradigm is presented, where multiple low-dimensional counterparts of a target high-dimensional task are generated via random embeddings. Since the exact relationship between the auxiliary (low-dimensional) tasks and the target is a priori unknown, a multiform evolutionary algorithm is developed for unifying all formulations into a single multi-task setting. The resultant joint optimization enables the target task to efficiently reuse solutions evolved across various low-dimensional searches via cross-form genetic transfers, hence speeding up overall convergence characteristics. To validate the overall efficacy of our proposed algorithmic framework, comprehensive experimental studies are carried out on well-known continuous benchmark functions as well as a set of practical problems in the hyper-parameter tuning of machine learning models and deep learning models in classification tasks and Predator-Prey games, respectively.

* 12 pages,6 figures

Via

Access Paper or Ask Questions

Inverse Transfer Multiobjective Optimization

Dec 26, 2023

Jiao Liu, Abhishek Gupta, Yew-Soon Ong

Figure 1 for Inverse Transfer Multiobjective Optimization

Figure 2 for Inverse Transfer Multiobjective Optimization

Figure 3 for Inverse Transfer Multiobjective Optimization

Figure 4 for Inverse Transfer Multiobjective Optimization

Abstract:Transfer optimization enables data-efficient optimization of a target task by leveraging experiential priors from related source tasks. This is especially useful in multiobjective optimization settings where a set of trade-off solutions is sought under tight evaluation budgets. In this paper, we introduce a novel concept of inverse transfer in multiobjective optimization. Inverse transfer stands out by employing probabilistic inverse models to map performance vectors in the objective space to population search distributions in task-specific decision space, facilitating knowledge transfer through objective space unification. Building upon this idea, we introduce the first Inverse Transfer Multiobjective Evolutionary Optimizer (invTrEMO). A key highlight of invTrEMO is its ability to harness the common objective functions prevalent in many application areas, even when decision spaces do not precisely align between tasks. This allows invTrEMO to uniquely and effectively utilize information from heterogeneous source tasks as well. Furthermore, invTrEMO yields high-precision inverse models as a significant byproduct, enabling the generation of tailored solutions on-demand based on user preferences. Empirical studies on multi- and many-objective benchmark problems, as well as a practical case study, showcase the faster convergence rate and modelling accuracy of the invTrEMO relative to state-of-the-art evolutionary and Bayesian optimization algorithms. The source code of the invTrEMO is made available at https://github.com/LiuJ-2023/invTrEMO.

Via

Access Paper or Ask Questions

Modeling Boundedly Rational Agents with Latent Inference Budgets

Dec 07, 2023

Athul Paul Jacob, Abhishek Gupta, Jacob Andreas

Abstract:We study the problem of modeling a population of agents pursuing unknown goals subject to unknown computational constraints. In standard models of bounded rationality, sub-optimal decision-making is simulated by adding homoscedastic noise to optimal decisions rather than explicitly simulating constrained inference. In this work, we introduce a latent inference budget model (L-IBM) that models agents' computational constraints explicitly, via a latent variable (inferred jointly with a model of agents' goals) that controls the runtime of an iterative inference algorithm. L-IBMs make it possible to learn agent models using data from diverse populations of suboptimal actors. In three modeling tasks -- inferring navigation goals from routes, inferring communicative intents from human utterances, and predicting next moves in human chess games -- we show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty. Inferred inference budgets are themselves meaningful, efficient to compute, and correlated with measures of player skill, partner skill and task difficulty.

Via

Access Paper or Ask Questions

Generalizable Neural Physics Solvers by Baldwinian Evolution

Dec 06, 2023

Jian Cheng Wong, Chin Chun Ooi, Abhishek Gupta, Pao-Hsiung Chiu, Joshua Shao Zheng Low, My Ha Dao, Yew-Soon Ong

Figure 1 for Generalizable Neural Physics Solvers by Baldwinian Evolution

Figure 2 for Generalizable Neural Physics Solvers by Baldwinian Evolution

Figure 3 for Generalizable Neural Physics Solvers by Baldwinian Evolution

Figure 4 for Generalizable Neural Physics Solvers by Baldwinian Evolution

Abstract:Physics-informed neural networks (PINNs) are at the forefront of scientific machine learning, making possible the creation of machine intelligence that is cognizant of physical laws and able to accurately simulate them. In this paper, the potential of discovering PINNs that generalize over an entire family of physics tasks is studied, for the first time, through a biological lens of the Baldwin effect. Drawing inspiration from the neurodevelopment of precocial species that have evolved to learn, predict and react quickly to their environment, we envision PINNs that are pre-wired with connection strengths inducing strong biases towards efficient learning of physics. To this end, evolutionary selection pressure (guided by proficiency over a family of tasks) is coupled with lifetime learning (to specialize on a smaller subset of those tasks) to produce PINNs that demonstrate fast and physics-compliant prediction capabilities across a range of empirically challenging problem instances. The Baldwinian approach achieves an order of magnitude improvement in prediction accuracy at a fraction of the computation cost compared to state-of-the-art results with PINNs meta-learned by gradient descent. This paper marks a leap forward in the meta-learning of PINNs as generalizable physics solvers.

Via

Access Paper or Ask Questions

New Epochs in AI Supervision: Design and Implementation of an Autonomous Radiology AI Monitoring System

Nov 24, 2023

Vasantha Kumar Venugopal, Abhishek Gupta, Rohit Takhar, Vidur Mahajan

Figure 1 for New Epochs in AI Supervision: Design and Implementation of an Autonomous Radiology AI Monitoring System

Figure 2 for New Epochs in AI Supervision: Design and Implementation of an Autonomous Radiology AI Monitoring System

Figure 3 for New Epochs in AI Supervision: Design and Implementation of an Autonomous Radiology AI Monitoring System

Figure 4 for New Epochs in AI Supervision: Design and Implementation of an Autonomous Radiology AI Monitoring System

Abstract:With the increasingly widespread adoption of AI in healthcare, maintaining the accuracy and reliability of AI models in clinical practice has become crucial. In this context, we introduce novel methods for monitoring the performance of radiology AI classification models in practice, addressing the challenges of obtaining real-time ground truth for performance monitoring. We propose two metrics - predictive divergence and temporal stability - to be used for preemptive alerts of AI performance changes. Predictive divergence, measured using Kullback-Leibler and Jensen-Shannon divergences, evaluates model accuracy by comparing predictions with those of two supplementary models. Temporal stability is assessed through a comparison of current predictions against historical moving averages, identifying potential model decay or data drift. This approach was retrospectively validated using chest X-ray data from a single-center imaging clinic, demonstrating its effectiveness in maintaining AI model reliability. By providing continuous, real-time insights into model performance, our system ensures the safe and effective use of AI in clinical decision-making, paving the way for more robust AI integration in healthcare

* 10 pages, 4 figures, 2 tables

Via

Access Paper or Ask Questions

Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback

Oct 31, 2023

Max Balsells, Marcel Torne, Zihan Wang, Samedh Desai, Pulkit Agrawal, Abhishek Gupta

Abstract:Ideally, we would place a robot in a real-world environment and leave it there improving on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges - the difficulty of providing well "shaped" rewards, and the difficulty of continual reset-free training. In this work, we describe a system for real-world reinforcement learning that enables agents to show continual improvement by training directly in the real world without requiring painstaking effort to hand-design reward functions or reset mechanisms. Our system leverages occasional non-expert human-in-the-loop feedback from remote users to learn informative distance functions to guide exploration while leveraging a simple self-supervised learning algorithm for goal-directed policy learning. We show that in the absence of resets, it is particularly important to account for the current "reachability" of the exploration policy when deciding which regions of the space to explore. Based on this insight, we instantiate a practical learning system - GEAR, which enables robots to simply be placed in real-world environments and left to train autonomously without interruption. The system streams robot experience to a web interface only requiring occasional asynchronous feedback from remote, crowdsourced, non-expert humans in the form of binary comparative feedback. We evaluate this system on a suite of robotic tasks in simulation and demonstrate its effectiveness at learning behaviors both in simulation and the real world. Project website https://guided-exploration-autonomous-rl.github.io/GEAR/.

* Project website https://guided-exploration-autonomous-rl.github.io/GEAR/

Via

Access Paper or Ask Questions

Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Oct 30, 2023

Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du

Figure 1 for Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Figure 2 for Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Figure 3 for Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Figure 4 for Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Abstract:Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be an important technique for solving sequential decision-making problems. However, in the presence of function approximation such algorithms are not guaranteed to converge, often diverging due to the absence of Bellman-completeness in the function classes considered, a crucial condition for the success of DP-based methods. In this paper, we show how off-policy learning techniques based on return-conditioned supervised learning (RCSL) are able to circumvent these challenges of Bellman completeness, converging under significantly more relaxed assumptions inherited from supervised learning. We prove there exists a natural environment in which if one uses two-layer multilayer perceptron as the function approximator, the layer width needs to grow linearly with the state space size to satisfy Bellman-completeness while a constant layer width is enough for RCSL. These findings take a step towards explaining the superior empirical performance of RCSL methods compared to DP-based methods in environments with near-optimal datasets. Furthermore, in order to learn from sub-optimal datasets, we propose a simple framework called MBRCSL, granting RCSL methods the ability of dynamic programming to stitch together segments from distinct trajectories. MBRCSL leverages learned dynamics models and forward sampling to accomplish trajectory stitching while avoiding the need for Bellman completeness that plagues all dynamic programming algorithms. We propose both theoretical analysis and experimental evaluation to back these claims, outperforming state-of-the-art model-free and model-based offline RL algorithms across several simulated robotics problems.

Via

Access Paper or Ask Questions

CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning

Oct 19, 2023

Liyiming Ke, Yunchu Zhang, Abhay Deshpande, Siddhartha Srinivasa, Abhishek Gupta

Figure 1 for CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning

Figure 2 for CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning

Figure 3 for CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning

Figure 4 for CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning

Abstract:We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage local continuity in the environment dynamics to generate corrective labels. Our method first constructs a dynamics model from the expert demonstration, encouraging local Lipschitz continuity in the learned model. In locally continuous regions, this model allows us to generate corrective labels within the neighborhood of the demonstrations but beyond the actual set of states and actions in the dataset. Training on this augmented data enhances the agent's ability to recover from perturbations and deal with compounding errors. We demonstrate the effectiveness of our generated labels through experiments in a variety of robotics domains in simulation that have distinct forms of continuity and discontinuity, including classic control problems, drone flying, navigation with high-dimensional sensor observations, legged locomotion, and tabletop manipulation.

Via

Access Paper or Ask Questions

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Oct 12, 2023

Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal

Figure 1 for Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Figure 2 for Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Figure 3 for Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Figure 4 for Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Abstract:Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to ``good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains in 72 imbalanced datasets, D4RL dataset, and across three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.

* NeurIPS 2023
* Accepted NeurIPS 2023

Via

Access Paper or Ask Questions