Peter Stone

UT Austin, Sony AI

PACER: Preference-conditioned All-terrain Costmap Generation

Oct 30, 2024

Luisa Mao, Garrett Warnell, Peter Stone, Joydeep Biswas

Abstract: In autonomous robot navigation, terrain cost assignment is typically performed using a semantics-based paradigm in which terrain is first labeled using a pre-trained semantic classifier and costs are then assigned according to a user-defined mapping between label and cost. While this approach adapts rapidly to changing user preferences, it can only express preferences over the terrain types the semantic classifier already knows. In this paper, we hypothesize that a machine-learning-based alternative to this semantics-based paradigm will allow cost assignment to adapt rapidly to preferences expressed over new terrains at deployment time, without additional training. To investigate this hypothesis, we introduce and study PACER, a novel approach to costmap generation that accepts as input a single bird's-eye view (BEV) image of the surrounding area along with a user-specified preference context and generates a corresponding BEV costmap that aligns with the preference context. Using both real and synthetic data along with a combination of proposed training tasks, we find that PACER adapts quickly to new user preferences while also generalizing better to novel terrains than both semantics-based and representation-learning approaches.
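
The semantics-based baseline that PACER is contrasted with can be sketched in a few lines: per-cell labels from a classifier are looked up in a user-editable label-to-cost table. This is a minimal illustration of that baseline, not PACER itself; the label names, cost values, and the choice to give unrecognized cells infinite cost are hypothetical. It shows both properties the abstract describes: changing a preference is just editing the table, but a preference can only attach to a label the classifier already produces.

```python
from typing import Dict, List

LabelGrid = List[List[str]]
CostGrid = List[List[float]]


def semantic_costmap(label_grid: LabelGrid,
                     label_to_cost: Dict[str, float],
                     default_cost: float = float("inf")) -> CostGrid:
    """Assign each BEV cell the user-specified cost of its semantic label.

    Labels missing from `label_to_cost` (including anything the classifier
    can only report as "unknown") fall back to `default_cost`.
    """
    return [[label_to_cost.get(label, default_cost) for label in row]
            for row in label_grid]


if __name__ == "__main__":
    # Hypothetical classifier output for a 2x2 BEV patch.
    labels = [["grass", "concrete"],
              ["mud", "unknown"]]
    prefs = {"concrete": 1.0, "grass": 5.0, "mud": 10.0}
    print(semantic_costmap(labels, prefs))
    # Adapting to a new preference is a dictionary edit, with no retraining...
    prefs["grass"] = 0.5
    print(semantic_costmap(labels, prefs))
    # ...but a preference over "gravel" has no effect if the classifier never emits it.
    prefs["gravel"] = 0.1
    print(semantic_costmap(labels, prefs))
```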

SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions

Oct 24, 2024

Zizhao Wang, Jiaheng Hu, Caleb Chuck, Stephen Chen, Roberto Martín-Martín, Amy Zhang, Scott Niekum, Peter Stone

Abstract: Unsupervised skill discovery carries the promise that an intelligent agent can learn reusable skills through autonomous, reward-free interaction with its environment. Existing unsupervised skill discovery methods learn skills by encouraging distinguishable behaviors that cover diverse states. However, in complex environments with many state factors (e.g., household environments with many objects), learning skills that cover all possible states is impossible, and naively encouraging state diversity often leads to simple skills that are poorly suited to solving downstream tasks. This work introduces Skill Discovery from Local Dependencies (SkiLD), which leverages state factorization as a natural inductive bias to guide the skill learning process. The key intuition behind SkiLD is that skills that induce diverse interactions between state factors are often more valuable for solving downstream tasks. To this end, SkiLD introduces a novel skill learning objective that explicitly encourages mastering skills that effectively induce different interactions within an environment. We evaluate SkiLD in several domains with challenging, long-horizon sparse-reward tasks, including a realistic simulated household robot domain, where SkiLD learns skills with clear semantic meaning and outperforms existing unsupervised reinforcement learning methods that only maximize state coverage.

Learning to Look: Seeking Information for Decision Making via Policy Factorization

Oct 24, 2024

Shivin Dass, Jiaheng Hu, Ben Abbatematteo, Peter Stone, Roberto Martín-Martín

Abstract: Many robot manipulation tasks require active or interactive exploration behavior in order to be performed successfully. Such tasks are ubiquitous in embodied domains, where agents must actively search for the information necessary for each stage of a task, e.g., moving the robot's head to find information relevant to manipulation, and in multi-robot domains, where one scout robot may search for the information that another robot needs to make informed decisions. We formalize these tasks as a new type of problem, factorized Contextual Markov Decision Processes, and propose DISaM, a dual-policy solution composed of an information-seeking policy that explores the environment to find the relevant contextual information and an information-receiving policy that exploits the context to achieve the manipulation goal. This factorization allows us to train the two policies separately, using the information-receiving policy to provide the reward for training the information-seeking policy. At test time, the dual agent balances exploration and exploitation based on how uncertain the manipulation policy is about the next best action. We demonstrate the capabilities of our dual-policy solution on five manipulation tasks that require information-seeking behaviors, both in simulation and in the real world, where DISaM significantly outperforms existing methods. More information at https://robin-lab.cs.utexas.edu/learning2look/.

* Project Website: https://robin-lab.cs.utexas.edu/learning2look/
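
The test-time arbitration described in this abstract can be sketched as a simple switch between the two policies. This is an assumption-laden illustration, not DISaM's actual rule: we measure the manipulation policy's uncertainty as the entropy of a discrete next-action distribution and compare it to a hand-picked threshold. The paper may quantify uncertainty differently (continuous actions, for instance, would need another measure), and the function names and threshold are hypothetical.

```python
import math
from typing import Sequence


def action_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution over next actions."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_policy(action_probs: Sequence[float], threshold: float) -> str:
    """Hypothetical arbitration rule between the two policies.

    Returns "seek" (the information-seeking policy acts) when the
    information-receiving policy is uncertain about its next action,
    otherwise "receive" (the information-receiving policy acts).
    """
    return "seek" if action_entropy(action_probs) > threshold else "receive"


if __name__ == "__main__":
    # Uniform distribution: the manipulation policy is unsure, so explore.
    print(select_policy([0.25, 0.25, 0.25, 0.25], threshold=0.5))
    # Peaked distribution: the manipulation policy is confident, so exploit.
    print(select_policy([0.97, 0.01, 0.01, 0.01], threshold=0.5))
```

A threshold on a scalar uncertainty keeps the two policies fully decoupled at training time, which matches the abstract's point that the factorization lets them be trained separately.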

Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning

Oct 15, 2024

Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martín-Martín

Abstract: A hallmark of intelligent agents is the ability to learn reusable skills purely from unsupervised interaction with the environment. However, existing unsupervised skill discovery methods often learn entangled skills, where one skill variable simultaneously influences many entities in the environment, making downstream skill chaining extremely challenging. We propose Disentangled Unsupervised Skill Discovery (DUSDi), a method for learning disentangled skills that can be efficiently reused to solve downstream tasks. DUSDi decomposes skills into disentangled components, where each skill component affects only one factor of the state space. Importantly, these skill components can be composed concurrently to generate low-level actions and chained efficiently to tackle downstream tasks through hierarchical reinforcement learning. DUSDi defines a novel mutual-information-based objective to enforce disentanglement between the influences of different skill components, and uses value factorization to optimize this objective efficiently. Evaluated in a set of challenging environments, DUSDi successfully learns disentangled skills and significantly outperforms previous skill discovery methods when the learned skills are applied to downstream tasks. Code and skill visualizations at jiahenghu.github.io/DUSDi-site/.

* NeurIPS 2024

SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

Oct 13, 2024

Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno

Abstract: Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simplicity bias, guiding models toward simple and generalizable solutions. In deep RL, however, designing and scaling up networks has been less explored. Motivated by this opportunity, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. SimBa consists of three components: (i) an observation normalization layer that standardizes inputs with running statistics, (ii) a residual feedforward block that provides a linear pathway from input to output, and (iii) a layer normalization that controls feature magnitudes. Scaling up parameters with SimBa consistently improves the sample efficiency of various deep RL algorithms, including off-policy, on-policy, and unsupervised methods. Moreover, simply integrating the SimBa architecture into SAC lets it match or surpass state-of-the-art deep RL methods with high computational efficiency across DMC, MyoSuite, and HumanoidBench. These results demonstrate SimBa's broad applicability and effectiveness across diverse RL algorithms and environments.

* preprint
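
The three SimBa components listed in this abstract can be sketched in dependency-free Python to show how they fit together. This is a minimal illustration, not the authors' implementation: we assume a pre-layer-norm residual MLP block, omit any input projection and the learnable layer-norm parameters, and use hypothetical toy weights; a real model would use a deep-learning framework and trained parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


class RunningNorm:
    """(i) Observation normalization with running mean/variance (Welford's algorithm).

    Call update() on at least one observation before normalizing.
    """

    def __init__(self, dim: int, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps

    def update(self, obs: Sequence[float]) -> None:
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def __call__(self, obs: Sequence[float]) -> Vector:
        n = max(self.count, 1)
        return [(x - mu) / math.sqrt(m2 / n + self.eps)
                for x, mu, m2 in zip(obs, self.mean, self.m2)]


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> Vector:
    """(iii) Layer normalization without learnable scale/shift (a simplification)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """(ii) Pre-LN residual MLP: x + W2 relu(W1 LN(x)); the `x +` term is the linear pathway."""
    hidden = [max(0.0, h) for h in matvec(w1, layer_norm(x))]
    return [xi + yi for xi, yi in zip(x, matvec(w2, hidden))]


def simba_encoder(obs: Sequence[float], norm: RunningNorm,
                  blocks: List[Tuple[Matrix, Matrix]]) -> Vector:
    """Normalize the observation, apply residual blocks, then a final layer norm."""
    x = norm(obs)
    for w1, w2 in blocks:
        x = residual_block(x, w1, w2)
    return layer_norm(x)


if __name__ == "__main__":
    norm = RunningNorm(dim=2)
    for o in ([1.0, 10.0], [3.0, 30.0]):
        norm.update(o)
    identity, half = [[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 0.5]]
    print(simba_encoder([2.0, 25.0], norm, [(identity, half)]))
```

With the residual branch's output weights at zero, a block reduces exactly to the identity, which is one way to read the "linear pathway" the abstract credits with biasing the network toward simple functions.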

Effort Allocation for Deadline-Aware Task and Motion Planning: A Metareasoning Approach

Oct 08, 2024

Yoonchang Sung, Shahaf S. Shperberg, Qi Wang, Peter Stone

Abstract: In robot planning, tasks can often be achieved through multiple options, each consisting of several actions. This work specifically addresses deadline constraints in task and motion planning, aiming to find a plan that can be executed within the deadline despite uncertain planning and execution times. We propose an effort allocation problem, formulated as a Markov decision process (MDP), that finds such a plan by taking a metareasoning perspective to allocate computational resources among the given options. We formally prove that the problem is NP-hard by reduction from the knapsack problem. We explore both a model-based approach, in which transition models are learned from past experience, and a model-free approach, which uses reinforcement learning to cope with the absence of prior data. For the model-based approach, we investigate Monte Carlo tree search (MCTS) to approximately solve the proposed MDP and further design heuristic schemes to tackle the NP-hardness, leading to an approximate yet efficient algorithm called DP_Rerun. In experiments, DP_Rerun performs comparably to MCTS while requiring negligible computation time.

* 48 pages, 6 figures

Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory

Oct 03, 2024

Alexander Levine, Peter Stone, Amy Zhang

Abstract: In order to train agents that can quickly adapt to new objectives or reward functions, efficient unsupervised representation learning in sequential decision-making environments can be important. Frameworks such as the Exogenous Block Markov Decision Process (Ex-BMDP) have been proposed to formalize this representation-learning problem (Efroni et al., 2022b). In the Ex-BMDP framework, the agent's high-dimensional observations of the environment have two latent factors: a controllable factor, which evolves deterministically within a small state space according to the agent's actions, and an exogenous factor, which represents time-correlated noise and can be highly complex. The goal of the representation learning problem is to learn an encoder that maps observations into the controllable latent space, as well as the dynamics of this space. Efroni et al. (2022b) have shown that this is possible with a sample complexity that depends only on the size of the controllable latent space, and not on the size of the noise factor. However, this prior work has focused on the episodic setting, where the controllable latent state resets to a specific start state after a finite horizon. By contrast, if the agent can only interact with the environment in a single continuous trajectory, prior works have not established sample-complexity bounds. We propose STEEL, the first provably sample-efficient algorithm for learning the controllable dynamics of an Ex-BMDP from a single trajectory, in the function approximation setting. STEEL has a sample complexity that depends only on the sizes of the controllable latent space and the encoder function class, and (at worst linearly) on the mixing time of the exogenous noise factor. We prove that STEEL is correct and sample-efficient, and demonstrate STEEL on two toy problems. Code is available at: https://github.com/midi-lab/steel.
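
STEEL's bound grows (at worst linearly) with the mixing time of the exogenous noise process. To make that quantity concrete, here is the standard total-variation definition, t_mix(ε) = min{t : max_x TV(P^t(x, ·), π) ≤ ε} with the conventional ε = 1/4, computed for small finite Markov chains. This is a generic textbook computation, not part of STEEL; the two-state chains are hypothetical, ergodicity is assumed so that power iteration finds π, and the paper's exact mixing-time convention may differ.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def stationary(p: Matrix, iters: int = 1000) -> List[float]:
    """Stationary distribution pi = pi P by power iteration (assumes an ergodic chain)."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def tv_distance(mu: Sequence[float], nu: Sequence[float]) -> float:
    """Total-variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def mixing_time(p: Matrix, eps: float = 0.25, t_max: int = 10_000) -> int:
    """Smallest t >= 1 such that every row of P^t is within eps of pi in TV distance."""
    pi = stationary(p)
    pt = p  # holds P^t
    for t in range(1, t_max + 1):
        if max(tv_distance(row, pi) for row in pt) <= eps:
            return t
        pt = matmul(pt, p)
    raise ValueError("chain did not mix within t_max steps")


if __name__ == "__main__":
    # Hypothetical two-state "exogenous noise" chains: stickier noise mixes more slowly.
    fast = [[0.9, 0.1], [0.1, 0.9]]
    slow = [[0.99, 0.01], [0.01, 0.99]]
    print(mixing_time(fast), mixing_time(slow))
```

For these symmetric chains the distance after t steps is 0.5 * (1 - 2a)^t, where a is the switching probability, so making the noise ten times stickier raises the mixing time from 4 to 35 steps.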

Fine-Grained Gradient Restriction: A Simple Approach for Mitigating Catastrophic Forgetting

Oct 01, 2024

Bo Liu, Mao Ye, Peter Stone, Qiang Liu

Abstract: A fundamental challenge in continual learning is balancing the trade-off between learning new tasks and remembering previously acquired knowledge. Gradient Episodic Memory (GEM) achieves this balance by using a subset of past training samples to restrict the update direction of the model parameters. In this work, we start by analyzing an often-overlooked hyper-parameter in GEM, the memory strength, which boosts empirical performance by further constraining the update direction. We show that memory strength is effective mainly because it improves GEM's generalization ability and therefore leads to a more favorable trade-off. Based on this finding, we propose two approaches that constrain the update direction more flexibly. Our methods achieve uniformly better Pareto frontiers of remembering old and learning new knowledge than using memory strength. We further propose a computationally efficient method to approximately solve the optimization problem with more constraints.
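
The gradient restriction GEM performs can be made concrete in the simplest case of a single stored-task gradient r, where GEM's quadratic program has a closed form: solve the dual min over v ≥ 0 of ½ v²‖r‖² + v⟨r, g⟩ and update along g̃ = g + v r, which changes g only when ⟨g, r⟩ < 0. This is a minimal sketch under that single-constraint assumption, not the authors' code or their proposed methods. We also assume, based on our reading of the GEM reference implementation, that memory strength enters as a lower bound `margin` on the dual variable v, so a positive margin tilts the update toward r even when no constraint is violated.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gem_project(grad: Sequence[float], mem_grad: Sequence[float],
                margin: float = 0.0) -> Vector:
    """Single-constraint GEM-style gradient restriction (closed form).

    grad:     gradient of the loss on the current task (g)
    mem_grad: gradient of the loss on stored samples from a past task (r)
    margin:   lower bound on the dual variable; assumed stand-in for memory strength
    """
    denom = dot(mem_grad, mem_grad)
    if denom == 0.0:  # memory gives no direction; leave the gradient unchanged
        return list(grad)
    # The unconstrained dual minimizer is -<g, r> / ||r||^2; clip it from below at margin.
    v = max(margin, -dot(grad, mem_grad) / denom)
    return [g + v * r for g, r in zip(grad, mem_grad)]


if __name__ == "__main__":
    r = [0.0, 1.0]
    print(gem_project([1.0, -1.0], r))             # conflicts with memory: projected
    print(gem_project([1.0, 1.0], r))              # compatible: unchanged
    print(gem_project([1.0, 1.0], r, margin=0.5))  # margin still tilts toward r
```

With `margin=0` this is the exact Euclidean projection onto the half-space ⟨g̃, r⟩ ≥ 0; with several stored tasks the dual becomes a small QP over one variable per task and needs a solver instead of a `max`.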

Grounded Curriculum Learning

Sep 29, 2024

Linji Wang, Zifan Xu, Peter Stone, Xuesu Xiao

Abstract: The high cost of real-world data for robotics reinforcement learning (RL) has led to widespread use of simulators. Despite extensive work on building better dynamics models for simulators to match the real world, there is another, often-overlooked mismatch between simulation and the real world, namely the distribution of available training tasks. This mismatch is further exacerbated by existing curriculum learning (CL) techniques, which automatically vary the simulation task distribution without considering its relevance to the real world. Considering these challenges, we posit that curriculum learning for robotics RL needs to be grounded in real-world task distributions. To this end, we propose Grounded Curriculum Learning (GCL), which aligns the simulated task distribution in the curriculum with the real world while explicitly considering which tasks have been given to the robot and how the robot has performed on them in the past. We validate GCL using the BARN dataset on complex navigation tasks, achieving 6.8% and 6.5% higher success rates than a state-of-the-art CL method and a curriculum designed by human experts, respectively. These results show that GCL can enhance learning efficiency and navigation performance by grounding the simulation task distribution in the real world within an adaptive curriculum.

* 8 pages, 4 figures

FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning

Sep 25, 2024

Jiaheng Hu, Rose Hendrix, Ali Farhadi, Aniruddha Kembhavi, Roberto Martín-Martín, Peter Stone, Kuo-Hao Zeng, Kiana Ehsani

Abstract: In recent years, the robotics field has initiated several efforts toward building generalist robot policies through large-scale multi-task Behavior Cloning. However, directly deploying these policies has led to unsatisfactory performance, with the policies struggling in unseen states and tasks. How can we break through the performance plateau of these models and further improve their capabilities? In this paper, we propose FLaRe, a large-scale Reinforcement Learning fine-tuning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques. Our method aligns pre-trained policies toward task completion, achieving state-of-the-art (SoTA) performance both on previously demonstrated tasks and on entirely novel tasks and embodiments. Specifically, on a set of long-horizon mobile manipulation tasks, FLaRe achieves an average success rate of 79.5% in unseen environments, with absolute improvements of +23.6% in simulation and +30.7% on real robots over prior SoTA methods. Using only sparse rewards, our approach can generalize to new capabilities beyond the pretraining data with minimal human effort. Moreover, we demonstrate rapid adaptation to new embodiments and behaviors with less than a day of fine-tuning. Videos can be found on the project website at https://robot-flare.github.io/
