Peter Stone

UT Austin, Sony AI

Multistep Inverse Is Not All You Need

Mar 18, 2024

Alexander Levine, Peter Stone, Amy Zhang

Abstract: In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state which evolves deterministically, and action-independent time-correlated noise. Lamb et al. (2022) propose the "AC-State" method for learning an encoder to extract a complete action-dependent latent state representation from the observations in such problems. AC-State is a multistep-inverse method, in that it uses the encodings of the first and last states in a path to predict the first action in the path. However, we identify cases where AC-State will fail to learn a correct latent representation of the agent-controllable factor of the state. We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on tabular Ex-BMDPs through numerical simulations, as well as on high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf.
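
The multistep-inverse idea described in the abstract (predict the first action of a path from its first and last states) can be illustrated by how training tuples might be assembled from a single trajectory. The sketch below is not the authors' implementation: the function name `multistep_inverse_samples`, the `max_k` horizon parameter, and the choice to include the path length `k` in each tuple are all assumptions made for illustration, and the encoder and classifier themselves are left out.

```python
from typing import List, Sequence, Tuple, TypeVar

Obs = TypeVar("Obs")


def multistep_inverse_samples(
    observations: Sequence[Obs],
    actions: Sequence[int],
    max_k: int,
) -> List[Tuple[Obs, Obs, int, int]]:
    """Build (x_t, x_{t+k}, k, a_t) tuples for every path of length 1..max_k.

    observations[t + 1] is assumed to be the observation reached after
    taking actions[t] from observations[t] (hypothetical data layout).
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("expected exactly one more observation than actions")
    samples = []
    for t in range(len(actions)):
        for k in range(1, max_k + 1):
            if t + k >= len(observations):
                break  # the path would run past the end of the trajectory
            # Inputs: first and last observation of the path, plus its length.
            # Prediction target: the first action taken on the path.
            samples.append((observations[t], observations[t + k], k, actions[t]))
    return samples


# Toy trajectory with four observations and three actions.
toy_samples = multistep_inverse_samples(["x0", "x1", "x2", "x3"], [0, 1, 0], max_k=2)
```

In a full method, an encoder would map each observation in a tuple to a latent state, and a classifier over the two latents would be trained to predict the action; only the sampling step is sketched here.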

TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation

Mar 12, 2024

Shivin Dass, Wensi Ai, Yuqian Jiang, Samik Singh, Jiaheng Hu, Ruohan Zhang, Peter Stone, Ben Abbatematteo, Roberto Martin-Martin

Abstract: A critical bottleneck limiting imitation learning in robotics is the lack of data. This problem is more severe in mobile manipulation, where collecting demonstrations is harder than in stationary manipulation due to the lack of available and easy-to-use teleoperation interfaces. In this work, we demonstrate TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators. TeleMoMa unifies multiple human interfaces including RGB and depth cameras, virtual reality controllers, keyboard, joysticks, etc., and any combination thereof. In its more accessible version, TeleMoMa works using only vision (e.g., an RGB-D camera), lowering the entry bar for humans to provide mobile manipulation demonstrations. We demonstrate the versatility of TeleMoMa by teleoperating several existing mobile manipulators - PAL Tiago++, Toyota HSR, and Fetch - in simulation and the real world. We demonstrate the quality of the demonstrations collected with TeleMoMa by training imitation learning policies for mobile manipulation tasks involving synchronized whole-body motion. Finally, we also show that TeleMoMa's teleoperation channel enables teleoperation on site, looking at the robot, or remote, sending commands and observations through a computer network, and perform user studies to evaluate how easy it is for novice users to learn to collect demonstrations with different combinations of human interfaces enabled by our system. We hope TeleMoMa becomes a helpful tool for the community enabling researchers to collect whole-body mobile manipulation demonstrations. For more information and video results, see https://robin-lab.cs.utexas.edu/telemoma-web.

Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning

Mar 06, 2024

Zifan Xu, Amir Hossain Raj, Xuesu Xiao, Peter Stone

Abstract: Recent advances in locomotion controllers utilizing deep reinforcement learning (RL) have yielded impressive results in terms of achieving rapid and robust locomotion across challenging terrain, such as rugged rocks, non-rigid ground, and slippery surfaces. However, while these controllers primarily address challenges underneath the robot, relatively little research has investigated legged mobility through confined 3D spaces, such as narrow tunnels or irregular voids, which impose all-around constraints. The cyclic gait patterns resulting from existing RL-based methods, which learn parameterized locomotion skills characterized by motion parameters such as velocity and body height, may not be adequate to navigate robots through challenging confined 3D spaces, which require both agile 3D obstacle avoidance and robust legged locomotion. Instead, we propose to learn locomotion skills end-to-end from goal-oriented navigation in confined 3D spaces. To address the inefficiency of tracking distant navigation goals, we introduce a hierarchical locomotion controller that combines a classical planner tasked with planning waypoints to reach a faraway global goal location, and an RL-based policy trained to follow these waypoints by generating low-level motion commands. This approach allows the policy to explore its own locomotion skills within the entire solution space and facilitates smooth transitions between local goals, enabling long-term navigation towards distant goals. In simulation, our hierarchical approach succeeds at navigating through demanding confined 3D environments, outperforming both pure end-to-end learning approaches and parameterized locomotion skills. We further demonstrate the successful real-world deployment of our simulation-trained controller on a real robot.

Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks

Mar 06, 2024

Ziping Xu, Zifan Xu, Runxuan Jiang, Peter Stone, Ambuj Tewari

Abstract: Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention for their wide applications in many important Reinforcement Learning (RL) tasks. However, while recent advancements in MTRL theory have focused on the improved statistical efficiency by assuming a shared structure across tasks, exploration--a crucial aspect of RL--has been largely overlooked. This paper addresses this gap by showing that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with a myopic exploration design like $\epsilon$-greedy, which is inefficient in general, can be sample-efficient for MTRL. To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL. It may also shed light on the enigmatic success of the wide applications of myopic exploration in practice. To validate the role of diversity, we conduct experiments on synthetic robotic control environments, where the diverse task set aligns with the task selection by automatic curriculum learning, which is empirically shown to improve sample-efficiency.
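
For readers unfamiliar with the myopic exploration rule the abstract names, $\epsilon$-greedy can be sketched in a few lines. This is the standard textbook rule, not the paper's policy-sharing algorithm; the function name `epsilon_greedy`, the `q_values` argument, and the optional `rng` parameter are illustrative assumptions.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(
    q_values: Sequence[float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action index: uniformly random with probability epsilon, else greedy."""
    if not q_values:
        raise ValueError("q_values must contain at least one action value")
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Explore: ignore the value estimates and sample any action uniformly.
        return rng.randrange(len(q_values))
    # Exploit: take the action with the highest estimated value
    # (ties resolve to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With epsilon = 0 the rule is purely greedy.
greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

The rule is called myopic because its exploration step does not account for which states or actions remain uncertain; it only perturbs the currently greedy choice.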

Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning

Jan 23, 2024

Zizhao Wang, Caroline Wang, Xuesu Xiao, Yuke Zhu, Peter Stone

Abstract: Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is to learn state abstractions, which keep only the variables necessary for learning the tasks at hand. This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. CBM leverages and improves implicit modeling to train a high-fidelity causal dynamics model that can be reused for all tasks in the same environment. Empirical validation on manipulation environments and DeepMind Control Suite reveals that CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones. Furthermore, the derived state abstractions allow a task learner to achieve near-oracle levels of sample efficiency and outperform baselines on all tasks.

* Accepted at AAAI24

t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making

Jan 04, 2024

William Yue, Bo Liu, Peter Stone

Abstract: Deep generative replay has emerged as a promising approach for continual learning in decision-making tasks. This approach addresses the problem of catastrophic forgetting by leveraging the generation of trajectories from previously encountered tasks to augment the current dataset. However, existing deep generative replay methods for continual learning rely on autoregressive models, which suffer from compounding errors in the generated trajectories. In this paper, we propose a simple, scalable, and non-autoregressive method for continual learning in decision-making tasks using a generative model that generates task samples conditioned on the trajectory timestep. We evaluate our method on Continual World benchmarks and find that our approach achieves state-of-the-art performance on the average success rate metric among continual learning methods. Code is available at https://github.com/WilliamYue37/t-DGR .

* 2nd Workshop on Agent Learning in Open-Endedness (ALOE) at NeurIPS 2023

Latent Skill Discovery for Chain-of-Thought Reasoning

Dec 07, 2023

Zifan Xu, Haozhu Wang, Dmitriy Bespalov, Peter Stone, Yanjun Qi

Abstract: Recent advances in Large Language Models (LLMs) have led to an emergent ability of chain-of-thought (CoT) prompting, a prompt reasoning strategy that adds intermediate rationale steps between questions and answers to construct prompts. Conditioned on these prompts, LLMs can effectively learn in context to generate rationales that lead to more accurate answers than when answering the same question directly. To design LLM prompts, one important setting, called demonstration selection, considers selecting demonstrations from an example bank. Existing methods use various heuristics for this selection, but for CoT prompting, which involves unique rationales, it is essential to base the selection upon the intrinsic skills that CoT rationales need, for instance, the skills of addition or subtraction for math word problems. To address this requirement, we introduce a novel approach named Reasoning Skill Discovery (RSD) that uses unsupervised learning to create a latent space representation of rationales, called a reasoning skill. Simultaneously, RSD learns a reasoning policy to determine the required reasoning skill for a given question. This can then guide the selection of examples that demonstrate the required reasoning skills. Our approach offers several desirable properties: it is (1) theoretically grounded, (2) sample-efficient, requiring no LLM inference or manual prompt design, and (3) LLM-agnostic. Empirically, RSD outperforms existing methods by up to 6% in answer accuracy across multiple reasoning tasks.

ICRA Roboethics Challenge 2023: Intelligent Disobedience in an Elderly Care Home

Nov 15, 2023

Sveta Paster, Kantwon Rogers, Gordon Briggs, Peter Stone, Reuth Mirsky

Abstract: With the projected surge in the elderly population, service robots offer a promising avenue to enhance their well-being in elderly care homes. Such robots will encounter complex scenarios which will require them to make decisions with ethical consequences. In this report, we propose to leverage the Intelligent Disobedience framework in order to give the robot the ability to perform a deliberation process over decisions with potential ethical implications. We list the issues that this framework can assist with, define it formally in the context of the specific elderly care home scenario, and delineate the requirements for implementing an intelligently disobeying robot. We conclude this report with some critical analysis and suggestions for future work.

* This report is part of ICRA roboethics competition : https://competition.raiselab.ca/competition-details-2023_1/ethics-challenge/submitted-proposals/submission-1

Exploring the Cost of Interruptions in Human-Robot Teaming

Nov 01, 2023

Swathi Mannem, William Macke, Peter Stone, Reuth Mirsky

Abstract: Productive and efficient human-robot teaming is a highly desirable ability in service robots, yet there is a fundamental trade-off that a robot needs to consider in such tasks. On the one hand, gaining information from communication with teammates can help individual planning. On the other hand, such communication comes at the cost of distracting teammates from efficiently completing their goals, which can also harm the overall team performance. In this study, we quantify the cost of interruptions in terms of degradation of human task performance, as a robot interrupts its teammate to gain information about their task. Interruptions are varied in timing, content, and proximity. The results show that people find the interrupting robot significantly less helpful. However, the human teammate's performance in a secondary task deteriorates only slightly when interrupted. These results imply that while interruptions can objectively have a low cost, an uninformed implementation can cause these interruptions to be perceived as distracting. These research outcomes can be leveraged in numerous applications where collaborative robots must be aware of the costs and gains of interruptive communication, including logistics and service robots.

* Preprint of a paper accepted for publication in Humanoids 2023 (https://2023.ieee-humanoids.org/)

Learning Generalizable Manipulation Policies with Object-Centric 3D Representations

Oct 22, 2023

Yifeng Zhu, Zhenyu Jiang, Peter Stone, Yuke Zhu

Abstract: We introduce GROOT, an imitation learning method for learning robust policies with object-centric and 3D priors. GROOT builds policies that generalize beyond their initial training conditions for vision-based manipulation. It constructs object-centric 3D representations that are robust toward background changes and camera views and reasons over these representations using a transformer-based policy. Furthermore, we introduce a segmentation correspondence model that allows policies to generalize to new objects at test time. Through comprehensive experiments, we validate the robustness of GROOT policies against perceptual variations in simulated and real-world environments. GROOT's performance excels in generalization over background changes, camera viewpoint shifts, and the presence of new object instances, whereas both state-of-the-art end-to-end learning methods and object proposal-based approaches fall short. We also extensively evaluate GROOT policies on real robots, where we demonstrate the efficacy under very wild changes in setup. More videos and model details can be found in the appendix and the project website: https://ut-austin-rpl.github.io/GROOT .

* Accepted at the 7th Annual Conference on Robot Learning (CoRL), 2023 in Atlanta, US
