Tomas Jackson

Relentless Adrenalin

VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

May 25, 2024

Michael Ahn, Montserrat Gonzalez Arenas, Matthew Bennice, Noah Brown, Christine Chan, Byron David, Anthony Francis, Gavin Gonzalez, Rainer Hessmer, Tomas Jackson (+15 more)

Abstract: Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon tasks with the help of humans or other robots. VADER leverages visual question answering (VQA) modules to detect visual affordances and recognize execution errors. It then generates prompts for a language model planner (LMP), which decides when to seek help from another robot or a human to recover from errors in long-horizon task execution. We show the effectiveness of VADER with two long-horizon robotic tasks. Our pilot study showed that VADER is capable of performing complex long-horizon tasks by asking for help from another robot to clear a table. Our user study showed that VADER is capable of performing complex long-horizon tasks by asking for help from a human to clear a path. We gathered feedback from people (N=19) about the performance of VADER vs. a robot that did not ask for help. https://google-vader.github.io/

* 9 pages, 4 figures
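
The plan, execute, detect loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `run_plan`, `StepResult`, and the `execute`, `detect_error`, and `ask_for_help` callbacks are hypothetical names, the VQA-based detector is replaced by a simple state check, and the sketch always asks for help on an error, whereas in VADER a language model planner decides when to do so.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    skill: str
    succeeded: bool
    asked_for_help: bool = False


def run_plan(
    plan: List[str],
    execute: Callable[[str], None],
    detect_error: Callable[[str], bool],
    ask_for_help: Callable[[str], None],
    max_retries: int = 1,
) -> List[StepResult]:
    """Run each skill, check for an error, and seek help before retrying."""
    log = []
    for skill in plan:
        execute(skill)
        ok = not detect_error(skill)
        helped = False
        retries = 0
        while not ok and retries < max_retries:
            ask_for_help(skill)  # simplification: always ask when an error is seen
            helped = True
            execute(skill)
            ok = not detect_error(skill)
            retries += 1
        log.append(StepResult(skill, ok, helped))
        if not ok:
            break  # stop the plan if help did not resolve the error
    return log


# Toy world (all hypothetical): a blocked path makes "go to table" fail.
world = {"path_blocked": True, "done": []}


def execute(skill: str) -> None:
    if skill == "go to table" and world["path_blocked"]:
        return  # the skill silently fails
    world["done"].append(skill)


def detect_error(skill: str) -> bool:
    # Stand-in for a VQA query such as "did the skill succeed?"
    return skill not in world["done"]


def ask_for_help(skill: str) -> None:
    # Stand-in for a human or another robot clearing the path.
    world["path_blocked"] = False


log = run_plan(["pick up cup", "go to table", "place cup"],
               execute, detect_error, ask_for_help)
```

In this toy run only the navigation step triggers a help request, after which the remaining steps complete.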

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Sep 18, 2023

Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch (+15 more)

Abstract: In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite. The project's website and videos can be found at https://q-transformer.github.io

* See website at https://q-transformer.github.io
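
The per-dimension discretization mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the uniform binning, the function names, and `toy_q_fn` (a stand-in for the Transformer Q-function) are all hypothetical. It shows how a continuous action becomes one integer token per dimension, and how a greedy action can be chosen one dimension at a time, each choice conditioned on the bins already picked.

```python
import numpy as np

NUM_BINS = 4  # assumed bin count, chosen only for this example


def discretize_action(action, low, high, num_bins):
    """Map each continuous action dimension to an integer bin index."""
    action = np.asarray(action, dtype=float)
    scaled = (action - low) / (high - low)      # normalize to [0, 1]
    bins = np.floor(scaled * num_bins).astype(int)
    return np.clip(bins, 0, num_bins - 1)       # action == high lands in the last bin


def undiscretize_action(bins, low, high, num_bins):
    """Map bin indices back to the continuous value at each bin center."""
    return low + (np.asarray(bins) + 0.5) * (high - low) / num_bins


def greedy_action(q_fn, state, action_dim):
    """Choose bins dimension by dimension, conditioning on earlier choices."""
    chosen = []
    for _ in range(action_dim):
        q_values = q_fn(state, chosen)          # one Q-value per bin of the next dimension
        chosen.append(int(np.argmax(q_values)))
    return chosen


def toy_q_fn(state, previous_bins):
    # Hypothetical stand-in for a learned model: it prefers the bin whose
    # index equals the number of dimensions already decided.
    target = len(previous_bins)
    return -np.abs(np.arange(NUM_BINS) - target).astype(float)


bins = discretize_action([-1.0, 0.0, 1.0], low=-1.0, high=1.0, num_bins=NUM_BINS)
centers = undiscretize_action(bins, low=-1.0, high=1.0, num_bins=NUM_BINS)
picked = greedy_action(toy_q_fn, state=None, action_dim=3)
```

Treating each dimension as its own token keeps the maximization linear in the number of dimensions, rather than exponential as it would be over the joint discretized action space.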

RT-1: Robotics Transformer for Real-World Control at Scale

Dec 13, 2022

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu (+41 more)

Abstract: By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io

* See website at robotics-transformer.github.io

Inner Monologue: Embodied Reasoning through Planning with Language Models

Jul 12, 2022

Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar (+7 more)

Abstract: Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them - answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion on three domains, including simulated and real table top rearrangement tasks and long-horizon mobile manipulation tasks in a kitchen environment in the real world.

* Project website: https://innermonologue.github.io
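
The closed-loop language feedback described in the abstract can be sketched as a growing text transcript that the planner rereads at every step. This is only an illustration under assumptions: `inner_monologue`, `toy_planner`, and `toy_act` are hypothetical names, the rule-based planner stands in for an LLM, and the "Success: ..." line stands in for a success detector.

```python
from typing import Callable, List


def inner_monologue(
    instruction: str,
    planner: Callable[[str], str],
    act: Callable[[str], str],
    max_steps: int = 10,
) -> List[str]:
    """Append every action and its feedback to the text the planner sees."""
    transcript = [f"Human: {instruction}"]
    for _ in range(max_steps):
        step = planner("\n".join(transcript))
        if step == "done":
            break
        transcript.append(f"Robot: {step}")
        transcript.append(act(step))  # feedback is plain text, e.g. "Success: False"
    return transcript


def toy_planner(prompt: str) -> str:
    # Hypothetical rule-based stand-in for an LLM planner.
    lines = prompt.split("\n")
    if lines[-1] == "Success: False":
        return lines[-2][len("Robot: "):]  # retry the action that just failed
    if "Success: True" in lines:
        return "done"
    return "pick up the sponge"


attempts = {"count": 0}


def toy_act(step: str) -> str:
    # Stand-in for skill execution plus success detection: the first attempt fails.
    attempts["count"] += 1
    return f"Success: {attempts['count'] > 1}"


transcript = inner_monologue("bring me the sponge", toy_planner, toy_act)
```

Because the failure report is part of the prompt, the planner can retry the failed step without any retraining, which is the behavior the abstract attributes to closed-loop feedback.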
