George Konidaris

MIT

Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings

Jul 28, 2021

Sreehari Rammohan, Shangqun Yu, Bowen He, Eric Hsiung, Eric Rosen, Stefanie Tellex, George Konidaris

Abstract: Learning continuous control in high-dimensional sparse reward settings, such as robotic manipulation, is a challenging problem due to the number of samples often required to obtain accurate optimal value and policy estimates. While many deep reinforcement learning methods have aimed at improving sample efficiency through replay or improved exploration techniques, state-of-the-art actor-critic and policy gradient methods still suffer from the hard exploration problem in sparse reward settings. Motivated by recent successes of value-based methods for approximating state-action values, like RBF-DQN, we explore the potential of value-based reinforcement learning for learning continuous robotic manipulation tasks in multi-task sparse reward settings. On robotic manipulation tasks, we empirically show RBF-DQN converges faster than current state-of-the-art algorithms such as TD3, SAC, and PPO. We also perform ablation studies with RBF-DQN and show that some enhancement techniques for vanilla deep Q-learning, such as Hindsight Experience Replay (HER) and Prioritized Experience Replay (PER), can also be applied to RBF-DQN. Our experimental analysis suggests that value-based approaches may be more sensitive to data augmentation and replay buffer sampling techniques than policy-gradient methods, and that the benefits of these methods for robot manipulation are heavily dependent on the transition dynamics of generated subgoal states.

* 5 pages, 2 figures, published at RSS 2021 workshop: Advancing Artificial Intelligence and Manipulation for Robotics: Understanding Gaps, Industry and Academic Perspectives, and Community Building
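
The abstract above mentions Hindsight Experience Replay (HER) as a replay technique for sparse-reward tasks. As a rough illustration of the general idea only, and not the authors' RBF-DQN code, the sketch below relabels a failed episode with the goal it actually reached. The `Transition` type, the `sparse_reward` function, and the "final state" relabeling strategy are all assumptions made for this example.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[float, ...]


class Transition(NamedTuple):
    """Hypothetical goal-conditioned transition record."""
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Return hindsight copies of an episode whose goal is the last state reached."""
    if not episode:
        return []
    hindsight_goal = episode[-1].next_state
    return [
        t._replace(
            goal=hindsight_goal,
            reward=sparse_reward(t.next_state, hindsight_goal),
        )
        for t in episode
    ]


# A two-step episode that never reaches its original goal (5.0,).
episode = [
    Transition((0.0,), 1, (1.0,), (5.0,), -1.0),
    Transition((1.0,), 1, (2.0,), (5.0,), -1.0),
]
relabeled = relabel_with_final_goal(episode)
```

In a replay-based learner, the relabeled transitions would typically be stored alongside the originals, so that the final step of every episode carries a non-negative reward signal.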

Learning Markov State Abstractions for Deep Reinforcement Learning

Jun 08, 2021

Cameron Allen, Neev Parikh, Omer Gottesman, George Konidaris

Abstract: The fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state representation, and such representations are not guaranteed to preserve the Markov property. We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation. We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning to learn an abstraction that approximately satisfies these conditions. Our novel training objective is compatible with both online and offline training: it does not require a reward signal, but agents can capitalize on reward information when available. We empirically evaluate our approach on a visual gridworld domain and a set of continuous control benchmarks. Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency over state-of-the-art deep reinforcement learning with visual features -- often matching or exceeding the performance achieved with hand-designed compact state information.

* Code available at https://github.com/camall3n/markov-state-abstractions

Learning to Detect Multi-Modal Grasps for Dexterous Grasping in Dense Clutter

Jun 07, 2021

Matt Corsaro, Stefanie Tellex, George Konidaris

Abstract: Grasping arbitrary objects in densely cluttered novel environments is a crucial skill for robots. Though many existing systems enable two-finger parallel-jaw grippers to pick items from clutter, these grippers cannot perform multiple types of grasps. However, multi-modal grasping with multi-finger grippers could much more effectively clear objects of varying sizes from cluttered scenes. We propose an approach to multi-modal grasp detection that jointly predicts the probabilities that several types of grasps succeed at a given grasp pose. Given a partial point cloud of a scene, the algorithm proposes a set of feasible grasp candidates, then estimates the probabilities that a grasp of each type would succeed at each candidate pose. Predicting grasp success probabilities directly from point clouds makes our approach agnostic to the number and placement of depth sensors at execution time. We evaluate our system both in simulation and on a real robot with a Robotiq 3-Finger Adaptive Gripper. We compare our network against several baselines that perform fewer types of grasps. Our experiments show that a system that explicitly models grasp type achieves an object retrieval rate 8.5% higher in a complex cluttered environment than our highest-performing baseline.

* IROS 2021 submission

Bootstrapping Motor Skill Learning with Motion Planning

Jan 12, 2021

Ben Abbatematteo, Eric Rosen, Stefanie Tellex, George Konidaris

Abstract: Learning a robot motor skill from scratch is impractically slow; so much so that in practice, learning must be bootstrapped using a good skill policy obtained from human demonstration. However, relying on human demonstration necessarily degrades the autonomy of robots that must learn a wide variety of skills over their operational lifetimes. We propose using kinematic motion planning as a completely autonomous, sample-efficient way to bootstrap motor skill learning for object manipulation. We demonstrate the use of motion planners to bootstrap motor skills in two complex object manipulation scenarios with different policy representations: opening a drawer with a dynamic movement primitive representation, and closing a microwave door with a deep neural network policy. We also show how our method can bootstrap a motor skill for the challenging dynamic task of learning to hit a ball off a tee, where a kinematic plan based on treating the scene as static is insufficient to solve the task, but sufficient to bootstrap a more dynamic policy. In all three cases, our method is competitive with human-demonstrated initialization, and significantly outperforms starting with a random policy. This approach enables robots to efficiently and autonomously learn motor policies for dynamic tasks without human demonstration.

Task Scoping: Building Goal-Specific Abstractions for Planning in Complex Domains

Oct 17, 2020

Nishanth Kumar, Michael Fishman, Natasha Danas, Michael Littman, Stefanie Tellex, George Konidaris

Abstract: A generally intelligent agent requires an open-scope world model: one rich enough to tackle any of the wide range of tasks it may be asked to solve over its operational lifetime. Unfortunately, planning to solve any specific task using such a rich model is computationally intractable - even for state-of-the-art methods - due to the many states and actions that are necessarily present in the model but irrelevant to that problem. We propose task scoping: a method that exploits knowledge of the initial condition, goal condition, and transition-dynamics structure of a task to automatically and efficiently prune provably irrelevant factors and actions from a planning problem, which can dramatically decrease planning time. We prove that task scoping never deletes relevant factors or actions, characterize its computational complexity, and characterize the planning problems for which it is especially useful. Finally, we empirically evaluate task scoping on a variety of domains and demonstrate that using it as a pre-planning step can reduce the state-action space of various planning problems by orders of magnitude and speed up planning. When applied to a complex Minecraft domain, our approach speeds up a state-of-the-art planner by 30 times, including the time required for task scoping itself.

Visual Transfer for Reinforcement Learning via Wasserstein Domain Confusion

Jun 04, 2020

Josh Roy, George Konidaris

Abstract: We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO), a novel algorithm for visual transfer in Reinforcement Learning that explicitly learns to align the distributions of extracted features between a source and target task. WAPPO approximates and minimizes the Wasserstein-1 distance between the distributions of features from source and target domains via a novel Wasserstein Confusion objective. WAPPO outperforms the prior state-of-the-art in visual transfer and successfully transfers policies across Visual Cartpole and two instantiations of 16 OpenAI Procgen environments.
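
For readers unfamiliar with the Wasserstein-1 distance named in the abstract above, the sketch below computes it exactly for the simplest case: two equal-size, one-dimensional samples, where it reduces to the mean absolute difference between sorted values. This only illustrates the quantity itself. WAPPO, per the abstract, approximates the distance between feature distributions through its own objective, which this sketch does not attempt to reproduce. The function name `wasserstein1_1d` is hypothetical.

```python
from typing import Sequence


def wasserstein1_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    For equal-size samples with uniform weights, the optimal transport plan
    matches the i-th smallest value of one sample to the i-th smallest value
    of the other, so the distance is the mean absolute sorted difference.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Toy "source" and "target" feature samples; the target is shifted by 1.
source_features = [0.0, 1.0, 2.0]
target_features = [3.0, 1.0, 2.0]
shift = wasserstein1_1d(source_features, target_features)
```

Aligning two feature distributions in this sense means driving such a distance toward zero.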

Multi-Resolution POMDP Planning for Multi-Object Search in 3D

May 07, 2020

Kaiyu Zheng, Yoonchang Sung, George Konidaris, Stefanie Tellex

Abstract: Robots operating in household environments must find objects on shelves, under tables, and in cupboards. Previous work often formulates the object search problem as a POMDP (Partially Observable Markov Decision Process), yet constrains the search space to 2D. We propose a new approach that enables the robot to efficiently search for objects in 3D, taking occlusions into account. We model the problem as an object-oriented POMDP, where the robot receives a volumetric observation from a viewing frustum and must produce a policy to efficiently search for objects. To address the challenge of large state and observation spaces, we first propose a per-voxel observation model which drastically reduces the observation size necessary for planning. Then, we present a novel octree-based belief representation which captures beliefs at different resolutions and supports efficient exact belief update. Finally, we design an online multi-resolution planning algorithm that leverages the resolution layers in the octree structure as levels of abstraction of the original POMDP problem. Our evaluation in a simulated 3D domain shows that, as the problem scales, our approach significantly outperforms baselines without resolution hierarchy by 25%-35% in cumulative reward. We demonstrate the practicality of our approach on a torso-actuated mobile robot searching for objects in areas of a cluttered lab environment where objects appear on surfaces at different heights.

* 13 pages, 5 figures, 4 tables

Finding Macro-Actions with Disentangled Effects for Efficient Planning with the Goal-Count Heuristic

Apr 28, 2020

Cameron Allen, Tim Klinger, George Konidaris, Matthew Riemer, Gerald Tesauro

Abstract: The difficulty of classical planning increases exponentially with search-tree depth. Heuristic search can make planning more efficient, but good heuristics often require domain-specific assumptions and may not generalize to new problems. Rather than treating the planning problem as fixed and carefully designing a heuristic to match it, we instead construct macro-actions that support efficient planning with the simple and general-purpose "goal-count" heuristic. Our approach searches for macro-actions that modify only a small number of state variables (we call this measure "entanglement"). We show experimentally that reducing entanglement exponentially decreases planning time with the goal-count heuristic. Our method discovers macro-actions with disentangled effects that dramatically improve planning efficiency for 15-puzzle and Rubik's cube, reliably solving each domain without prior knowledge, and solving Rubik's cube with orders of magnitude less data than competing approaches.

* Code available at https://github.com/camall3n/skills-for-planning
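
The two quantities named in the abstract above can be made concrete with a small sketch. It assumes a toy domain, not taken from the paper, in which a state is a tuple of distinct integers and a primitive action swaps two positions. The goal-count heuristic counts state variables that differ from the goal, and entanglement is measured here as the number of variables a macro-action changes. All function names are hypothetical and unrelated to the linked repository.

```python
from typing import Sequence, Tuple

State = Tuple[int, ...]
Swap = Tuple[int, int]


def goal_count(state: State, goal: State) -> int:
    """Goal-count heuristic: number of state variables not yet matching the goal."""
    return sum(s != g for s, g in zip(state, goal))


def apply_macro(state: State, macro: Sequence[Swap]) -> State:
    """Apply a macro-action, given as a sequence of position swaps (toy model)."""
    values = list(state)
    for i, j in macro:
        values[i], values[j] = values[j], values[i]
    return tuple(values)


def entanglement(state: State, macro: Sequence[Swap]) -> int:
    """Number of state variables that the macro-action modifies from `state`."""
    return goal_count(apply_macro(state, macro), state)


goal = (0, 1, 2, 3)
scrambled = (1, 0, 2, 3)

focused_macro = [(0, 1)]                 # touches two variables
sprawling_macro = [(0, 1), (1, 2), (2, 3)]  # cycles all four variables

remaining = goal_count(scrambled, goal)
focused_cost = entanglement(goal, focused_macro)
sprawling_cost = entanglement(goal, sprawling_macro)
```

In this toy setting the focused macro fixes `scrambled` without disturbing the variables that are already correct, which is the property that makes the goal-count heuristic informative.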

Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control

Oct 23, 2019

Jonathan Chang, Nishanth Kumar, Sean Hastings, Aaron Gokaslan, Diego Romeres, Devesh Jha, Daniel Nikovski, George Konidaris, Stefanie Tellex

Abstract: Robots need to learn skills that can not only generalize across similar problems but also be directed to a specific goal. Previous methods either train a new skill for every different goal or do not infer the specific target in the presence of multiple goals from visual data. We introduce an end-to-end method that represents targetable visuomotor skills as a goal-parameterized neural network policy. By training on an informative subset of available goals with the associated target parameters, we are able to learn a policy that can zero-shot generalize to previously unseen goals. We evaluate our method in a representative 2D simulation of a button-grid and on both button-pressing and peg-insertion tasks on two different physical arms. We demonstrate that our model trained on 33% of the possible goals is able to generalize to more than 90% of the targets in the scene for both simulation and robot experiments. We also successfully learn a mapping from target pixel coordinates to a robot policy to complete a specified goal.

* Preprint. Currently under review. * denotes equal contribution

A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms

Jul 11, 2019

Oliver Kroemer, Scott Niekum, George Konidaris

Abstract: A key challenge in intelligent robotics is creating robots that are capable of directly interacting with the world around them to achieve their goals. The last decade has seen substantial growth in research on the problem of robot manipulation, which aims to exploit the increasing availability of affordable robot arms and grippers to create robots capable of directly interacting with the world to achieve their goals. Learning will be central to such autonomous systems, as the real world contains too much variation for a robot to expect to have an accurate model of its environment, the objects in it, or the skills required to manipulate them, in advance. We aim to survey a representative subset of that research which uses machine learning for manipulation. We describe a formalization of the robot manipulation learning problem that synthesizes existing research into a single coherent framework and highlight the many remaining research opportunities and challenges.
