Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Scott Niekum

Distributional Depth-Based Estimation of Object Articulation Models

Aug 12, 2021

Ajinkya Jain, Stephen Giguere, Rudolf Lioutikov, Scott Niekum

Figure 1 for Distributional Depth-Based Estimation of Object Articulation Models

Figure 2 for Distributional Depth-Based Estimation of Object Articulation Models

Figure 3 for Distributional Depth-Based Estimation of Object Articulation Models

Figure 4 for Distributional Depth-Based Estimation of Object Articulation Models

Abstract:We propose a method that efficiently learns distributions over articulation model parameters directly from depth images without the need to know articulation model categories a priori. By contrast, existing methods that learn articulation models from raw observations typically only predict point estimates of the model parameters, which are insufficient to guarantee the safe manipulation of articulated objects. Our core contributions include a novel representation for distributions over rigid body transformations and articulation model parameters based on screw theory, von Mises-Fisher distributions, and Stiefel manifolds. Combining these concepts allows for an efficient, mathematically sound representation that implicitly satisfies the constraints that rigid body transformations and articulations must adhere to. Leveraging this representation, we introduce a novel deep learning based approach, DUST-net, that performs category-independent articulation model estimation while also providing model uncertainties. We evaluate our approach on several benchmarking datasets and real-world objects and compare its performance with two current state-of-the-art methods. Our results demonstrate that DUST-net can successfully learn distributions over articulation models for novel objects across articulation model categories, which generate point estimates with better accuracy than state-of-the-art methods and effectively capture the uncertainty over predicted model parameters due to noisy inputs.

* 16 pages, 11 figures, 4 tables

Via

Access Paper or Ask Questions

Robust Generative Adversarial Imitation Learning via Local Lipschitzness

Jun 30, 2021

Farzan Memarian, Abolfazl Hashemi, Scott Niekum, Ufuk Topcu

Figure 1 for Robust Generative Adversarial Imitation Learning via Local Lipschitzness

Figure 2 for Robust Generative Adversarial Imitation Learning via Local Lipschitzness

Abstract:We explore methodologies to improve the robustness of generative adversarial imitation learning (GAIL) algorithms to observation noise. Towards this objective, we study the effect of local Lipschitzness of the discriminator and the generator on the robustness of policies learned by GAIL. In many robotics applications, the learned policies by GAIL typically suffer from a degraded performance at test time since the observations from the environment might be corrupted by noise. Hence, robustifying the learned policies against the observation noise is of critical importance. To this end, we propose a regularization method to induce local Lipschitzness in the generator and the discriminator of adversarial imitation learning methods. We show that the modified objective leads to learning significantly more robust policies. Moreover, we demonstrate -- both theoretically and experimentally -- that training a locally Lipschitz discriminator leads to a locally Lipschitz generator, thereby improving the robustness of the resultant policy. We perform extensive experiments on simulated robot locomotion environments from the MuJoCo suite that demonstrate the proposed method learns policies that significantly outperform the state-of-the-art generative adversarial imitation learning algorithm when applied to test scenarios with noise-corrupted observations.

Via

Access Paper or Ask Questions

Zero-shot Task Adaptation using Natural Language

Jun 05, 2021

Prasoon Goyal, Raymond J. Mooney, Scott Niekum

Figure 1 for Zero-shot Task Adaptation using Natural Language

Figure 2 for Zero-shot Task Adaptation using Natural Language

Figure 3 for Zero-shot Task Adaptation using Natural Language

Figure 4 for Zero-shot Task Adaptation using Natural Language

Abstract:Imitation learning and instruction-following are two common approaches to communicate a user's intent to a learning agent. However, as the complexity of tasks grows, it could be beneficial to use both demonstrations and language to communicate with an agent. In this work, we propose a novel setting where an agent is given both a demonstration and a description, and must combine information from both the modalities. Specifically, given a demonstration for a task (the source task), and a natural language description of the differences between the demonstrated task and a related but different task (the target task), our goal is to train an agent to complete the target task in a zero-shot setting, that is, without any demonstrations for the target task. To this end, we introduce Language-Aided Reward and Value Adaptation (LARVA) which, given a source demonstration and a linguistic description of how the target task differs, learns to output a reward / value function that accurately describes the target task. Our experiments show that on a diverse set of adaptations, our approach is able to complete more than 95% of target tasks when using template-based descriptions, and more than 70% when using free-form natural language.

Via

Access Paper or Ask Questions

Adversarial Intrinsic Motivation for Reinforcement Learning

May 30, 2021

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Figure 1 for Adversarial Intrinsic Motivation for Reinforcement Learning

Figure 2 for Adversarial Intrinsic Motivation for Reinforcement Learning

Figure 3 for Adversarial Intrinsic Motivation for Reinforcement Learning

Figure 4 for Adversarial Intrinsic Motivation for Reinforcement Learning

Abstract:Learning with an objective to minimize the mismatch with a reference distribution has been shown to be useful for generative modeling and imitation learning. In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks. Specifically, this paper focuses on goal-conditioned reinforcement learning where the idealized (unachievable) target distribution has full measure at the goal. We introduce a quasimetric specific to Markov Decision Processes (MDPs), and show that the policy that minimizes the Wasserstein-1 distance of its state visitation distribution to this target distribution under this quasimetric is the policy that reaches the goal in as few steps as possible. Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function. Our experiments show that this reward function changes smoothly with respect to transitions in the MDP and assists the agent in learning. Additionally, we combine AIM with Hindsight Experience Replay (HER) and show that the resulting algorithm accelerates learning significantly on several simulated robotics tasks when compared to HER with a sparse positive reward at the goal state.

Via

Access Paper or Ask Questions

Universal Off-Policy Evaluation

Apr 26, 2021

Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

Figure 1 for Universal Off-Policy Evaluation

Figure 2 for Universal Off-Policy Evaluation

Figure 3 for Universal Off-Policy Evaluation

Figure 4 for Universal Off-Policy Evaluation

Abstract:When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO) -- one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss Uno's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.

Via

Access Paper or Ask Questions

Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Mar 08, 2021

Farzan Memarian, Wonjoon Goo, Rudolf Lioutikov, Ufuk Topcu, Scott Niekum

Figure 1 for Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Abstract:We propose a novel reinforcement learning framework that performs self-supervised online reward shaping, yielding faster, sample efficient performance in sparse reward environments. The proposed framework alternates between updating a policy and inferring a reward function. While the policy update is done with the inferred, potentially dense reward function, the original sparse reward is used to provide a self-supervisory signal for the reward update by serving as an ordering over the observed trajectories. The proposed framework is based on the theory that altering the reward function does not affect the optimal policy of the original MDP as long as we maintain certain relations between the altered and the original reward. We name the proposed framework \textit{ClAssification-based REward Shaping} (CaReS), since we learn the altered reward in a self-supervised manner using classifier based reward inference. Experimental results on several sparse-reward environments demonstrate that the proposed algorithm is not only significantly more sample efficient than the state-of-the-art baseline, but also achieves a similar sample efficiency to MDPs that use hand-designed dense reward functions.

Via

Access Paper or Ask Questions

SCAPE: Learning Stiffness Control from Augmented Position Control Experiences

Feb 16, 2021

Mincheol Kim, Scott Niekum, Ashish D. Deshpande

Figure 1 for SCAPE: Learning Stiffness Control from Augmented Position Control Experiences

Figure 2 for SCAPE: Learning Stiffness Control from Augmented Position Control Experiences

Figure 3 for SCAPE: Learning Stiffness Control from Augmented Position Control Experiences

Figure 4 for SCAPE: Learning Stiffness Control from Augmented Position Control Experiences

Abstract:We introduce a sample-efficient method for learning state-dependent stiffness control policies for dexterous manipulation. The ability to control stiffness facilitates safe and reliable manipulation by providing compliance and robustness to uncertainties. So far, most current reinforcement learning approaches to achieve robotic manipulation have exclusively focused on position control, often due to the difficulty of learning high-dimensional stiffness control policies. This difficulty can be partially mitigated via policy guidance such as in imitation learning. However, expert stiffness control demonstrations are often expensive or infeasible to record. Therefore, we present an approach to learn Stiffness Control from Augmented Position control Experiences (SCAPE) that bypasses this difficulty by transforming position control demonstrations into approximate, suboptimal stiffness control demonstrations. Then, the suboptimality of the augmented demonstrations is addressed by using complementary techniques that help the agent safely learn from both the demonstrations and reinforcement learning. By using simulation tools and experiments on a robotic testbed, we show that the proposed approach efficiently learns safe manipulation policies and outperforms learned position control policies and several other baseline learning algorithms.

* Submitted to RSS 2021

Via

Access Paper or Ask Questions

Value Alignment Verification

Dec 02, 2020

Daniel S. Brown, Jordan Schneider, Scott Niekum

Figure 1 for Value Alignment Verification

Figure 2 for Value Alignment Verification

Figure 3 for Value Alignment Verification

Figure 4 for Value Alignment Verification

Abstract:As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important that humans can verify these agents' trustworthiness and efficiently evaluate their performance and correctness. In this paper we formalize the problem of value alignment verification: how to efficiently test whether the goals and behavior of another agent are aligned with a human's values? We explore several different value alignment verification settings and provide foundational theory regarding value alignment verification. We study alignment verification problems with an idealized human that has an explicit reward function as well as value alignment verification problems where the human has implicit values. Our theoretical and empirical results in both a discrete grid navigation domain and a continuous autonomous driving domain demonstrate that it is possible to synthesize highly efficient and accurate value alignment verification tests for certifying the alignment of autonomous agents.

* NeurIPS 2020 Workshop on Human And Machine in-the-Loop Evaluation and Learning Strategies (HAMLETS)

Via

Access Paper or Ask Questions

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Sep 28, 2020

Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox

Figure 1 for The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Figure 2 for The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Figure 3 for The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Figure 4 for The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Abstract:Reactions such as gestures, facial expressions, and vocalizations are an abundant, naturally occurring channel of information that humans provide during interactions. A robot or other agent could leverage an understanding of such implicit human feedback to improve its task performance at no cost to the human. This approach contrasts with common agent teaching methods based on demonstrations, critiques, or other guidance that need to be attentively and intentionally provided. In this paper, we first define the general problem of learning from implicit human feedback and then propose to address this problem through a novel data-driven framework, EMPATHIC. This two-stage method consists of (1) mapping implicit human feedback to relevant task statistics such as rewards, optimality, and advantage; and (2) using such a mapping to learn a task. We instantiate the first stage and three second-stage evaluations of the learned mapping. To do so, we collect a dataset of human facial reactions while participants observe an agent execute a sub-optimal policy for a prescribed training task. We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.

* 8 pages main paper, 2 pages reference, 12 pages appendix

Via

Access Paper or Ask Questions

ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory

Aug 24, 2020

Ajinkya Jain, Rudolf Lioutikov, Scott Niekum

Figure 1 for ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory

Figure 2 for ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory

Figure 3 for ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory

Figure 4 for ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory

Abstract:Robots in human environments will need to interact with a wide variety of articulated objects such as cabinets, drawers, and dishwashers while assisting humans in performing day-to-day tasks. Existing methods either require objects to be textured or need to know the articulation model category a priori for estimating the model parameters for an articulated object. We propose ScrewNet, a novel approach that estimates an object's articulation model directly from depth images without requiring a priori knowledge of the articulation model category. ScrewNet uses screw theory to unify the representation of different articulation types and perform category-independent articulation model estimation. We evaluate our approach on two benchmarking datasets and compare its performance with a current state-of-the-art method. Results demonstrate that ScrewNet can successfully estimate the articulation models and their parameters for novel objects across articulation model categories with better on average accuracy than the prior state-of-the-art method.

Via

Access Paper or Ask Questions