Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Amy Zhang

Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement

Mar 20, 2023
Michael Chang, Alyssa L. Dayan, Franziska Meier, Thomas L. Griffiths, Sergey Levine, Amy Zhang

Figure 1 for Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement

Figure 2 for Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement

Figure 3 for Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement

Figure 4 for Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement

Object rearrangement is a challenge for embodied agents because solving these tasks requires generalizing across a combinatorially large set of configurations of entities and their locations. Worse, the representations of these entities are unknown and must be inferred from sensory percepts. We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured visual inputs. By constructing a factorized transition graph over clusters of entity representations inferred from pixels, we show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects, which outperforms current offline deep RL methods when evaluated on simulated rearrangement tasks.

* 19 pages, 11 figures, Published as a conference paper at the International Conference on Learning Representations 2023

Via

Access Paper or Ask Questions

Confidence-aware 3D Gaze Estimation and Evaluation Metric

Mar 17, 2023
Qiaojie Zheng, Jiucai Zhang, Amy Zhang, Xiaoli Zhang

Figure 1 for Confidence-aware 3D Gaze Estimation and Evaluation Metric

Figure 2 for Confidence-aware 3D Gaze Estimation and Evaluation Metric

Figure 3 for Confidence-aware 3D Gaze Estimation and Evaluation Metric

Figure 4 for Confidence-aware 3D Gaze Estimation and Evaluation Metric

Deep learning appearance-based 3D gaze estimation is gaining popularity due to its minimal hardware requirements and being free of constraint. Unreliable and overconfident inferences, however, still limit the adoption of this gaze estimation method. To address the unreliable and overconfident issues, we introduce a confidence-aware model that predicts uncertainties together with gaze angle estimations. We also introduce a novel effectiveness evaluation method based on the causality between eye feature degradation and the rise in inference uncertainty to assess the uncertainty estimation. Our confidence-aware model demonstrates reliable uncertainty estimations while providing angular estimation accuracies on par with the state-of-the-art. Compared with the existing statistical uncertainty-angular-error evaluation metric, the proposed effectiveness evaluation approach can more effectively judge inferred uncertainties' performance at each prediction.

* 9 pages 12 figures

Via

Access Paper or Ask Questions

Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods

Feb 16, 2023
Harshit Sikchi, Amy Zhang, Scott Niekum

Figure 1 for Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods

Figure 2 for Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods

Figure 3 for Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods

Figure 4 for Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods

It is well known that Reinforcement Learning (RL) can be formulated as a convex program with linear constraints. The dual form of this formulation is unconstrained, which we refer to as dual RL, and can leverage preexisting tools from convex optimization to improve the learning performance of RL agents. We show that several state-of-the-art deep RL algorithms (in online, offline, and imitation settings) can be viewed as dual RL approaches in a unified framework. This unification calls for the methods to be studied on common ground, so as to identify the components that actually contribute to the success of these methods. Our unification also reveals that prior off-policy imitation learning methods in the dual space are based on an unrealistic coverage assumption and are restricted to matching a particular f-divergence. We propose a new method using a simple modification to the dual framework that allows for imitation learning with arbitrary off-policy data to obtain near-expert performance.

Via

Access Paper or Ask Questions

Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability

Feb 07, 2023
Hanlin Zhu, Amy Zhang

Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills which aim to reach diverse goals. In particular, offline GCRL only requires purely pre-collected datasets to perform training tasks without additional interactions with the environment. Although offline GCRL has become increasingly prevalent and many previous works have demonstrated its empirical success, the theoretical understanding of efficient offline GCRL algorithms is not well established, especially when the state space is huge and the offline dataset only covers the policy we aim to learn. In this paper, we propose a novel provably efficient algorithm (the sample complexity is $\tilde{O}({\rm poly}(1/\epsilon))$ where $\epsilon$ is the desired suboptimality of the learned policy) with general function approximation. Our algorithm only requires nearly minimal assumptions of the dataset (single-policy concentrability) and the function class (realizability). Moreover, our algorithm consists of two uninterleaved optimization steps, which we refer to as $V$-learning and policy learning, and is computationally stable since it does not involve minimax optimization. To the best of our knowledge, this is the first algorithm with general function approximation and single-policy concentrability that is both statistically efficient and computationally stable.

* 26 pages

Via

Access Paper or Ask Questions

Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning

Dec 21, 2022
Chris Lengerich, Gabriel Synnaeve, Amy Zhang, Hugh Leather, Kurt Shuster, François Charton, Charysse Redwood

Figure 1 for Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning

Figure 2 for Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning

Traditional approaches to RL have focused on learning decision policies directly from episodic decisions, while slowly and implicitly learning the semantics of compositional representations needed for generalization. While some approaches have been adopted to refine representations via auxiliary self-supervised losses while simultaneously learning decision policies, learning compositional representations from hand-designed and context-independent self-supervised losses (multi-view) still adapts relatively slowly to the real world, which contains many non-IID subspaces requiring rapid distribution shift in both time and spatial attention patterns at varying levels of abstraction. In contrast, supervised language model cascades have shown the flexibility to adapt to many diverse manifolds, and hints of self-learning needed for autonomous task transfer. However, to date, transfer methods for language models like few-shot learning and fine-tuning still require human supervision and transfer learning using self-learning methods has been underexplored. We propose a self-supervised loss policy called contrastive distillation which manifests latent variables with high mutual information with both source and target tasks from weights to tokens. We show how this outperforms common methods of transfer learning and suggests a useful design axis of trading off compute for generalizability for online transfer. Contrastive distillation is improved through sampling from memory and suggests a simple algorithm for more efficiently sampling negative examples for contrastive losses than random sampling.

Via

Access Paper or Ask Questions

LAD: Language Augmented Diffusion for Reinforcement Learning

Oct 27, 2022
Edwin Zhang, Yujie Lu, William Wang, Amy Zhang

Figure 1 for LAD: Language Augmented Diffusion for Reinforcement Learning

Figure 2 for LAD: Language Augmented Diffusion for Reinforcement Learning

Figure 3 for LAD: Language Augmented Diffusion for Reinforcement Learning

Figure 4 for LAD: Language Augmented Diffusion for Reinforcement Learning

Learning skills from language provides a powerful avenue for generalization in reinforcement learning, although it remains a challenging task as it requires agents to capture the complex interdependencies between language, actions, and states. In this paper, we propose leveraging Language Augmented Diffusion models as a planner conditioned on language (LAD). We demonstrate the comparable performance of LAD with the state-of-the-art on the CALVIN language robotics benchmark with a much simpler architecture that contains no inductive biases specialized to robotics, achieving an average success rate (SR) of 72% compared to the best performance of 76%. We also conduct an analysis on the properties of language conditioned diffusion in reinforcement learning.

* Language and Reinforcement Learning Workshop, NeurIPS 2022

Via

Access Paper or Ask Questions

Latent State Marginalization as a Low-cost Approach for Improving Exploration

Oct 03, 2022
Dinghuai Zhang, Aaron Courville, Yoshua Bengio, Qinqing Zheng, Amy Zhang, Ricky T. Q. Chen

Figure 1 for Latent State Marginalization as a Low-cost Approach for Improving Exploration

Figure 2 for Latent State Marginalization as a Low-cost Approach for Improving Exploration

Figure 3 for Latent State Marginalization as a Low-cost Approach for Improving Exploration

Figure 4 for Latent State Marginalization as a Low-cost Approach for Improving Exploration

While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective, the use of deep probabilistic models has not gained much traction in practice due to their inherent complexity. In this work, we propose the adoption of latent variable policies within the MaxEnt framework, which we show can provably approximate any policy distribution, and additionally, naturally emerges under the use of world models with a latent belief state. We discuss why latent variable policies are difficult to train, how naive approaches can fail, then subsequently introduce a series of improvements centered around low-cost marginalization of the latent state, allowing us to make full use of the latent state at minimal additional cost. We instantiate our method under the actor-critic framework, marginalizing both the actor and critic. The resulting algorithm, referred to as Stochastic Marginal Actor-Critic (SMAC), is simple yet effective. We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training.

Via

Access Paper or Ask Questions

VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

Sep 30, 2022
Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang

Figure 1 for VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

Figure 2 for VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

Figure 3 for VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

Figure 4 for VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

Reward and representation learning are two long-standing challenges for learning an expanding set of robot manipulation skills from sensory observations. Given the inherent cost and scarcity of in-domain, task-specific robot data, learning from large, diverse, offline human videos has emerged as a promising path towards acquiring a generally useful visual representation for control; however, how these human videos can be used for general-purpose reward learning remains an open question. We introduce $\textbf{V}$alue-$\textbf{I}$mplicit $\textbf{P}$re-training (VIP), a self-supervised pre-trained visual representation capable of generating dense and smooth reward functions for unseen robotic tasks. VIP casts representation learning from human videos as an offline goal-conditioned reinforcement learning problem and derives a self-supervised dual goal-conditioned value-function objective that does not depend on actions, enabling pre-training on unlabeled human videos. Theoretically, VIP can be understood as a novel implicit time contrastive objective that generates a temporally smooth embedding, enabling the value function to be implicitly defined via the embedding distance, which can then be used to construct the reward for any goal-image specified downstream task. Trained on large-scale Ego4D human videos and without any fine-tuning on in-domain, task-specific data, VIP's frozen representation can provide dense visual reward for an extensive set of simulated and $\textbf{real-robot}$ tasks, enabling diverse reward-based visual control methods and significantly outperforming all prior pre-trained representations. Notably, VIP can enable simple, $\textbf{few-shot}$ offline RL on a suite of real-world robot tasks with as few as 20 trajectories.

* Project website: https://sites.google.com/view/vip-rl

Via

Access Paper or Ask Questions

Building Human Values into Recommender Systems: An Interdisciplinary Synthesis

Jul 20, 2022
Jonathan Stray, Alon Halevy, Parisa Assar, Dylan Hadfield-Menell, Craig Boutilier, Amar Ashar, Lex Beattie, Michael Ekstrand, Claire Leibowicz, Connie Moon Sehat, Sara Johansen, Lianne Kerlin, David Vickrey, Spandana Singh, Sanne Vrijenhoek, Amy Zhang, McKane Andrus, Natali Helberger, Polina Proutskova, Tanushree Mitra, Nina Vasan

Figure 1 for Building Human Values into Recommender Systems: An Interdisciplinary Synthesis

Figure 2 for Building Human Values into Recommender Systems: An Interdisciplinary Synthesis

Figure 3 for Building Human Values into Recommender Systems: An Interdisciplinary Synthesis

Figure 4 for Building Human Values into Recommender Systems: An Interdisciplinary Synthesis

Recommender systems are the algorithms which select, filter, and personalize content across many of the worlds largest platforms and apps. As such, their positive and negative effects on individuals and on societies have been extensively theorized and studied. Our overarching question is how to ensure that recommender systems enact the values of the individuals and societies that they serve. Addressing this question in a principled fashion requires technical knowledge of recommender design and operation, and also critically depends on insights from diverse fields including social science, ethics, economics, psychology, policy and law. This paper is a multidisciplinary effort to synthesize theory and practice from different perspectives, with the goal of providing a shared language, articulating current design approaches, and identifying open problems. It is not a comprehensive survey of this large space, but a set of highlights identified by our diverse author cohort. We collect a set of values that seem most relevant to recommender systems operating across different domains, then examine them from the perspectives of current industry practice, measurement, product design, and policy approaches. Important open problems include multi-stakeholder processes for defining values and resolving trade-offs, better values-driven measurements, recommender controls that people use, non-behavioral algorithmic feedback, optimization for long-term outcomes, causal inference of recommender effects, academic-industry research collaborations, and interdisciplinary policy-making.

Via

Access Paper or Ask Questions

Denoised MDPs: Learning World Models Better Than the World Itself

Jul 18, 2022
Tongzhou Wang, Simon S. Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian

Figure 1 for Denoised MDPs: Learning World Models Better Than the World Itself

Figure 2 for Denoised MDPs: Learning World Models Better Than the World Itself

Figure 3 for Denoised MDPs: Learning World Models Better Than the World Itself

Figure 4 for Denoised MDPs: Learning World Models Better Than the World Itself

The ability to separate signal from noise, and reason with clean abstractions, is critical to intelligence. With this ability, humans can efficiently perform real world tasks without considering all possible nuisance factors.How can artificial agents do the same? What kind of information can agents safely discard as noises? In this work, we categorize information out in the wild into four types based on controllability and relation with reward, and formulate useful information as that which is both controllable and reward-relevant. This framework clarifies the kinds information removed by various prior work on representation learning in reinforcement learning (RL), and leads to our proposed approach of learning a Denoised MDP that explicitly factors out certain noise distractors. Extensive experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone, and over prior works, across policy optimization control tasks as well as the non-control task of joint position regression.

* Project page: https://ssnl.github.io/denoised_mdp/ Code: https://github.com/facebookresearch/denoised_mdp

Via

Access Paper or Ask Questions