Animesh Garg

SMPL: Simulated Industrial Manufacturing and Process Control Learning Environments

Jun 17, 2022

Mohan Zhang, Xiaozhou Wang, Benjamin Decardi-Nelson, Bo Song, An Zhang, Jinfeng Liu, Sile Tao, Jiayi Cheng, Xiaohong Liu, DengDeng Yu(+2 more)

Abstract: Traditional biological and pharmaceutical manufacturing plants are controlled by human workers or pre-defined thresholds. Modernized factories have advanced process control algorithms such as model predictive control (MPC). However, there is little exploration of applying deep reinforcement learning to control manufacturing plants. One of the reasons is the lack of high-fidelity simulations and standard APIs for benchmarking. To bridge this gap, we develop an easy-to-use library that includes five high-fidelity simulation environments: BeerFMTEnv, ReactorEnv, AtropineEnv, PenSimEnv and mAbEnv, which cover a wide range of manufacturing processes. We build these environments on published dynamics models. Furthermore, we benchmark online and offline, model-based and model-free reinforcement learning algorithms to provide comparison points for follow-up research.

* NeurIPS 2022 preprint. Under review.

Neural Shape Mating: Self-Supervised Object Assembly with Adversarial Shape Priors

May 30, 2022

Yun-Chun Chen, Haoda Li, Dylan Turpin, Alec Jacobson, Animesh Garg

Abstract: Learning to autonomously assemble shapes is a crucial skill for many robotic applications. While the majority of existing part assembly methods focus on correctly posing semantic parts to recreate a whole object, we interpret assembly more literally: as mating geometric parts together to achieve a snug fit. By focusing on shape alignment rather than semantic cues, we can achieve across-category generalization. In this paper, we introduce a novel task, pairwise 3D geometric shape mating, and propose Neural Shape Mating (NSM) to tackle this problem. Given the point clouds of two object parts of an unknown category, NSM learns to reason about the fit of the two parts and predict a pair of 3D poses that tightly mate them together. We couple the training of NSM with an implicit shape reconstruction task to make NSM more robust to imperfect point cloud observations. To train NSM, we present a self-supervised data collection pipeline that generates pairwise shape mating data with ground truth by randomly cutting an object mesh into two parts, resulting in a dataset that consists of 200K shape mating pairs from numerous object meshes with diverse cut types. We train NSM on the collected dataset and compare it with several point cloud registration methods and one part assembly baseline. Extensive experimental results and ablation studies under various settings demonstrate the effectiveness of the proposed algorithm. Additional material is available at: https://neural-shape-mating.github.io/

* CVPR 2022

Accelerated Policy Learning with Parallel Differentiable Simulation

Apr 14, 2022

Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech Matusik, Animesh Garg, Miles Macklin

Abstract: Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work, we present a high-performance differentiable simulator and a new policy learning algorithm (SHAC) that can effectively leverage simulation gradients, even in the presence of non-smoothness. Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments to be run in parallel. We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17x reduction in training time over the best-performing established RL algorithm.

* International Conference on Learning Representations

Value Gradient weighted Model-Based Reinforcement Learning

Apr 04, 2022

Claas Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand

Abstract: Model-based reinforcement learning (MBRL) is a sample-efficient technique to obtain control policies, yet unavoidable modeling errors often lead to performance deterioration. The model in MBRL is often solely fitted to reconstruct dynamics, state observations in particular, while the impact of model error on the policy is not captured by the training objective. This leads to a mismatch between the intended goal of MBRL, enabling good policy and value learning, and the target of the loss function employed in practice, future state prediction. Naive intuition would suggest that value-aware model learning would fix this problem and, indeed, several solutions to this objective mismatch problem have been proposed based on theoretical analysis. However, they tend to be inferior in practice to commonly used maximum likelihood (MLE) based approaches. In this paper, we propose Value-gradient weighted Model Learning (VaGraM), a novel method for value-aware model learning which improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions. We analyze both MLE and value-aware approaches, demonstrate how they fail to account for exploration and the behavior of function approximation when learning value-aware models, and highlight the additional goals that must be met to stabilize optimization in the deep learning setting. We verify our analysis by showing that our loss function is able to achieve high returns on the MuJoCo benchmark suite while being more robust than maximum likelihood based approaches.
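
The idea of weighting model error by the value gradient can be illustrated with a small sketch. This is only one simple reading of "value-gradient weighted", not the paper's exact loss: the function name `value_gradient_weighted_error`, the toy states, and the hand-written gradient are all assumptions made for illustration. It shows how an error in a "distracting" state dimension (one the value function ignores) stops counting against the model.

```python
def value_gradient_weighted_error(pred_next, true_next, value_grad):
    """Squared prediction error, with each state dimension scaled by the
    (assumed known) gradient of the value function at the true next state."""
    return sum(
        (g * (p - t)) ** 2
        for g, p, t in zip(value_grad, pred_next, true_next)
    )


def mean_squared_error(pred_next, true_next):
    """Plain unweighted reconstruction error, for comparison."""
    return sum((p - t) ** 2 for p, t in zip(pred_next, true_next)) / len(true_next)


# Toy setup (hypothetical): dimension 0 matters for the value, dimension 1 is a distractor.
true_next = [1.0, 5.0]
value_grad = [2.0, 0.0]   # dV/ds' at the true next state, chosen by hand

pred_a = [1.0, 9.0]       # wrong only in the distracting dimension
pred_b = [1.5, 5.0]       # wrong only in the value-relevant dimension

loss_a = value_gradient_weighted_error(pred_a, true_next, value_grad)
loss_b = value_gradient_weighted_error(pred_b, true_next, value_grad)
mse_a = mean_squared_error(pred_a, true_next)
mse_b = mean_squared_error(pred_b, true_next)
```

Under plain squared error, `pred_a` looks worse than `pred_b`; under the weighted error the ranking flips, because only `pred_b` is wrong where the value function is sensitive.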

X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval

Mar 28, 2022

Satya Krishna Gorti, Noel Vouitsis, Junwei Ma, Keyvan Golestan, Maksims Volkovs, Animesh Garg, Guangwei Yu

Abstract: In text-video retrieval, the objective is to learn a cross-modal similarity function between a text and a video that ranks relevant text-video pairs higher than irrelevant pairs. However, videos inherently express a much wider gamut of information than texts. Instead, texts often capture sub-regions of entire videos and are most semantically similar to certain frames within videos. Therefore, for a given text, a retrieval model should focus on the text's most semantically similar video sub-regions to make a more relevant comparison. Yet, most existing works aggregate entire videos without directly considering text. Common text-agnostic aggregation schemes include mean-pooling or self-attention over the frames, but these are likely to encode misleading visual information not described in the given text. To address this, we propose a cross-modal attention model called X-Pool that reasons between a text and the frames of a video. Our core mechanism is a scaled dot product attention for a text to attend to its most semantically similar frames. We then generate an aggregated video representation conditioned on the text's attention weights over the frames. We evaluate our method on three benchmark datasets, MSR-VTT, MSVD and LSMDC, achieving new state-of-the-art results with up to 12% relative improvement in Recall@1. Our findings thereby highlight the importance of joint text-video reasoning to extract important visual cues according to text. Full code and demo can be found at: https://layer6ai-labs.github.io/xpool/

* CVPR 2022
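
The core mechanism described in the abstract, a text attending to video frames through scaled dot product attention and then pooling the frames with the resulting weights, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the released X-Pool code: it omits any learned projections the actual model may use, and `text_conditioned_pool` and the toy embeddings are hypothetical names and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_conditioned_pool(text_vec, frame_vecs):
    """Pool frame embeddings using attention weights computed from the text.

    scores_i = (text . frame_i) / sqrt(d); weights = softmax(scores);
    pooled = sum_i weights_i * frame_i
    """
    d = len(text_vec)
    scores = [
        sum(t * f for t, f in zip(text_vec, frame)) / math.sqrt(d)
        for frame in frame_vecs
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * frame[i] for w, frame in zip(weights, frame_vecs))
        for i in range(d)
    ]
    return pooled, weights


# Toy embeddings (assumed): only the first frame is aligned with the text.
text = [1.0, 0.0]
frames = [[4.0, 0.0], [0.0, 4.0], [0.0, -4.0]]
pooled, weights = text_conditioned_pool(text, frames)

# Text-agnostic baseline for comparison: mean-pooling over frames.
mean_pooled = [sum(f[i] for f in frames) / len(frames) for i in range(2)]
```

In this toy case the text-conditioned representation is dominated by the one matching frame, whereas mean-pooling dilutes it with the two unrelated frames.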

DiSECt: A Differentiable Simulator for Parameter Inference and Control in Robotic Cutting

Mar 19, 2022

Eric Heiden, Miles Macklin, Yashraj Narang, Dieter Fox, Animesh Garg, Fabio Ramos

Abstract: Robotic cutting of soft materials is critical for applications such as food processing, household automation, and surgical manipulation. As in other areas of robotics, simulators can facilitate controller verification, policy learning, and dataset generation. Moreover, differentiable simulators can enable gradient-based optimization, which is invaluable for calibrating simulation parameters and optimizing controllers. In this work, we present DiSECt: the first differentiable simulator for cutting soft materials. The simulator augments the finite element method (FEM) with a continuous contact model based on signed distance fields (SDF), as well as a continuous damage model that inserts springs on opposite sides of the cutting plane and allows them to weaken until zero stiffness, enabling crack formation. Through various experiments, we evaluate the performance of the simulator. We first show that the simulator can be calibrated to match resultant forces and deformation fields from a state-of-the-art commercial solver and real-world cutting datasets, with generality across cutting velocities and object instances. We then show that Bayesian inference can be performed efficiently by leveraging the differentiability of the simulator, estimating posteriors over hundreds of parameters in a fraction of the time of derivative-free methods. Next, we illustrate that control parameters in the simulation can be optimized to minimize cutting forces via lateral slicing motions. Finally, we conduct experiments on a real robot arm equipped with a slicing knife to infer simulation parameters from force measurements. By optimizing the slicing motion of the knife, we show on fruit cutting scenarios that the average knife force can be reduced by more than 40% compared to a vertical cutting motion. We publish code and additional materials on our project website at https://diff-cutting-sim.github.io.

* Extended version of our RSS 2021 paper arXiv:2105.12244

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Feb 23, 2022

Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang

Abstract: Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment. Directly applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by out-of-distribution (OOD) actions. Previous methods tackle this problem by penalizing the Q-values of OOD actions or constraining the trained policy to be close to the behavior policy. Nevertheless, such methods typically prevent the generalization of value functions beyond the offline data and also lack a precise characterization of OOD data. In this paper, we propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints. Specifically, PBRL conducts uncertainty quantification via the disagreement of bootstrapped Q-functions, and performs pessimistic updates by penalizing the value function based on the estimated uncertainty. To tackle the extrapolation error, we further propose a novel OOD sampling method. We show that such OOD sampling and pessimistic bootstrapping yield a provable uncertainty quantifier in linear MDPs, thus providing the theoretical underpinning for PBRL. Extensive experiments on the D4RL benchmark show that PBRL has better performance compared to the state-of-the-art algorithms.

* ICLR 2022
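
The two ingredients named in the abstract, uncertainty measured as disagreement among bootstrapped Q-functions and a pessimistic penalty based on that uncertainty, can be sketched as follows. This is a simplified illustration, not the PBRL update rule: using the population standard deviation as the disagreement measure, subtracting it from the ensemble mean, the coefficient `beta`, and the toy Q-values are all assumptions.

```python
import statistics


def ensemble_uncertainty(q_estimates):
    """Disagreement of an ensemble of Q-value estimates for one (state, action),
    measured here (by assumption) as the population standard deviation."""
    return statistics.pstdev(q_estimates)


def pessimistic_q(q_estimates, beta=1.0):
    """Ensemble mean penalized by beta times the ensemble disagreement."""
    return statistics.fmean(q_estimates) - beta * ensemble_uncertainty(q_estimates)


# Hypothetical ensemble outputs: the heads agree on an in-distribution action
# and disagree strongly on an out-of-distribution (OOD) action.
q_in_distribution = [10.0, 10.2, 9.8]
q_ood = [4.0, 16.0, 10.0]

value_in = pessimistic_q(q_in_distribution)
value_ood = pessimistic_q(q_ood)
```

Both actions have the same ensemble mean, but the OOD action receives a much lower pessimistic value because the heads disagree about it, which discourages the policy from selecting it without an explicit policy constraint.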

Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics

Nov 02, 2021

Matthias Weissenbacher, Samarth Sinha, Animesh Garg, Yoshinobu Kawahara

Abstract: Offline reinforcement learning leverages large datasets to train policies without interactions with the environment. The learned policies may then be deployed in real-world settings where interactions are costly or dangerous. Current algorithms overfit to the training dataset and, as a consequence, perform poorly when deployed to out-of-distribution generalizations of the environment. We aim to address these limitations by learning a Koopman latent representation which allows us to infer symmetries of the system's underlying dynamics. The latter is then utilized to extend the otherwise static offline dataset during training; this constitutes a novel data augmentation framework which reflects the system's dynamics and is thus to be interpreted as an exploration of the environment's phase space. To obtain the symmetries, we employ Koopman theory, in which nonlinear dynamics are represented in terms of a linear operator acting on the space of measurement functions of the system, and thus symmetries of the dynamics may be inferred directly. We provide novel theoretical results on the existence and nature of symmetries relevant for control systems such as reinforcement learning settings. Moreover, we empirically evaluate our method on several benchmark offline reinforcement learning tasks and datasets including D4RL, Metaworld and Robosuite and find that by using our framework we consistently improve the state-of-the-art for Q-learning methods.
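
The statement that nonlinear dynamics can be represented by a linear operator acting on measurement functions can be made concrete with a standard textbook-style toy system. This sketch is unrelated to the paper's learned latent representation: the system, its coefficients, and the hand-picked measurement functions `(x1, x2, x1**2)` are assumptions chosen so that the lifted dynamics are exactly linear.

```python
def step(x, a=0.9, b=0.5, c=0.3):
    """One step of a nonlinear discrete-time system (toy example)."""
    x1, x2 = x
    return (a * x1, b * x2 + c * x1 ** 2)


def lift(x):
    """Measurement functions chosen by hand: (x1, x2, x1^2)."""
    x1, x2 = x
    return [x1, x2, x1 ** 2]


def koopman_matrix(a=0.9, b=0.5, c=0.3):
    """Linear operator that advances the lifted state by one step."""
    return [
        [a, 0.0, 0.0],
        [0.0, b, c],
        [0.0, 0.0, a ** 2],
    ]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


x = (2.0, -1.0)
lifted_then_linear = matvec(koopman_matrix(), lift(x))   # K applied to lifted state
nonlinear_then_lifted = lift(step(x))                    # lift of the nonlinear step
```

The two results agree up to floating-point rounding: in the lifted coordinates the quadratic term becomes a linear coupling, which is what makes linear-algebraic tools applicable to the nonlinear system.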

Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings

Oct 30, 2021

Matthew Shunshi Zhang, Murat Erdogdu, Animesh Garg

Abstract: Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical and often opaque conditions. In particular, existing rates are achieved in limited settings, under strict smoothness and boundedness conditions. In this work, we establish explicit convergence rates of policy gradient methods without relying on these conditions, instead extending the convergence regime to weakly smooth policy classes with $L_2$ integrable gradient. We provide intuitive examples to illustrate the insight behind these new conditions. We also characterize the sufficiency conditions for the ergodicity of near-linear MDPs, which represent an important class of problems. Notably, our analysis also shows that fast convergence rates are achievable for both the standard policy gradient and the natural policy gradient algorithms under these assumptions. Lastly, we provide conditions and analysis for optimality of the converged policies.

Reinforcement Learning in Factored Action Spaces using Tensor Decompositions

Oct 27, 2021

Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar

Abstract: We present an extended abstract for the previously published work TESSERACT [Mahajan et al., 2021], which proposes a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions. The goal of this abstract is twofold: (1) to garner greater interest amongst the tensor research community for creating methods and analysis for approximate RL, and (2) to elucidate the generalised setting of factored action spaces where tensor decompositions can be used. We use a cooperative multi-agent reinforcement learning scenario as the exemplary setting, where the action space is naturally factored across agents and learning becomes intractable without resorting to approximation on the underlying hypothesis space for candidate solutions.

* 2nd Workshop on Quantum Tensor Networks in Machine Learning (NeurIPS 2021)
