Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Aviv Tamar

Technion

Fine-Tuning Generative Models as an Inference Method for Robotic Tasks

Oct 19, 2023

Orr Krupnik, Elisei Shafer, Tom Jurgenson, Aviv Tamar

Figure 1 for Fine-Tuning Generative Models as an Inference Method for Robotic Tasks

Figure 2 for Fine-Tuning Generative Models as an Inference Method for Robotic Tasks

Figure 3 for Fine-Tuning Generative Models as an Inference Method for Robotic Tasks

Figure 4 for Fine-Tuning Generative Models as an Inference Method for Robotic Tasks

Abstract:Adaptable models could greatly benefit robotic agents operating in the real world, allowing them to deal with novel and varying conditions. While approaches such as Bayesian inference are well-studied frameworks for adapting models to evidence, we build on recent advances in deep generative models which have greatly affected many areas of robotics. Harnessing modern GPU acceleration, we investigate how to quickly adapt the sample generation of neural network models to observations in robotic tasks. We propose a simple and general method that is applicable to various deep generative models and robotic environments. The key idea is to quickly fine-tune the model by fitting it to generated samples matching the observed evidence, using the cross-entropy method. We show that our method can be applied to both autoregressive models and variational autoencoders, and demonstrate its usability in object shape inference from grasping, inverse kinematics calculation, and point cloud completion.

* 7th Conference on Robot Learning, 2023. Project website at https://www.orrkrup.com/mace

Via

Access Paper or Ask Questions

TGRL: An Algorithm for Teacher Guided Reinforcement Learning

Jul 06, 2023

Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal

Abstract:Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without a principled method to balance these objectives, prior work used heuristics and problem-specific hyperparameter searches to balance the two objectives. We present a $\textit{principled}$ approach, along with an approximate implementation for $\textit{dynamically}$ and $\textit{automatically}$ balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased and otherwise it is decreased. Our method, $\textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyper-parameter tuning.

* ICML 2023

Via

Access Paper or Ask Questions

DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles

Jun 09, 2023

Tal Daniel, Aviv Tamar

Figure 1 for DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles

Figure 2 for DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles

Figure 3 for DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles

Figure 4 for DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles

Abstract:We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation. In comparison to existing slot- or patch-based representations, DLPs model the scene using a set of keypoints with learned parameters for properties such as position and size, and are both efficient and interpretable. Our method, deep dynamic latent particles (DDLP), yields state-of-the-art object-centric video prediction results on several challenging datasets. The interpretable nature of DDLP allows us to perform ``what-if'' generation -- predict the consequence of changing properties of objects in the initial frames, and DLP's compact structure enables efficient diffusion-based unconditional video generation. Videos, code and pre-trained models are available: https://taldatech.github.io/ddlp-web

* Project site: https://taldatech.github.io/ddlp-web

Via

Access Paper or Ask Questions

Explore to Generalize in Zero-Shot RL

Jun 05, 2023

Ev Zisselman, Itai Lavie, Daniel Soudry, Aviv Tamar

Figure 1 for Explore to Generalize in Zero-Shot RL

Figure 2 for Explore to Generalize in Zero-Shot RL

Figure 3 for Explore to Generalize in Zero-Shot RL

Figure 4 for Explore to Generalize in Zero-Shot RL

Abstract:We study zero-shot generalization in reinforcement learning - optimizing a policy on a set of training tasks such that it will perform well on a similar but unseen test task. To mitigate overfitting, previous work explored different notions of invariance to the task. However, on problems such as the ProcGen Maze, an adequate solution that is invariant to the task visualization does not exist, and therefore invariance-based approaches fail. Our insight is that learning a policy that $\textit{explores}$ the domain effectively is harder to memorize than a policy that maximizes reward for a specific task, and therefore we expect such learned behavior to generalize well; we indeed demonstrate this empirically on several domains that are difficult for invariance-based approaches. Our $\textit{Explore to Generalize}$ algorithm (ExpGen) builds on this insight: We train an additional ensemble of agents that optimize reward. At test time, either the ensemble agrees on an action, and we generalize well, or we take exploratory actions, which are guaranteed to generalize and drive us to a novel part of the state space, where the ensemble may potentially agree again. We show that our approach is the state-of-the-art on several tasks in the ProcGen challenge that have so far eluded effective generalization. For example, we demonstrate a success rate of $82\%$ on the Maze task and $74\%$ on Heist with $200$ training levels.

Via

Access Paper or Ask Questions

ContraBAR: Contrastive Bayes-Adaptive Deep RL

Jun 04, 2023

Era Choshen, Aviv Tamar

Abstract:In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal policy -- the optimal policy when facing an unknown task that is sampled from some known task distribution. Previous approaches tackled this problem by inferring a belief over task parameters, using variational inference methods. Motivated by recent successes of contrastive learning approaches in RL, such as contrastive predictive coding (CPC), we investigate whether contrastive methods can be used for learning Bayes-optimal behavior. We begin by proving that representations learned by CPC are indeed sufficient for Bayes optimality. Based on this observation, we propose a simple meta RL algorithm that uses CPC in lieu of variational belief inference. Our method, ContraBAR, achieves comparable performance to state-of-the-art in domains with state-based observation and circumvents the computational toll of future observation reconstruction, enabling learning in domains with image-based observations. It can also be combined with image augmentations for domain randomization and used seamlessly in both online and offline meta RL settings.

* ICML 2023. Pytorch code available at https://github.com/ec2604/ContraBAR

Via

Access Paper or Ask Questions

Goal-Conditioned Supervised Learning with Sub-Goal Prediction

May 17, 2023

Tom Jurgenson, Aviv Tamar

Abstract:Recently, a simple yet effective algorithm -- goal-conditioned supervised-learning (GCSL) -- was proposed to tackle goal-conditioned reinforcement-learning. GCSL is based on the principle of hindsight learning: by observing states visited in previously executed trajectories and treating them as attained goals, GCSL learns the corresponding actions via supervised learning. However, GCSL only learns a goal-conditioned policy, discarding other information in the process. Our insight is that the same hindsight principle can be used to learn to predict goal-conditioned sub-goals from the same trajectory. Based on this idea, we propose Trajectory Iterative Learner (TraIL), an extension of GCSL that further exploits the information in a trajectory, and uses it for learning to predict both actions and sub-goals. We investigate the settings in which TraIL can make better use of the data, and discover that for several popular problem settings, replacing real goals in GCSL with predicted TraIL sub-goals allows the agent to reach a greater set of goal states using the exact same data as GCSL, thereby improving its overall performance.

Via

Access Paper or Ask Questions

A Deep Learning Perspective on Network Routing

Mar 05, 2023

Yarin Perry, Felipe Vieira Frujeri, Chaim Hoch, Srikanth Kandula, Ishai Menache, Michael Schapira, Aviv Tamar

Abstract:Routing is, arguably, the most fundamental task in computer networking, and the most extensively studied one. A key challenge for routing in real-world environments is the need to contend with uncertainty about future traffic demands. We present a new approach to routing under demand uncertainty: tackling this challenge as stochastic optimization, and employing deep learning to learn complex patterns in traffic demands. We show that our method provably converges to the global optimum in well-studied theoretical models of multicommodity flow. We exemplify the practical usefulness of our approach by zooming in on the real-world challenge of traffic engineering (TE) on wide-area networks (WANs). Our extensive empirical evaluation on real-world traffic and network topologies establishes that our approach's TE quality almost matches that of an (infeasible) omniscient oracle, outperforming previously proposed approaches, and also substantially lowers runtimes.

* To appear at NSDI 2023

Via

Access Paper or Ask Questions

Online Tool Selection with Learned Grasp Prediction Models

Feb 15, 2023

Khashayar Rohanimanesh, Jake Metzger, William Richards, Aviv Tamar

Figure 1 for Online Tool Selection with Learned Grasp Prediction Models

Figure 2 for Online Tool Selection with Learned Grasp Prediction Models

Figure 3 for Online Tool Selection with Learned Grasp Prediction Models

Figure 4 for Online Tool Selection with Learned Grasp Prediction Models

Abstract:Deep learning-based grasp prediction models have become an industry standard for robotic bin-picking systems. To maximize pick success, production environments are often equipped with several end-effector tools that can be swapped on-the-fly, based on the target object. Tool-change, however, takes time. Choosing the order of grasps to perform, and corresponding tool-change actions, can improve system throughput; this is the topic of our work. The main challenge in planning tool change is uncertainty - we typically cannot see objects in the bin that are currently occluded. Inspired by queuing and admission control problems, we model the problem as a Markov Decision Process (MDP), where the goal is to maximize expected throughput, and we pursue an approximate solution based on model predictive control, where at each time step we plan based only on the currently visible objects. Special to our method is the idea of void zones, which are geometrical boundaries in which an unknown object will be present, and therefore cannot be accounted for during planning. Our planning problem can be solved using integer linear programming (ILP). However, we find that an approximate solution based on sparse tree search yields near optimal performance at a fraction of the time. Another question that we explore is how to measure the performance of tool-change planning: we find that throughput alone can fail to capture delicate and smooth behavior, and propose a principled alternative. Finally, we demonstrate our algorithms on both synthetic and real world bin picking tasks.

* 14 pages (including the cover page), 5 Figures, Technical Report, OSARO Inc

Via

Access Paper or Ask Questions

Towards Deployable RL -- What's Broken with RL Research and a Potential Fix

Jan 03, 2023

Shie Mannor, Aviv Tamar

Abstract:Reinforcement learning (RL) has demonstrated great potential, but is currently full of overhyping and pipe dreams. We point to some difficulties with current research which we feel are endemic to the direction taken by the community. To us, the current direction is not likely to lead to "deployable" RL: RL that works in practice and can work in practical situations yet still is economically viable. We also propose a potential fix to some of the difficulties of the field.

Via

Access Paper or Ask Questions

Learning Control by Iterative Inversion

Nov 03, 2022

Gal Leibovich, Guy Jacob, Or Avner, Gal Novik, Aviv Tamar

Abstract:We formulate learning for control as an $\textit{inverse problem}$ -- inverting a dynamical system to give the actions which yield desired behavior. The key challenge in this formulation is a $\textit{distribution shift}$ -- the learning agent only observes the forward mapping (its actions' consequences) on trajectories that it can execute, yet must learn the inverse mapping for inputs-outputs that correspond to a different, desired behavior. We propose a general recipe for inverse problems with a distribution shift that we term $\textit{iterative inversion}$ -- learn the inverse mapping under the current input distribution (policy), then use it on the desired output samples to obtain new inputs, and repeat. As we show, iterative inversion can converge to the desired inverse mapping, but under rather strict conditions on the mapping itself. We next apply iterative inversion to learn control. Our input is a set of demonstrations of desired behavior, given as video embeddings of trajectories, and our method iteratively learns to imitate trajectories generated by the current policy, perturbed by random exploration noise. We find that constantly adding the demonstrated trajectory embeddings $\textit{as input}$ to the policy when generating trajectories to imitate, a-la iterative inversion, steers the learning towards the desired trajectory distribution. To the best of our knowledge, this is the first exploration of learning control from the viewpoint of inverse problems, and our main advantage is simplicity -- we do not require rewards, and only employ supervised learning, which easily scales to state-of-the-art trajectory embedding techniques and policy representations. With a VQ-VAE embedding, and a transformer-based policy, we demonstrate non-trivial continuous control on several tasks. We also report improved performance on imitating diverse behaviors compared to reward based methods.

* Videos available at https://sites.google.com/view/iter-inver

Via

Access Paper or Ask Questions