Abstract: We introduce Lyceum, a high-performance computational ecosystem for robot learning. Lyceum is built on top of the Julia programming language and the MuJoCo physics simulator, combining the ease of use of a high-level programming language with the performance of native C. In addition, Lyceum has a straightforward API to support parallel computation across multiple cores and machines. Overall, depending on the complexity of the environment, Lyceum is 5-30x faster than other popular abstractions such as OpenAI's Gym and DeepMind's dm-control. This substantially reduces training time for various reinforcement learning algorithms and is also fast enough to support real-time model predictive control through MuJoCo. The code, tutorials, and demonstration videos can be found at: www.lyceum.ml.
Abstract: In many environments, only a relatively small subset of the complete state space is needed to accomplish a given task. We develop a simple technique that uses emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence against a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
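As a concrete illustration of the reset mechanism, here is a minimal environment-wrapper sketch (ours, not the paper's code). The predicate in_support, which tests whether a state lies in the task-relevant subset, and the optional penalty are hypothetical stand-ins a practitioner would supply:

```python
# A minimal e-stop wrapper sketch: terminate the episode as soon as the
# state leaves the task-relevant subset, so the agent never wastes
# exploration outside it. `in_support` is a hypothetical user-supplied
# predicate, not part of the paper's API.

class EStopWrapper:
    def __init__(self, env, in_support, penalty=0.0):
        self.env = env                # any env with reset()/step()
        self.in_support = in_support  # state -> bool
        self.penalty = penalty        # optional cost for triggering

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if not self.in_support(obs):
            reward -= self.penalty    # discourage leaving the subset
            done = True               # e-stop: end the episode early
            info["e_stop"] = True
        return obs, reward, done, info
```

A tighter in_support region cuts exploration further but risks a larger asymptotic sub-optimality gap, which is the trade-off the bound above formalizes.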
Abstract: We address the problem of imitation learning with multi-modal demonstrations. Instead of attempting to learn all modes, we argue that in many tasks it is sufficient to imitate any one of them. We show that state-of-the-art methods such as GAIL and behavior cloning, due to their choice of loss function, often incorrectly interpolate between such modes. Our key insight is to minimize the right divergence between the learner and expert state-action distributions, namely the reverse KL divergence or I-projection. We propose a general imitation learning framework for estimating and minimizing any f-divergence. By plugging in different divergences, we recover existing algorithms such as Behavior Cloning (Kullback-Leibler), GAIL (Jensen-Shannon), and DAgger (Total Variation). Empirical results show that our approximate I-projection technique imitates multi-modal behaviors more reliably than GAIL and behavior cloning.
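To make the divergence choices concrete, the f-divergence family and the generators that recover the algorithms above can be written as follows (standard definitions; the notation is ours, not an excerpt from the paper):

```latex
% f-divergence between the expert and learner state-action distributions,
% for convex f with f(1) = 0:
\[
  D_f\!\left(\rho_{\text{exp}} \,\middle\|\, \rho_\theta\right)
  = \mathbb{E}_{(s,a)\sim\rho_\theta}\!\left[
      f\!\left(\frac{\rho_{\text{exp}}(s,a)}{\rho_\theta(s,a)}\right)\right]
\]
% Standard generators and the algorithms they recover:
%   f(u) = u log u                        -> forward KL (behavior cloning)
%   f(u) = -log u                         -> reverse KL / I-projection
%   f(u) = u log u - (u+1) log((u+1)/2)   -> Jensen-Shannon, up to scale (GAIL)
%   f(u) = |u - 1| / 2                    -> total variation (DAgger)
```

The reverse KL is mode-seeking: because the expectation is taken under the learner's own distribution, placing mass where the expert has none is heavily penalized, so the learner commits to one expert mode rather than interpolating between them.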
Abstract: Lazy search algorithms can efficiently solve problems where edge evaluation is the computational bottleneck, as is the case in robotic motion planning. The optimal algorithm in this class, LazySP, lazily restricts edge evaluation to only the current shortest path. Doing so comes at the expense of search effort: LazySP must recompute the search tree every time an edge is found to be invalid. This becomes prohibitively expensive for large graphs or highly cluttered environments. Our key insight is the need to balance edge evaluation and search effort to minimize total planning time. Our contribution is twofold. First, we propose a framework, Generalized Lazy Search (GLS), that seamlessly toggles between search and evaluation to prevent wasted effort. We show that for a suitable choice of toggle, GLS is provably more efficient than LazySP. Second, we leverage prior experience of edge probabilities to derive GLS policies that minimize expected planning time. We show that GLS equipped with such priors significantly outperforms competitive baselines in many simulated environments in R^2, SE(2), and 7-DoF manipulation.
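To make the toggling concrete, here is a loose sketch of the GLS loop (our rendering, not the paper's implementation); lazy_shortest_subpath, mark_valid/mark_invalid, and toggle_event are hypothetical helpers:

```python
# Sketch of Generalized Lazy Search: alternate between extending a lazy
# shortest-path search and evaluating edges on the current candidate.
# LazySP is the special case whose toggle fires only at complete paths.

def generalized_lazy_search(graph, start, goal, toggle_event, is_valid):
    while True:
        # Search phase: extend the tree over lazily assumed-valid edges
        # until the toggle event says to stop and evaluate.
        subpath = graph.lazy_shortest_subpath(start, goal, until=toggle_event)
        if subpath is None:
            return None                      # graph exhausted, no path
        # Evaluation phase: run expensive checks (e.g., collision tests)
        # on the candidate's unevaluated edges.
        for edge in subpath.unevaluated_edges():
            if is_valid(edge):
                graph.mark_valid(edge)
            else:
                graph.mark_invalid(edge)     # re-search with this edge removed
                break
        else:
            if subpath.reaches(goal):
                return subpath               # fully evaluated path to goal
```

When the toggle fires is the design lever: evaluating earlier wastes fewer search expansions on paths that turn out to be blocked, while evaluating later wastes fewer collision checks on edges the search never needs.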
Abstract: We use static object data to improve success detection for stacking objects on and nesting objects in one another. Such actions are necessary for certain robotics tasks, e.g., clearing a dining table or packing a warehouse bin. However, an RGB-D camera alone can be insufficient for detecting success: same-colored objects are difficult to differentiate, and reflective silverware causes noisy depth perception. We show that adding static data about the objects themselves improves the performance of an end-to-end pipeline for classifying action outcomes. Images of the objects, and language expressions describing them, encode prior geometry, shape, and size information that refines classification accuracy. We collect over 13 hours of egocentric manipulation data to train a model that reasons about whether a robot successfully placed unseen objects in or on one another. The model achieves up to a 57% absolute gain over the task baseline on pairs of previously unseen objects.
Abstract: We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al. (2018). Given a natural language instruction and photo-realistic image views of a previously unseen environment, the agent is tasked with navigating from a source to a target location as quickly as possible. While all current approaches make local action decisions or score entire trajectories using beam search, ours balances local and global signals when exploring an unobserved environment. Importantly, this lets the agent act greedily but use global signals to backtrack when necessary. Applying the FAST framework to existing state-of-the-art models yields a 17% relative gain, a 6% absolute gain, in Success weighted by Path Length (SPL).
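One way to picture the decoder is as best-first search over every partially explored trajectory, which is roughly what the sketch below does (ours; global_score, expand, and is_goal stand in for the model's learned scoring and candidate-action functions):

```python
# Loose sketch of frontier-based decoding with backtracking: keep all
# partially explored nodes in a priority queue keyed by a global progress
# score, and always continue from the best-scoring node seen anywhere.

import heapq
import itertools

def fast_decode(start, global_score, expand, is_goal, max_steps=100):
    tie = itertools.count()                  # tie-breaker for equal scores
    frontier = [(-global_score(start), next(tie), start)]
    for _ in range(max_steps):
        if not frontier:
            return None
        _, _, node = heapq.heappop(frontier) # best partial trajectory so far
        if is_goal(node):
            return node
        for child in expand(node):           # candidate next actions
            heapq.heappush(frontier, (-global_score(child), next(tie), child))
    return None
```

Backtracking falls out of the queue: when every local continuation of the current node scores worse than an earlier frontier entry, the next pop jumps back to that entry.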
Abstract: For motion planning in high-dimensional configuration spaces, a significant computational bottleneck is collision detection. Our aim is to reduce the expected number of collision checks by creating a belief model of the configuration space using results from collision tests. We assume the robot's configuration space to be a continuous ambient space in which neighbouring points tend to share the same collision state. This enables us to formulate a probabilistic model that assigns to unevaluated configurations a belief estimate of being collision-free. We present a detailed comparative analysis of various kNN methods and distance metrics used to evaluate C-space belief. We also propose a weighting matrix in C-space to improve the performance of kNN methods. Moreover, we propose a topological method that exploits the higher-order structure of the C-space to generate a belief model. Our results indicate that the proposed topological method outperforms kNN methods, achieving higher model accuracy while remaining computationally efficient.
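As a concrete baseline instance, a distance-weighted kNN belief with the abstract's C-space weighting matrix might look like the following sketch (ours, not the paper's code); W is assumed symmetric positive definite:

```python
# Sketch of a kNN belief model over configuration space: the belief that
# a query configuration q is collision-free is an inverse-distance-
# weighted vote over the k nearest previously evaluated configurations,
# under a Mahalanobis-style metric defined by a weighting matrix W.

import numpy as np

def collision_free_belief(q, X, y, W, k=10, eps=1e-9):
    """X: (n, d) evaluated configs; y: (n,) 1 if collision-free else 0."""
    diff = X - q
    dist = np.sqrt(np.einsum("nd,de,ne->n", diff, W, diff))
    idx = np.argsort(dist)[:k]                  # k nearest neighbours
    w = 1.0 / (dist[idx] + eps)                 # inverse-distance weights
    return float(np.dot(w, y[idx]) / w.sum())   # belief in [0, 1]
```

The topological method the abstract favours replaces this purely metric notion of "nearby" with the higher-order structure of the evaluated configurations.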
Abstract: Learning the preferences implicit in the choices humans make is a well-studied problem in both economics and computer science. However, most work assumes that humans act (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves learning about what they want. In this work, we introduce the assistive multi-armed bandit, in which a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the human does not know the reward function but can learn it through the rewards received from arm pulls; the robot only observes which arms the human pulls, not the reward associated with each pull. We offer necessary and sufficient conditions for successfully assisting the human in this framework. Surprisingly, better human performance in isolation does not necessarily lead to better performance when assisted by the robot: a human policy can do better by effectively communicating its observed rewards to the robot. We conduct proof-of-concept experiments that support these results. We see this work as contributing towards a theory behind algorithms for human-robot interaction.
Abstract: Trust in autonomy is essential for effective human-robot collaboration and for user adoption of autonomous systems such as robot assistants. This paper introduces a computational model that integrates trust into robot decision-making. Specifically, we learn from data a partially observable Markov decision process (POMDP) with human trust as a latent variable. The trust-POMDP model provides a principled approach for the robot to (i) infer the trust of a human teammate through interaction, (ii) reason about the effect of its own actions on human trust, and (iii) choose actions that maximize team performance over the long term. We validated the model through human-subject experiments on a table-clearing task in simulation (201 participants) and with a real robot (20 participants). In our studies, the robot builds human trust by manipulating low-risk objects first. Interestingly, the robot sometimes fails intentionally in order to modulate human trust and achieve the best team performance. These results show that the trust-POMDP calibrates trust to improve human-robot team performance over the long term. Further, they highlight that maximizing trust alone does not always lead to the best performance.
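Since trust is a latent POMDP variable, the robot's core computation is a Bayes filter over trust levels; the sketch below is our rendering with hypothetical learned models trust_dynamics and obs_likelihood, not the paper's code:

```python
# Sketch of belief tracking over discretized latent trust: predict how
# trust shifts under the robot's action, then correct using the
# likelihood of the human's observed response.

import numpy as np

def update_trust_belief(belief, robot_action, human_response,
                        trust_dynamics, obs_likelihood):
    """belief: probability vector over discrete trust levels."""
    # Predict: trust_dynamics(a)[i, j] = P(trust'=j | trust=i, action=a)
    predicted = belief @ trust_dynamics(robot_action)
    # Correct: obs_likelihood(o, a)[j] = P(response=o | trust'=j, action=a)
    posterior = predicted * obs_likelihood(human_response, robot_action)
    return posterior / posterior.sum()    # normalized posterior belief
```

Planning then chooses actions that maximize expected long-term team performance under this belief, which is what lets the robot deliberately trade short-term trust for better calibration.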
Abstract: This paper proposes a sample-efficient yet simple approach to learning closed-loop policies for nonprehensile manipulation. Although reinforcement learning (RL) can learn closed-loop policies without requiring access to underlying physics models, it suffers from poor sample complexity on challenging tasks. To overcome this problem, we leverage rearrangement planning to provide an informative physics-based prior on the environment's optimal state-visitation distribution. Specifically, we present a new technique, Learning with Planned Episodic Resets (LeaPER), that resets the environment's state to one informed by this prior during the learning phase. We experimentally show that LeaPER outperforms traditional RL approaches by a factor of up to 5x on simulated rearrangement tasks. Further, we relax the dynamics from quasi-static to welded contacts to illustrate that LeaPER is robust to the use of simpler physics models. Finally, LeaPER's closed-loop policies significantly improve task success rates relative to both open-loop control along a planned path and simple feedback controllers that track open-loop trajectories. We demonstrate the performance and behavior of LeaPER on a physical 7-DOF manipulator at https://youtu.be/feS-zFq6J1c.
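A minimal sketch of the planned-reset mechanic follows (ours; plan_states and env.set_state are assumptions, not a specific simulator API):

```python
# Sketch of LeaPER-style planned episodic resets: with some probability,
# start a training episode from a state drawn along a rearrangement plan
# instead of the task's initial state, concentrating exploration on the
# planner's state-visitation distribution.

import random

def reset_with_plan_prior(env, plan_states, p_plan=0.5):
    obs = env.reset()
    if plan_states and random.random() < p_plan:
        state = random.choice(plan_states)   # a state along the planned path
        obs = env.set_state(state)           # hypothetical state-setting hook
    return obs
```

Because the plan only seeds where episodes begin, the learned policy is free to deviate from it, which helps explain why a planner with cruder physics still provides a useful prior.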