Austin Wang

HomeRobot: Open-Vocabulary Mobile Manipulation

Jun 20, 2023
Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it involves tackling sub-problems from across robotics: perception, language understanding, navigation, and manipulation are all essential to OVMM. In addition, integrating the solutions to these sub-problems poses its own substantial challenges. To drive research in this area, we introduce the HomeRobot OVMM benchmark, in which an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both reinforcement learning and heuristic (model-based) baselines and show evidence of sim-to-real transfer. Our baselines achieve a 20% success rate in the real world; our experiments identify ways in which future research can improve performance. See videos on our website: https://ovmm.github.io/.
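
To make the task structure concrete, here is a minimal sketch of the episode flow the benchmark evaluates: find the target object, pick it up, find the target receptacle, and place. The `env`/`agent` interface below is a hypothetical stand-in, not the HomeRobot API; see the website for the actual stack.

```python
# Sketch of an OVMM episode as four sequential sub-tasks. All interface
# names here (env.reset, env.step, agent.act, env.episode_metrics) are
# hypothetical placeholders, not the HomeRobot API.
from enum import Enum, auto


class Phase(Enum):
    FIND_OBJECT = auto()
    PICK = auto()
    FIND_RECEPTACLE = auto()
    PLACE = auto()
    DONE = auto()


def run_episode(env, agent):
    """Drive one OVMM episode through its four sub-task phases."""
    obs = env.reset()  # hypothetical: returns RGB-D + language goal
    phase = Phase.FIND_OBJECT
    while phase is not Phase.DONE:
        action = agent.act(obs, phase)   # navigation or manipulation action
        obs, success = env.step(action)  # hypothetical env interface
        if success:                      # advance on sub-task success
            phase = Phase(phase.value + 1)
    return env.episode_metrics()         # e.g., overall episode success
```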

* 35 pages, 20 figures, 8 tables 

USA-Net: Unified Semantic and Affordance Representations for Robot Memory

Apr 25, 2023
Benjamin Bolte, Austin Wang, Jimmy Yang, Mustafa Mukadam, Mrinal Kalakrishnan, Chris Paxton

In order for robots to follow open-ended instructions like "go open the brown cabinet over the sink", they require an understanding of both the scene geometry and the semantics of their environment. Robotic systems often handle these through separate pipelines, sometimes using very different representation spaces, which can be suboptimal when the two objectives conflict. In this work, we present USA-Net, a simple method for constructing a world representation that encodes both the semantics and spatial affordances of a scene in a differentiable map. This allows us to build a gradient-based planner which can navigate to locations in the scene specified using open-ended vocabulary. We use this planner to consistently generate trajectories which are both 5-10% shorter and 10-30% closer to our goal query in CLIP embedding space than paths from comparable grid-based planners that don't leverage gradient information. To our knowledge, this is the first end-to-end differentiable planner that optimizes for both semantics and affordance in a single implicit map. Code and visuals are available at our website: https://usa.bolte.cc/
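
As an illustration of how a differentiable map enables gradient-based planning, here is a minimal PyTorch sketch (not the authors' code): `field` is assumed to be a trained implicit network mapping waypoints to a traversal cost and a CLIP-space similarity to the goal query, and the path is optimized directly by backpropagating through it.

```python
# Minimal gradient-based planning over a differentiable implicit map,
# in the spirit of USA-Net. `field` is an assumed trained network that
# maps a batch of xy waypoints to (traversal cost, goal similarity).
import torch


def plan(field, start, goal_hint, n_waypoints=32, steps=200):
    # Initialize a straight-line path from the start toward a rough goal.
    alphas = torch.linspace(0, 1, n_waypoints).unsqueeze(1)
    path = ((1 - alphas) * start + alphas * goal_hint).clone()
    path.requires_grad_(True)
    opt = torch.optim.Adam([path], lr=1e-2)
    for _ in range(steps):
        cost, sim = field(path)               # per-waypoint outputs
        length = (path[1:] - path[:-1]).norm(dim=1).sum()
        loss = length + cost.sum() - sim[-1]  # short, traversable, on-goal
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            path[0] = start                   # keep the start pinned
    return path.detach()
```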

Navigating to Objects Specified by Images

Apr 03, 2023
Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot

Images are a convenient way to specify which particular object instance an embodied agent should navigate to. Solving this task requires semantic visual reasoning and exploration of unknown environments. We present a system that can perform this task in both simulation and the real world. Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation. We re-identify the goal instance in egocentric vision using feature matching and localize the goal instance by projecting matched features onto a map. Each sub-task is solved using off-the-shelf components requiring zero fine-tuning. On the HM3D InstanceImageNav benchmark, this system outperforms a baseline end-to-end RL policy by 7x and a state-of-the-art ImageNav model by 2.3x (56% vs. 25% success). We deploy this system on a mobile robot platform and demonstrate effective real-world performance, achieving an 88% success rate across a home and an office environment.
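
As a sketch of the re-identification and localization steps, the snippet below matches local features between the goal image and the current view with off-the-shelf OpenCV components, then back-projects the matched pixels using depth and pinhole intrinsics. It mirrors the modular idea but is not the authors' exact pipeline; thresholds and the aggregation rule are illustrative.

```python
# Re-identify the goal instance via feature matching, then localize it
# by back-projecting matched pixels with depth and camera intrinsics.
import cv2
import numpy as np

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher()


def reidentify_and_localize(goal_img, obs_img, depth, fx, fy, cx, cy,
                            ratio=0.75, min_matches=20):
    _, des_g = sift.detectAndCompute(goal_img, None)
    kp_o, des_o = sift.detectAndCompute(obs_img, None)
    matches = matcher.knnMatch(des_g, des_o, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    if len(good) < min_matches:      # goal not re-identified in this view
        return None
    pts = np.array([kp_o[m.trainIdx].pt for m in good])
    u, v = pts[:, 0], pts[:, 1]
    z = depth[v.astype(int), u.astype(int)]   # per-pixel depth in meters
    # Pinhole back-projection of matched pixels into camera coordinates.
    xyz = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
    return xyz.mean(axis=0)          # rough 3D goal estimate for the map
```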

Theseus: A Library for Differentiable Nonlinear Optimization

Jul 19, 2022
Luis Pineda, Taosha Fan, Maurizio Monge, Shobha Venkataraman, Paloma Sodhi, Ricky Chen, Joseph Ortiz, Daniel DeTone, Austin Wang, Stuart Anderson, Jing Dong, Brandon Amos, Mustafa Mukadam

We present Theseus, an efficient application-agnostic open source library for differentiable nonlinear least squares (DNLS) optimization built on PyTorch, providing a common framework for end-to-end structured learning in robotics and vision. Existing DNLS implementations are application specific and do not always incorporate many ingredients important for efficiency. Theseus is application-agnostic, as we illustrate with several example applications that are built using the same underlying differentiable components, such as second-order optimizers, standard cost functions, and Lie groups. For efficiency, Theseus incorporates support for sparse solvers, automatic vectorization, batching, GPU acceleration, and gradient computation with implicit differentiation and direct loss minimization. We perform extensive performance evaluations across a set of applications, demonstrating significant efficiency gains and better scalability when these features are incorporated. Project page: https://sites.google.com/view/theseus-ai
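
As a flavor of the library's core abstractions, here is a minimal curve-fitting sketch built from a cost function, an objective, and a differentiable Gauss-Newton optimizer wrapped in a `TheseusLayer`. It follows the documented interface, though exact signatures may differ across versions.

```python
# Fit y = a * x with Theseus: define a residual, wrap it in an
# AutoDiffCostFunction, add it to an Objective, and solve with a
# differentiable Gauss-Newton optimizer.
import torch
import theseus as th

x = torch.linspace(0, 1, 100).unsqueeze(0)    # batch of 1, 100 points
y = 2.0 * x                                    # data generated with a = 2
a = th.Vector(1, name="a")                     # optimization variable
x_aux = th.Variable(x, name="x")               # auxiliary (fixed) data
y_aux = th.Variable(y, name="y")


def residual(optim_vars, aux_vars):
    (a,), (x, y) = optim_vars, aux_vars
    return y.tensor - a.tensor * x.tensor      # least-squares error, dim 100


objective = th.Objective()
objective.add(th.AutoDiffCostFunction(
    [a], residual, 100, aux_vars=[x_aux, y_aux], name="fit"))
layer = th.TheseusLayer(th.GaussNewton(objective, max_iterations=10))

values, info = layer.forward({"a": torch.zeros(1, 1), "x": x, "y": y})
print(values["a"])  # should approach 2.0
```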

RB2: Robotic Manipulation Benchmarking with a Twist

Mar 15, 2022
Sudeep Dasari, Jianren Wang, Joyce Hong, Shikhar Bahl, Yixin Lin, Austin Wang, Abitha Thankaraj, Karanbir Chahal, Berk Calli, Saurabh Gupta, David Held, Lerrel Pinto, Deepak Pathak, Vikash Kumar, Abhinav Gupta

Benchmarks offer a scientific way to compare algorithms using objective performance metrics. Good benchmarks have two features: (a) they should be widely useful for many research groups; and (b) they should produce reproducible findings. In robotic manipulation research, there is a trade-off between reproducibility and broad accessibility. If the benchmark is kept restrictive (fixed hardware, objects), the numbers are reproducible but the setup becomes less general. On the other hand, a benchmark could be a loose set of protocols (e.g. object sets), but the underlying variation in setups makes the results non-reproducible. In this paper, we re-imagine benchmarking for robotic manipulation as state-of-the-art algorithmic implementations, alongside the usual set of tasks and experimental protocols. The added baseline implementations provide a way to easily recreate SOTA numbers in a new local robotic setup, thus providing credible relative rankings between existing approaches and new work. However, these local rankings can vary between different setups. To resolve this issue, we build a mechanism for pooling experimental data between labs, and thus establish a single global ranking for existing (and proposed) SOTA algorithms. Our benchmark, called the Ranking-Based Robotics Benchmark (RB2), is evaluated on tasks inspired by the clinically validated Southampton Hand Assessment Procedure. Our benchmark was run across two different labs and reveals several surprising findings. For example, extremely simple baselines like open-loop behavior cloning outperform more complicated models (e.g. closed-loop, RNN, offline RL) that are preferred by the field. We hope our fellow researchers will use RB2 to improve the quality and rigor of their research.
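
To make the pooling idea concrete, the sketch below ranks algorithms within each lab (so absolute numbers never cross setups) and aggregates by mean rank. The data and the aggregation rule are illustrative stand-ins, not RB2's specified mechanism.

```python
# Pool per-lab results into one global ranking: rank within each lab,
# then sort algorithms by their mean rank across labs. Scores below are
# made-up placeholders for illustration only.
from statistics import mean

# success rates per lab: {lab: {algorithm: score}}
results = {
    "lab_a": {"open_loop_bc": 0.71, "rnn": 0.62, "offline_rl": 0.55},
    "lab_b": {"open_loop_bc": 0.58, "rnn": 0.60, "offline_rl": 0.41},
}


def global_ranking(results):
    ranks = {}
    for scores in results.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, algo in enumerate(ordered, start=1):
            ranks.setdefault(algo, []).append(rank)
    return sorted(ranks, key=lambda a: mean(ranks[a]))


print(global_ranking(results))  # algorithms, best to worst by mean rank
```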

* accepted at the NeurIPS 2021 Datasets and Benchmarks Track 

Differentiable and Learnable Robot Models

Feb 22, 2022
Franziska Meier, Austin Wang, Giovanni Sutanto, Yixin Lin, Paarth Shah

Building differentiable simulations of physical processes has recently received an increasing amount of attention. Specifically, some efforts develop differentiable robotic physics engines motivated by the computational benefits of merging rigid body simulations with modern differentiable machine learning libraries. Here, we present a library that focuses on the ability to combine data-driven methods with analytical rigid body computations. More concretely, our library \emph{Differentiable Robot Models} implements both \emph{differentiable} and \emph{learnable} models of the kinematics and dynamics of robots in PyTorch. The source code is available at \url{https://github.com/facebookresearch/differentiable-robot-model}
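
As a toy illustration of the differentiable-and-learnable idea (not the library's API), the snippet below writes the forward kinematics of a planar two-link arm in PyTorch, with link lengths as learnable parameters fit from observed end-effector positions.

```python
# Differentiable forward kinematics of a planar 2-link arm, with link
# lengths as learnable parameters. Observations are illustrative.
import torch

lengths = torch.nn.Parameter(torch.ones(2))    # learnable link lengths


def fk(q):
    """End-effector (x, y); differentiable in q and the link lengths."""
    x = lengths[0] * torch.cos(q[0]) + lengths[1] * torch.cos(q[0] + q[1])
    y = lengths[0] * torch.sin(q[0]) + lengths[1] * torch.sin(q[0] + q[1])
    return torch.stack([x, y])


# Fit link lengths from a (joint angles, end-effector) observation.
opt = torch.optim.Adam([lengths], lr=0.05)
q_obs = torch.tensor([0.3, 0.7])
ee_obs = torch.tensor([1.1, 1.2])              # illustrative measurement
for _ in range(200):
    loss = (fk(q_obs) - ee_obs).square().sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```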

Learning State-Dependent Losses for Inverse Dynamics Learning

Mar 12, 2020
Kristen Morse, Neha Das, Yixin Lin, Austin Wang, Akshara Rai, Franziska Meier

Being able to quickly adapt to changes in dynamics is paramount in model-based control for object manipulation tasks. To enable fast adaptation of the inverse dynamics model's parameters, data efficiency is crucial. Given observed data, a key element of how an optimizer updates model parameters is the loss function. In this work, we propose to apply meta-learning to learn structured, state-dependent loss functions during a meta-training phase. We then replace standard losses with our learned losses during online adaptation tasks. We evaluate our proposed approach on inverse dynamics learning tasks, both in simulation and on real hardware data. In both settings, the structured learned losses improve online adaptation speed compared to standard, state-independent loss functions.
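
A minimal sketch of the idea, with illustrative shapes and names: a small network maps (state, prediction error) to a scalar loss, and online adaptation minimizes this learned loss in place of MSE; meta-training (not shown) would optimize the loss network through such inner updates so that adaptation is fast.

```python
# Learned state-dependent loss for online inverse dynamics adaptation.
# Dimensions, architectures, and the toy dynamics model are illustrative.
import torch
import torch.nn as nn

state_dim, err_dim = 14, 7                      # e.g., a 7-DoF arm
loss_net = nn.Sequential(                       # learned loss function
    nn.Linear(state_dim + err_dim, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Softplus())            # nonnegative scalar loss

model = nn.Linear(state_dim, err_dim)           # toy inverse dynamics model


def adapt_step(state, target_torque, lr=1e-2):
    """One online adaptation step using the learned loss."""
    err = model(state) - target_torque
    loss = loss_net(torch.cat([state, err], dim=-1)).mean()
    grads = torch.autograd.grad(loss, model.parameters())
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= lr * g                          # gradient step on the model
```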

* 9 pages, 8 figures, submitted to IROS 2020; Kristen Morse and Neha Das contributed equally 

Encoding Physical Constraints in Differentiable Newton-Euler Algorithm

Jan 24, 2020
Giovanni Sutanto, Austin Wang, Yixin Lin, Mustafa Mukadam, Gaurav Sukhatme, Akshara Rai, Franziska Meier

The recursive Newton-Euler Algorithm (RNEA) is a popular technique in robotics for computing the dynamics of robots. The computed dynamics can then be used for torque control with inverse dynamics, or for forward dynamics computations. RNEA can be framed as a differentiable computational graph, enabling the dynamics parameters of the robot to be learned from data via modern auto-differentiation toolboxes. However, the dynamics parameters learned in this manner can be physically implausible. In this work, we incorporate physical constraints into the learning by adding structure to the learned parameters. This results in a framework that can learn physically plausible dynamics via gradient descent, improving both the training speed and the generalization of the learned dynamics models. We evaluate our method on real-time inverse dynamics predictions of a 7-degree-of-freedom robot arm, both in simulation and on the real robot. Our experiments study a spectrum of structure added to learned dynamics and compare their performance and generalization.
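
One common way to enforce such structure, sketched below in PyTorch (illustrative, not the authors' code): reparameterize so that mass is positive by construction and each link's inertia matrix is symmetric positive definite via a Cholesky-style factor, then run plain gradient descent on the unconstrained parameters.

```python
# Keep learned dynamics parameters physically plausible by construction:
# optimize unconstrained tensors, derive constrained quantities from them.
import torch
import torch.nn.functional as F

raw_mass = torch.nn.Parameter(torch.zeros(1))
raw_L = torch.nn.Parameter(torch.eye(3))        # unconstrained 3x3


def physical_params():
    mass = F.softplus(raw_mass)                 # mass > 0
    L = torch.tril(raw_L)                       # lower-triangular factor
    inertia = L @ L.T + 1e-6 * torch.eye(3)     # symmetric positive definite
    return mass, inertia
```

Because the constraints hold for any value of the raw parameters, any off-the-shelf optimizer can be used without projection steps.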

* 10 pages (8 pages of technical content, 2 pages of references); submitted and under review at the 2nd Learning for Dynamics and Control (L4DC) Conference, 2020 