Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Devendra Singh Chaplot

AutoNeRF: Training Implicit Scene Representations with Autonomous Agents

Apr 21, 2023

Pierre Marza, Laetitia Matignon, Olivier Simonin, Dhruv Batra, Christian Wolf, Devendra Singh Chaplot

Abstract:Implicit representations such as Neural Radiance Fields (NeRF) have been shown to be very effective at novel view synthesis. However, these models typically require manual and careful human data collection for training. In this paper, we present AutoNeRF, a method to collect data required to train NeRFs using autonomous embodied agents. Our method allows an agent to explore an unseen environment efficiently and use the experience to build an implicit map representation autonomously. We compare the impact of different exploration strategies including handcrafted frontier-based exploration and modular approaches composed of trained high-level planners and classical low-level path followers. We train these models with different reward functions tailored to this problem and evaluate the quality of the learned representations on four different downstream tasks: classical viewpoint rendering, map reconstruction, planning, and pose refinement. Empirical results show that NeRFs can be trained on actively collected data using just a single episode of experience in an unseen environment, and can be used for several downstream robotic tasks, and that modular trained exploration models significantly outperform the classical baselines.

Via

Access Paper or Ask Questions

Navigating to Objects Specified by Images

Apr 03, 2023

Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot

Figure 1 for Navigating to Objects Specified by Images

Figure 2 for Navigating to Objects Specified by Images

Figure 3 for Navigating to Objects Specified by Images

Figure 4 for Navigating to Objects Specified by Images

Abstract:Images are a convenient way to specify which particular object instance an embodied agent should navigate to. Solving this task requires semantic visual reasoning and exploration of unknown environments. We present a system that can perform this task in both simulation and the real world. Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation. We re-identify the goal instance in egocentric vision using feature-matching and localize the goal instance by projecting matched features to a map. Each sub-task is solved using off-the-shelf components requiring zero fine-tuning. On the HM3D InstanceImageNav benchmark, this system outperforms a baseline end-to-end RL policy 7x and a state-of-the-art ImageNav model 2.3x (56% vs 25% success). We deploy this system to a mobile robot platform and demonstrate effective real-world performance, achieving an 88% success rate across a home and an office environment.

Via

Access Paper or Ask Questions

Navigating to Objects in the Real World

Dec 02, 2022

Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot

Abstract:Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end learning approaches reactively map sensor inputs to actions with deep neural networks, while modular learning approaches enrich the classical pipeline with learning-based semantic sensing and exploration. But learned visual navigation policies have predominantly been evaluated in simulation. How well do different classes of methods work on a robot? We present a large-scale empirical study of semantic visual navigation methods comparing representative methods from classical, modular, and end-to-end learning approaches across six homes with no prior experience, maps, or instrumentation. We find that modular learning works well in the real world, attaining a 90% success rate. In contrast, end-to-end learning does not, dropping from 77% simulation to 23% real-world success rate due to a large image domain gap between simulation and reality. For practitioners, we show that modular learning is a reliable approach to navigate to objects: modularity and abstraction in policy design enable Sim-to-Real transfer. For researchers, we identify two key issues that prevent today's simulators from being reliable evaluation benchmarks - (A) a large Sim-to-Real gap in images and (B) a disconnect between simulation and real-world error modes - and propose concrete steps forward.

* 39 pages, 19 figures and tables, submitted to Science Robotics

Via

Access Paper or Ask Questions

Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

Nov 29, 2022

Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot

Figure 1 for Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

Figure 2 for Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

Figure 3 for Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

Figure 4 for Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

Abstract:We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image. Unlike related navigation tasks, ImageNav does not have a standardized task definition which makes comparison across methods difficult. Further, existing formulations have two problematic properties; (1) image-goals are sampled from random locations which can lead to ambiguity (e.g., looking at walls), and (2) image-goals match the camera specification and embodiment of the agent; this rigidity is limiting when considering user-driven downstream applications. We present the Instance-specific ImageNav task (InstanceImageNav) to address these limitations. Specifically, the goal image is 'focused' on some particular object instance in the scene and is taken with camera parameters independent of the agent. We instantiate InstanceImageNav in the Habitat Simulator using scenes from the Habitat-Matterport3D dataset (HM3D) and release a standardized benchmark to measure community progress.

Via

Access Paper or Ask Questions

Retrospectives on the Embodied AI Workshop

Oct 17, 2022

Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi(+29 more)

Figure 1 for Retrospectives on the Embodied AI Workshop

Figure 2 for Retrospectives on the Embodied AI Workshop

Figure 3 for Retrospectives on the Embodied AI Workshop

Figure 4 for Retrospectives on the Embodied AI Workshop

Abstract:We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.

Via

Access Paper or Ask Questions

Habitat-Matterport 3D Semantics Dataset

Oct 11, 2022

Karmesh Yadav, Ram Ramrakhya, Santhosh Kumar Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva(+2 more)

Figure 1 for Habitat-Matterport 3D Semantics Dataset

Figure 2 for Habitat-Matterport 3D Semantics Dataset

Figure 3 for Habitat-Matterport 3D Semantics Dataset

Figure 4 for Habitat-Matterport 3D Semantics Dataset

Abstract:We present the Habitat-Matterport 3D Semantics (HM3DSEM) dataset. HM3DSEM is the largest dataset of 3D real-world spaces with densely annotated semantics that is currently available to the academic community. It consists of 142,646 object instance annotations across 216 3D spaces and 3,100 rooms within those spaces. The scale, quality, and diversity of object annotations far exceed those of datasets from prior work. A key difference setting apart HM3DSEM from other datasets is the use of texture information to annotate pixel-accurate object boundaries. We demonstrate the effectiveness of HM3DSEM dataset for the Object Goal Navigation task using different methods. Policies trained using HM3DSEM perform comparable or better than those trained on prior datasets.

* 14 Pages, 11 Figures, 4 Tables

Via

Access Paper or Ask Questions

Multi-skill Mobile Manipulation for Object Rearrangement

Sep 06, 2022

Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik

Figure 1 for Multi-skill Mobile Manipulation for Object Rearrangement

Figure 2 for Multi-skill Mobile Manipulation for Object Rearrangement

Figure 3 for Multi-skill Mobile Manipulation for Object Rearrangement

Figure 4 for Multi-skill Mobile Manipulation for Object Rearrangement

Abstract:We study a modular approach to tackle long-horizon mobile manipulation tasks for object rearrangement, which decomposes a full task into a sequence of subtasks. To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks. Although more effective than monolithic end-to-end RL policies, this framework suffers from compounding errors in skill chaining, e.g., navigating to a bad location where a stationary manipulation skill can not reach its target to manipulate. To this end, we propose that the manipulation skills should include mobility to have flexibility in interacting with the target object from multiple locations and at the same time the navigation skill could have multiple end points which lead to successful manipulation. We operationalize these ideas by implementing mobile manipulation skills rather than stationary ones and training a navigation skill trained with region goal instead of point goal. We evaluate our multi-skill mobile manipulation method M3 on 3 challenging long-horizon mobile manipulation tasks in the Home Assistant Benchmark (HAB), and show superior performance as compared to the baselines.

* Project website: https://sites.google.com/view/hab-m3

Via

Access Paper or Ask Questions

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

Jan 25, 2022

Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

Figure 1 for PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

Figure 2 for PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

Figure 3 for PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

Figure 4 for PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

Abstract:State-of-the-art approaches to ObjectGoal navigation rely on reinforcement learning and typically require significant computational resources and time for learning. We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of `where to look?' for an object and `how to navigate to (x, y)?'. Our key insight is that `where to look?' can be treated purely as a perception problem, and learned without environment interactions. To address this, we propose a network that predicts two complementary potential functions conditioned on a semantic map and uses them to decide where to look for an unseen object. We train the potential function network using supervised learning on a passive dataset of top-down semantic maps, and integrate it into a modular framework to perform ObjectGoal navigation. Experiments on Gibson and Matterport3D demonstrate that our method achieves the state-of-the-art for ObjectGoal navigation while incurring up to 1,600x less computational cost for training.

* 8 pages + appendix

Via

Access Paper or Ask Questions

Recognizing Scenes from Novel Viewpoints

Dec 02, 2021

Shengyi Qian, Alexander Kirillov, Nikhila Ravi, Devendra Singh Chaplot, Justin Johnson, David F. Fouhey, Georgia Gkioxari

Figure 1 for Recognizing Scenes from Novel Viewpoints

Figure 2 for Recognizing Scenes from Novel Viewpoints

Figure 3 for Recognizing Scenes from Novel Viewpoints

Figure 4 for Recognizing Scenes from Novel Viewpoints

Abstract:Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects. In this work, we attempt to endow machines with this ability. We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories. All this without access to the RGB images from those views. We pair 2D scene recognition with an implicit 3D representation and learn from multi-view 2D annotations of hundreds of scenes without any 3D supervision beyond camera poses. We experiment on challenging datasets and demonstrate our model's ability to jointly capture semantics and geometry of novel scenes with diverse layouts, object types and shapes.

Via

Access Paper or Ask Questions

Differentiable Spatial Planning using Transformers

Dec 02, 2021

Devendra Singh Chaplot, Deepak Pathak, Jitendra Malik

Figure 1 for Differentiable Spatial Planning using Transformers

Figure 2 for Differentiable Spatial Planning using Transformers

Figure 3 for Differentiable Spatial Planning using Transformers

Figure 4 for Differentiable Spatial Planning using Transformers

Abstract:We consider the problem of spatial path planning. In contrast to the classical solutions which optimize a new plan from scratch and assume access to the full map with ground truth obstacle locations, we learn a planner from the data in a differentiable manner that allows us to leverage statistical regularities from past data. We propose Spatial Planning Transformers (SPT), which given an obstacle map learns to generate actions by planning over long-range spatial dependencies, unlike prior data-driven planners that propagate information locally via convolutional structure in an iterative manner. In the setting where the ground truth map is not known to the agent, we leverage pre-trained SPTs in an end-to-end framework that has the structure of mapper and planner built into it which allows seamless generalization to out-of-distribution maps and goals. SPTs outperform prior state-of-the-art differentiable planners across all the setups for both manipulation and navigation tasks, leading to an absolute improvement of 7-19%.

* Published at ICML 2021. See project webpage at https://devendrachaplot.github.io/projects/spatial-planning-transformers

Via

Access Paper or Ask Questions