Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rafael S. Rezende

RANa: Retrieval-Augmented Navigation

Apr 04, 2025

Gianluca Monaci, Rafael S. Rezende, Romain Deffayet, Gabriela Csurka, Guillaume Bono, Hervé Déjean, Stéphane Clinchant, Christian Wolf

Abstract:Methods for navigation based on large-scale learning typically treat each episode as a new problem, where the agent is spawned with a clean memory in an unknown environment. While these generalization capabilities to an unknown environment are extremely important, we claim that, in a realistic setting, an agent should have the capacity of exploiting information collected during earlier robot operations. We address this by introducing a new retrieval-augmented agent, trained with RL, capable of querying a database collected from previous episodes in the same environment and learning how to integrate this additional context information. We introduce a unique agent architecture for the general navigation task, evaluated on ObjectNav, ImageNav and Instance-ImageNav. Our retrieval and context encoding methods are data-driven and heavily employ vision foundation models (FM) for both semantic and geometric understanding. We propose new benchmarks for these settings and we show that retrieval allows zero-shot transfer across tasks and environments while significantly improving performance.

Via

Access Paper or Ask Questions

Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

Feb 14, 2024

Yannis Kalantidis, Mert Bülent Sarıyıldız, Rafael S. Rezende, Philippe Weinzaepfel, Diane Larlus, Gabriela Csurka

Figure 1 for Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

Figure 2 for Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

Figure 3 for Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

Figure 4 for Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

Abstract:State-of-the-art visual localization approaches generally rely on a first image retrieval step whose role is crucial. Yet, retrieval often struggles when facing varying conditions, due to e.g. weather or time of day, with dramatic consequences on the visual localization accuracy. In this paper, we improve this retrieval step and tailor it to the final localization task. Among the several changes we advocate for, we propose to synthesize variants of the training set images, obtained from generative text-to-image models, in order to automatically expand the training set towards a number of nameable variations that particularly hurt visual localization. After expanding the training set, we propose a training approach that leverages the specificities and the underlying geometry of this mix of real and synthetic images. We experimentally show that those changes translate into large improvements for the most challenging visual localization datasets. Project page: https://europe.naverlabs.com/ret4loc

* Accepted at ICLR 2024. Project Page: https://europe.naverlabs.com/ret4loc

Via

Access Paper or Ask Questions

SCNet: Learning Semantic Correspondence

Aug 17, 2017

Kai Han, Rafael S. Rezende, Bumsub Ham, Kwan-Yee K. Wong, Minsu Cho, Cordelia Schmid, Jean Ponce

Figure 1 for SCNet: Learning Semantic Correspondence

Figure 2 for SCNet: Learning Semantic Correspondence

Figure 3 for SCNet: Learning Semantic Correspondence

Figure 4 for SCNet: Learning Semantic Correspondence

Abstract:This paper addresses the problem of establishing semantic correspondences between images depicting different instances of the same object or scene category. Previous approaches focus on either combining a spatial regularizer with hand-crafted features, or learning a correspondence model for appearance only. We propose instead a convolutional neural network architecture, called SCNet, for learning a geometrically plausible model for semantic correspondence. SCNet uses region proposals as matching primitives, and explicitly incorporates geometric consistency in its loss function. It is trained on image pairs obtained from the PASCAL VOC 2007 keypoint dataset, and a comparative evaluation on several standard benchmarks demonstrates that the proposed approach substantially outperforms both recent deep learning architectures and previous methods based on hand-crafted features.

* ICCV 2017

Via

Access Paper or Ask Questions