Takeshi Ishita

VLG-Loc: Vision-Language Global Localization from Labeled Footprint Maps

Dec 18, 2025

Mizuho Aoki, Kohei Honda, Yasuhiro Yoshimura, Takeshi Ishita, Ryo Yonetani

Abstract: This paper presents Vision-Language Global Localization (VLG-Loc), a novel global localization method that uses human-readable labeled footprint maps containing only names and areas of distinctive visual landmarks in an environment. While humans naturally localize themselves using such maps, translating this capability to robotic systems remains highly challenging due to the difficulty of establishing correspondences between observed landmarks and those in the map without geometric and appearance details. To address this challenge, VLG-Loc leverages a vision-language model (VLM) to search the robot's multi-directional image observations for the landmarks noted in the map. The method then identifies robot poses within a Monte Carlo localization framework, where the found landmarks are used to evaluate the likelihood of each pose hypothesis. Experimental validation in simulated and real-world retail environments demonstrates superior robustness compared to existing scan-based methods, particularly under environmental changes. Further improvements are achieved through the probabilistic fusion of visual and scan-based localization.
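The likelihood step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the map contents, the four camera directions, the reduction of each landmark footprint to a centroid, and the probabilities `p_match` and `p_mismatch` are all assumptions made for illustration. The VLM output is represented only as a hypothetical dictionary mapping a landmark name to the camera direction in which it was reported.

```python
import math

# Hypothetical labeled footprint map: landmark name -> footprint centroid (x, y) in metres.
LANDMARKS = {"bakery": (5.0, 0.0), "checkout": (0.0, 5.0)}
# Assumed camera headings at 0, 90, 180 and 270 degrees relative to the robot.
DIRECTIONS = ["front", "left", "back", "right"]

def predicted_direction(pose, landmark_xy):
    """Return the camera direction in which a landmark should appear from pose (x, y, theta)."""
    x, y, theta = pose
    bearing = math.atan2(landmark_xy[1] - y, landmark_xy[0] - x) - theta
    # Shift by 45 degrees so each 90-degree sector is centred on a camera heading.
    sector = int(((bearing + math.pi / 4) % (2 * math.pi)) // (math.pi / 2)) % 4
    return DIRECTIONS[sector]

def pose_likelihood(pose, detections, p_match=0.8, p_mismatch=0.1):
    """Score one pose hypothesis against the reported landmark directions."""
    likelihood = 1.0
    for name, xy in LANDMARKS.items():
        seen_in = detections.get(name)
        if seen_in is None:
            continue  # landmark not reported; treated as uninformative in this sketch
        agrees = predicted_direction(pose, xy) == seen_in
        likelihood *= p_match if agrees else p_mismatch
    return likelihood

def normalized_weights(particles, detections):
    """Turn per-particle likelihoods into weights that sum to one."""
    raw = [pose_likelihood(p, detections) for p in particles]
    total = sum(raw)
    return [w / total for w in raw]

# Two pose hypotheses at the same position with different headings.
particles = [(0.0, 0.0, 0.0), (0.0, 0.0, math.pi / 2)]
detections = {"bakery": "front", "checkout": "left"}  # stand-in for VLM output
weights = normalized_weights(particles, detections)
```

In this toy setting the first particle, whose heading agrees with both reported directions, receives most of the weight. A full Monte Carlo localization loop would additionally resample particles and apply a motion model, which the sketch omits.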

* v2: Updated the citation of SparseLoc from an arXiv preprint to its published version in the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

GSplatVNM: Point-of-View Synthesis for Visual Navigation Models Using Gaussian Splatting

Mar 07, 2025

Kohei Honda, Takeshi Ishita, Yasuhiro Yoshimura, Ryo Yonetani

Abstract: This paper presents a novel approach to image-goal navigation by integrating 3D Gaussian Splatting (3DGS) with Visual Navigation Models (VNMs), a method we refer to as GSplatVNM. VNMs offer a promising paradigm for image-goal navigation by guiding a robot through a sequence of point-of-view images without requiring metrical localization or environment-specific training. However, constructing a dense and traversable sequence of target viewpoints from start to goal remains a central challenge, particularly when the available image database is sparse. To address these challenges, we propose a 3DGS-based viewpoint synthesis framework for VNMs that synthesizes intermediate viewpoints to seamlessly bridge gaps in sparse data while significantly reducing storage overhead. Experimental results in a photorealistic simulator demonstrate that our approach not only enhances navigation efficiency but also exhibits robustness under varying levels of image database sparsity.
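The idea of bridging gaps in a sparse viewpoint sequence can be sketched as follows. The abstract does not say how intermediate viewpoints are chosen, so this is only one plausible illustration: the function name `densify_path`, the planar `(x, y, yaw)` pose format, and the `max_gap` spacing threshold are assumptions. The sketch computes only the intermediate camera poses; rendering an image at each pose with a 3DGS model is left out.

```python
import math

def densify_path(poses, max_gap):
    """Insert intermediate (x, y, yaw) poses so consecutive poses are at most max_gap apart."""
    dense = [poses[0]]
    for (x0, y0, yaw0), (x1, y1, yaw1) in zip(poses, poses[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        steps = max(1, math.ceil(dist / max_gap))
        # Shortest signed yaw difference, so interpolation never turns the long way round.
        dyaw = math.atan2(math.sin(yaw1 - yaw0), math.cos(yaw1 - yaw0))
        for i in range(1, steps + 1):
            t = i / steps
            dense.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0), yaw0 + t * dyaw))
    return dense

# Two sparse database viewpoints 3 m apart, densified to at most 1 m spacing.
sparse = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
dense = densify_path(sparse, max_gap=1.0)
```

Each pose in `dense` would then serve as a camera pose at which a synthesized point-of-view image is rendered and handed to the navigation model as the next subgoal.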

* 8 pages, 4 figures
