Title: OpenNav: Open-World Navigation with Multimodal Large Language Models

Jul 24, 2025

Mingfeng Yuan, Letian Wang, Steven L. Waslander

Abstract: Pre-trained large language models (LLMs) have demonstrated strong common-sense reasoning abilities, making them promising for robotic navigation and planning tasks. However, despite recent progress, bridging the gap between language descriptions and actual robot actions in the open world, beyond merely invoking a limited set of predefined motion primitives, remains an open challenge. In this work, we aim to enable robots to interpret and decompose complex language instructions, ultimately synthesizing a sequence of trajectory points to complete diverse navigation tasks given open-set instructions and open-set objects. We observe that multimodal large language models (MLLMs) exhibit strong cross-modal understanding when processing free-form language instructions, demonstrating robust scene comprehension. More importantly, leveraging their code-generation capability, MLLMs can interact with vision-language perception models to generate compositional 2D bird's-eye-view value maps, effectively integrating semantic knowledge from MLLMs with spatial information from maps to reinforce the robot's spatial understanding. We evaluate the proposed zero-shot vision-language navigation framework on outdoor navigation tasks using large-scale autonomous vehicle datasets (AVDs), demonstrating its capability to execute a diverse range of free-form natural language navigation instructions while maintaining robustness against object detection errors and linguistic ambiguities. Furthermore, we validate our system on a Husky robot in both indoor and outdoor scenes, demonstrating its real-world robustness and applicability. Supplementary videos are available at https://trailab.github.io/OpenNav-website/
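To make the idea of a compositional bird's-eye-view value map more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes object locations are already available as grid cells (in the paper these would come from perception models), models attraction and avoidance with simple Gaussian fields, and extracts trajectory points by greedy ascent. All function names, the Gaussian form, and the weights are hypothetical choices made for illustration.

```python
import math

def gaussian_map(h, w, center, sigma):
    """Return an h x w grid that peaks at 1.0 on `center` (row, col)."""
    cr, cc = center
    return [[math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2 * sigma ** 2))
             for c in range(w)] for r in range(h)]

def compose_value_map(h, w, target, avoid, avoid_sigma=1.0, avoid_weight=2.0):
    """Compose one value map: broad attraction to `target`,
    minus a narrow weighted penalty around each cell in `avoid`."""
    value = gaussian_map(h, w, target, sigma=max(h, w))
    for cell in avoid:
        penalty = gaussian_map(h, w, cell, sigma=avoid_sigma)
        for r in range(h):
            for c in range(w):
                value[r][c] -= avoid_weight * penalty[r][c]
    return value

def greedy_trajectory(value, start, max_steps=200):
    """Follow the best 8-connected neighbor until no neighbor improves.
    Greedy ascent can stall in local maxima; it is used here only for brevity."""
    h, w = len(value), len(value[0])
    path = [start]
    r, c = start
    for _ in range(max_steps):
        neighbors = [(r + dr, c + dc)
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0)
                     and 0 <= r + dr < h and 0 <= c + dc < w]
        best = max(neighbors, key=lambda p: value[p[0]][p[1]])
        if value[best[0]][best[1]] <= value[r][c]:
            break  # current cell is a (local) maximum
        r, c = best
        path.append(best)
    return path

if __name__ == "__main__":
    # Hypothetical instruction: "go to the bench, keep away from the hydrant".
    vmap = compose_value_map(10, 10, target=(9, 9), avoid=[(5, 5)])
    for point in greedy_trajectory(vmap, start=(0, 0)):
        print(point)
```

In this toy setting the path bends around the penalized cell instead of crossing it, which is the kind of behavior a composed value map is meant to encode. A practical system would likely use a proper planner rather than greedy ascent.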
