Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haodong Han

SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing

Apr 15, 2026

Aodi Wu, Haodong Han, Xubo Luo, Ruisuo Wang, Shan He, Xue Wan

Abstract:Autonomous on-orbit servicing demands embodied agents that perceive through visual sensors, reason about 3D spatial situations, and execute multi-phase tasks over extended horizons. We present SpaceMind, a modular and self-evolving vision-language model (VLM) agent framework that decomposes knowledge, tools, and reasoning into three independently extensible dimensions: skill modules with dynamic routing, Model Context Protocol (MCP) tools with configurable profiles, and injectable reasoning-mode skills. An MCP-Redis interface layer enables the same codebase to operate across simulation and physical hardware without modification, and a Skill Self-Evolution mechanism distills operational experience into persistent skill files without model fine-tuning. We validate SpaceMind through 192 closed-loop runs across five satellites, three task types, and two environments, a UE5 simulation and a physical laboratory, deliberately including degraded conditions to stress-test robustness. Under nominal conditions all modes achieve 90--100% navigation success; under degradation, the Prospective mode uniquely succeeds in search-and-approach tasks where other modes fail. A self-evolution study shows that the agent recovers from failure in four of six groups from a single failed episode, including complete failure to 100% success and inspection scores improving from 12 to 59 out of 100. Real-world validation confirms zero-code-modification transfer to a physical robot with 100% rendezvous success. Code: https://github.com/wuaodi/SpaceMind

* 23 pages, 6 figures, 7 tables. Code available at https://github.com/wuaodi/SpaceMind

Via

Access Paper or Ask Questions

StepNav: Structured Trajectory Priors for Efficient and Multimodal Visual Navigation

Feb 01, 2026

Xubo Luo, Aodi Wu, Haodong Han, Xue Wan, Wei Zhang, Leizheng Shu, Ruisuo Wang

Abstract:Visual navigation is fundamental to autonomous systems, yet generating reliable trajectories in cluttered and uncertain environments remains a core challenge. Recent generative models promise end-to-end synthesis, but their reliance on unstructured noise priors often yields unsafe, inefficient, or unimodal plans that cannot meet real-time requirements. We propose StepNav, a novel framework that bridges this gap by introducing structured, multimodal trajectory priors derived from variational principles. StepNav first learns a geometry-aware success probability field to identify all feasible navigation corridors. These corridors are then used to construct an explicit, multi-modal mixture prior that initializes a conditional flow-matching process. This refinement is formulated as an optimal control problem with explicit smoothness and safety regularization. By replacing unstructured noise with physically-grounded candidates, StepNav generates safer and more efficient plans in significantly fewer steps. Experiments in both simulation and real-world benchmarks demonstrate consistent improvements in robustness, efficiency, and safety over state-of-the-art generative planners, advancing reliable trajectory generation for practical autonomous navigation. The code has been released at https://github.com/LuoXubo/StepNav.

* 8 pages, 7 figures; Accepted by ICRA 2026

Via

Access Paper or Ask Questions

LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration

Jul 09, 2024

Jiayi Liu, Qianyu Zhang, Xue Wan, Shengyang Zhang, Yaolin Tian, Haodong Han, Yutao Zhao, Baichuan Liu, Zeyuan Zhao, Xubo Luo

Figure 1 for LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration

Figure 2 for LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration

Figure 3 for LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration

Figure 4 for LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration

Abstract:With the complexity of lunar exploration missions, the moon needs to have a higher level of autonomy. Environmental perception and navigation algorithms are the foundation for lunar rovers to achieve autonomous exploration. The development and verification of algorithms require highly reliable data support. Most of the existing lunar datasets are targeted at a single task, lacking diverse scenes and high-precision ground truth labels. To address this issue, we propose a multi-task, multi-scene, and multi-label lunar benchmark dataset LuSNAR. This dataset can be used for comprehensive evaluation of autonomous perception and navigation systems, including high-resolution stereo image pairs, panoramic semantic labels, dense depth maps, LiDAR point clouds, and the position of rover. In order to provide richer scene data, we built 9 lunar simulation scenes based on Unreal Engine. Each scene is divided according to topographic relief and the density of objects. To verify the usability of the dataset, we evaluated and analyzed the algorithms of semantic segmentation, 3D reconstruction, and autonomous navigation. The experiment results prove that the dataset proposed in this paper can be used for ground verification of tasks such as autonomous environment perception and navigation, and provides a lunar benchmark dataset for testing the accessibility of algorithm metrics. We make LuSNAR publicly available at: https://github.com/autumn999999/LuSNAR-dataset.

* 22 pages, 11 figures, 9 tables

Via

Access Paper or Ask Questions