Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ruofei Bai

Michael

Line-of-Sight-Constrained Multi-Robot Mapless Navigation via Polygonal Visible Regions

Mar 27, 2026

Ruofei Bai, Shenghai Yuan, Xinhang Xu, Xingyu Ji, Xiaowei Li, Hongliang Guo, Wei-Yun Yau, Lihua Xie

Abstract:Multi-robot systems rely on underlying connectivity to ensure reliable communication and timely coordination. This paper studies the line-of-sight (LoS) connectivity maintenance problem in multi-robot navigation with unknown obstacles. Prior works typically assume known environment maps to formulate LoS constraints between robots, which hinders their practical deployment. To overcome this limitation, we propose an inherently distributed approach where each robot only constructs an egocentric visible region based on its real-time LiDAR scans, instead of endeavoring to build a global map online. The individual visible regions are shared through distributed communication to establish inter-robot LoS constraints, which are then incorporated into a multi-robot navigation framework to ensure LoS-connectivity. Moreover, we enhance the robustness of connectivity maintenance by proposing a more accurate LoS-distance metric, which further enables flexible topology optimization that eliminates redundant and effort-demanding connections. The proposed framework is evaluated through extensive multi-robot navigation and exploration tasks in both simulation and real-world experiments. Results show that it reliably maintains LoS-connectivity between robots in challenging environments cluttered with obstacles, even under large visible ranges and fragile minimal topologies, where existing methods consistently fail. Ablation studies also reveal that topology optimization boosts navigation efficiency by around $20\%$, demonstrating the framework's potential for efficient navigation under connectivity constraints.

* 10 pages, 7 figures. See videos and code: https://github.com/bairuofei/LoS_constrained_navigation

Via

Access Paper or Ask Questions

ImagiNav: Scalable Embodied Navigation via Generative Visual Prediction and Inverse Dynamics

Mar 14, 2026

Jie Chen, Yuxin Cai, Yizhuo Wang, Ruofei Bai, Yuhong Cao, Jun Li, Yau Wei Yun, Guillaume Sartoretti

Abstract:Enabling robots to navigate open-world environments via natural language is critical for general-purpose autonomy. Yet, Vision-Language Navigation has relied on end-to-end policies trained on expensive, embodiment-specific robot data. While recent foundation models trained on vast simulation data show promise, the challenge of scaling and generalizing due to the limited scene diversity and visual fidelity in simulation persists. To address this gap, we propose ImagiNav, a novel modular paradigm that decouples visual planning from robot actuation, enabling the direct utilization of diverse in-the-wild navigation videos. Our framework operates as a hierarchy: a Vision-Language Model first decomposes instructions into textual subgoals; a finetuned generative video model then imagines the future video trajectory towards that subgoal; finally, an inverse dynamics model extracts the trajectory from the imagined video, which can then be tracked via a low-level controller. We additionally develop a scalable data pipeline of in-the-wild navigation videos auto-labeled via inverse dynamics and a pretrained Vision-Language Model. ImagiNav demonstrates strong zero-shot transfer to robot navigation without requiring robot demonstrations, paving the way for generalist robots that learn navigation directly from unlabeled, open-world data.

Via

Access Paper or Ask Questions

Beyond Imitation: Reinforcement Learning Fine-Tuning for Adaptive Diffusion Navigation Policies

Mar 13, 2026

Junhe Sheng, Ruofei Bai, Kuan Xu, Ruimeng Liu, Jie Chen, Shenghai Yuan, Wei-Yun Yau, Lihua Xie

Abstract:Diffusion-based robot navigation policies trained on large-scale imitation learning datasets, can generate multi-modal trajectories directly from the robot's visual observations, bypassing the traditional localization-mapping-planning pipeline and achieving strong zero-shot generalization. However, their performance remains constrained by the coverage of offline datasets, and when deployed in unseen settings, distribution shift often leads to accumulated trajectory errors and safety-critical failures. Adapting diffusion policies with reinforcement learning is challenging because their iterative denoising structure hinders effective gradient backpropagation, while also making the training of an additional value network computationally expensive and less stable. To address these issues, we propose a reinforcement learning fine-tuning framework tailored for diffusion-based navigation. The method leverages the inherent multi-trajectory sampling mechanism of diffusion models and adopts Group Relative Policy Optimization (GRPO), which estimates relative advantages across sampled trajectories without requiring a separate value network. To preserve pretrained representations while enabling adaptation, we freeze the visual encoder and selectively update the higher decoder layers and action head, enhancing safety-aware behaviors through online environmental feedback. On the PointGoal task in Isaac Sim, our approach improves the Success Rate from 52.0% to 58.7% and SPL from 0.49 to 0.54 on unseen scenes, while reducing collision frequency. Additional experiments show that the fine-tuned policy transfers zero-shot to a real quadruped platform and maintains stable performance in geometrically out-of-distribution environments, suggesting improved adaptability and safe generalization to new domains.

Via

Access Paper or Ask Questions

TIDAL: Temporally Interleaved Diffusion and Action Loop for High-Frequency VLA Control

Jan 21, 2026

Yuteng Sun, Haoran Wang, Ruofei Bai, Zhengguo Li, Jun Li, Meng Yee, Chuah, Wei Yun Yau

Abstract:Large-scale Vision-Language-Action (VLA) models offer semantic generalization but suffer from high inference latency, limiting them to low-frequency batch-and-execute paradigm. This frequency mismatch creates an execution blind spot, causing failures in dynamic environments where targets move during the open-loop execution window. We propose TIDAL (Temporally Interleaved Diffusion and Action Loop), a hierarchical framework that decouples semantic reasoning from high-frequency actuation. TIDAL operates as a backbone-agnostic module for diffusion-based VLAs, using a dual-frequency architecture to redistribute the computational budget. Specifically, a low-frequency macro-intent loop caches semantic embeddings, while a high-frequency micro-control loop interleaves single-step flow integration with execution. This design enables approximately 9 Hz control updates on edge hardware (vs. approximately 2.4 Hz baselines) without increasing marginal overhead. To handle the resulting latency shift, we introduce a temporally misaligned training strategy where the policy learns predictive compensation using stale semantic intent alongside real-time proprioception. Additionally, we address the insensitivity of static vision encoders to velocity by incorporating a differential motion predictor. TIDAL is architectural, making it orthogonal to system-level optimizations. Experiments show a 2x performance gain over open-loop baselines in dynamic interception tasks. Despite a marginal regression in static success rates, our approach yields a 4x increase in feedback frequency and extends the effective horizon of semantic embeddings beyond the native action chunk size. Under non-paused inference protocols, TIDAL remains robust where standard baselines fail due to latency.

Via

Access Paper or Ask Questions

FinMMDocR: Benchmarking Financial Multimodal Reasoning with Scenario Awareness, Document Understanding, and Multi-Step Computation

Dec 31, 2025

Zichen Tang, Haihong E, Rongjin Li, Jiacheng Liu, Linwei Jia, Zhuodi Hao, Zhongjun Yang, Yuanze Li, Haolin Tian, Xinyi Hu(+11 more)

Abstract:We introduce FinMMDocR, a novel bilingual multimodal benchmark for evaluating multimodal large language models (MLLMs) on real-world financial numerical reasoning. Compared to existing benchmarks, our work delivers three major advancements. (1) Scenario Awareness: 57.9% of 1,200 expert-annotated problems incorporate 12 types of implicit financial scenarios (e.g., Portfolio Management), challenging models to perform expert-level reasoning based on assumptions; (2) Document Understanding: 837 Chinese/English documents spanning 9 types (e.g., Company Research) average 50.8 pages with rich visual elements, significantly surpassing existing benchmarks in both breadth and depth of financial documents; (3) Multi-Step Computation: Problems demand 11-step reasoning on average (5.3 extraction + 5.7 calculation steps), with 65.0% requiring cross-page evidence (2.4 pages average). The best-performing MLLM achieves only 58.0% accuracy, and different retrieval-augmented generation (RAG) methods show significant performance variations on this task. We expect FinMMDocR to drive improvements in MLLMs and reasoning-enhanced methods on complex multimodal reasoning tasks in real-world scenarios.

* Accepted by AAAI-26 Main Track

Via

Access Paper or Ask Questions

Following Is All You Need: Robot Crowd Navigation Using People As Planners

Apr 15, 2025

Yuwen Liao, Xinhang Xu, Ruofei Bai, Yizhuo Yang, Muqing Cao, Shenghai Yuan, Lihua Xie

Abstract:Navigating in crowded environments requires the robot to be equipped with high-level reasoning and planning techniques. Existing works focus on developing complex and heavyweight planners while ignoring the role of human intelligence. Since humans are highly capable agents who are also widely available in a crowd navigation setting, we propose an alternative scheme where the robot utilises people as planners to benefit from their effective planning decisions and social behaviours. Through a set of rule-based evaluations, we identify suitable human leaders who exhibit the potential to guide the robot towards its goal. Using a simple base planner, the robot follows the selected leader through shorthorizon subgoals that are designed to be straightforward to achieve. We demonstrate through both simulated and real-world experiments that our novel framework generates safe and efficient robot plans compared to existing planners, even without predictive or data-driven modules. Our method also brings human-like robot behaviours without explicitly defining traffic rules and social norms. Code will be available at https://github.com/centiLinda/PeopleAsPlanner.git.

Via

Access Paper or Ask Questions

AirSwarm: Enabling Cost-Effective Multi-UAV Research with COTS drones

Mar 10, 2025

Xiaowei Li, Kuan Xu, Fen Liu, Ruofei Bai, Shenghai Yuan, Lihua Xie

Abstract:Traditional unmanned aerial vehicle (UAV) swarm missions rely heavily on expensive custom-made drones with onboard perception or external positioning systems, limiting their widespread adoption in research and education. To address this issue, we propose AirSwarm. AirSwarm democratizes multi-drone coordination using low-cost commercially available drones such as Tello or Anafi, enabling affordable swarm aerial robotics research and education. Key innovations include a hierarchical control architecture for reliable multi-UAV coordination, an infrastructure-free visual SLAM system for precise localization without external motion capture, and a ROS-based software framework for simplified swarm development. Experiments demonstrate cm-level tracking accuracy, low-latency control, communication failure resistance, formation flight, and trajectory tracking. By reducing financial and technical barriers, AirSwarm makes multi-robot education and research more accessible. The complete instructions and open source code will be available at

Via

Access Paper or Ask Questions

Realm: Real-Time Line-of-Sight Maintenance in Multi-Robot Navigation with Unknown Obstacles

Feb 21, 2025

Ruofei Bai, Shenghai Yuan, Kun Li, Hongliang Guo, Wei-Yun Yau, Lihua Xie

Figure 1 for Realm: Real-Time Line-of-Sight Maintenance in Multi-Robot Navigation with Unknown Obstacles

Figure 2 for Realm: Real-Time Line-of-Sight Maintenance in Multi-Robot Navigation with Unknown Obstacles

Figure 3 for Realm: Real-Time Line-of-Sight Maintenance in Multi-Robot Navigation with Unknown Obstacles

Figure 4 for Realm: Real-Time Line-of-Sight Maintenance in Multi-Robot Navigation with Unknown Obstacles

Abstract:Multi-robot navigation in complex environments relies on inter-robot communication and mutual observations for coordination and situational awareness. This paper studies the multi-robot navigation problem in unknown environments with line-of-sight (LoS) connectivity constraints. While previous works are limited to known environment models to derive the LoS constraints, this paper eliminates such requirements by directly formulating the LoS constraints between robots from their real-time point cloud measurements, leveraging point cloud visibility analysis techniques. We propose a novel LoS-distance metric to quantify both the urgency and sensitivity of losing LoS between robots considering potential robot movements. Moreover, to address the imbalanced urgency of losing LoS between two robots, we design a fusion function to capture the overall urgency while generating gradients that facilitate robots' collaborative movement to maintain LoS. The LoS constraints are encoded into a potential function that preserves the positivity of the Fiedler eigenvalue of the robots' network graph to ensure connectivity. Finally, we establish a LoS-constrained exploration framework that integrates the proposed connectivity controller. We showcase its applications in multi-robot exploration in complex unknown environments, where robots can always maintain the LoS connectivity through distributed sensing and communication, while collaboratively mapping the unknown environment. The implementations are open-sourced at https://github.com/bairuofei/LoS_constrained_navigation.

* 8 pages, 9 figures, accepted by IEEE ICRA 2025

Via

Access Paper or Ask Questions

Swept Volume-Aware Trajectory Planning and MPC Tracking for Multi-Axle Swerve-Drive AMRs

Dec 22, 2024

Tianxin Hu, Shenghai Yuan, Ruofei Bai, Xinghang Xu, Yuwen Liao, Fen Liu, Lihua Xie

Figure 1 for Swept Volume-Aware Trajectory Planning and MPC Tracking for Multi-Axle Swerve-Drive AMRs

Figure 2 for Swept Volume-Aware Trajectory Planning and MPC Tracking for Multi-Axle Swerve-Drive AMRs

Figure 3 for Swept Volume-Aware Trajectory Planning and MPC Tracking for Multi-Axle Swerve-Drive AMRs

Figure 4 for Swept Volume-Aware Trajectory Planning and MPC Tracking for Multi-Axle Swerve-Drive AMRs

Abstract:Multi-axle autonomous mobile robots (AMRs) are set to revolutionize the future of robotics in logistics. As the backbone of next-generation solutions, these robots face a critical challenge: managing and minimizing the swept volume during turns while maintaining precise control. Traditional systems designed for standard vehicles often struggle with the complex dynamics of multi-axle configurations, leading to inefficiency and increased safety risk in confined spaces. Our innovative framework overcomes these limitations by combining swept volume minimization with Signed Distance Field (SDF) path planning and model predictive control (MPC) for independent wheel steering. This approach not only plans paths with an awareness of the swept volume but actively minimizes it in real-time, allowing each axle to follow a precise trajectory while significantly reducing the space the vehicle occupies. By predicting future states and adjusting the turning radius of each wheel, our method enhances both maneuverability and safety, even in the most constrained environments. Unlike previous works, our solution goes beyond basic path calculation and tracking, offering real-time path optimization with minimal swept volume and efficient individual axle control. To our knowledge, this is the first comprehensive approach to tackle these challenges, delivering life-saving improvements in control, efficiency, and safety for multi-axle AMRs. Furthermore, we will open-source our work to foster collaboration and enable others to advance safer, more efficient autonomous systems.

* Submitted to ICRA 2025

Via

Access Paper or Ask Questions

Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization

Jul 01, 2024

Ruofei Bai, Shenghai Yuan, Hongliang Guo, Pengyu Yin, Wei-Yun Yau, Lihua Xie

Figure 1 for Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization

Figure 2 for Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization

Figure 3 for Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization

Figure 4 for Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization

Abstract:This paper considers the collaborative graph exploration problem in GPS-denied environments, where a group of robots are required to cover a graph environment while maintaining reliable pose estimations in collaborative simultaneous localization and mapping (SLAM). Considering both objectives presents challenges for multi-robot pathfinding, as it involves the expensive covariance inference for SLAM uncertainty evaluation, especially considering various combinations of robots' paths. To reduce the computational complexity, we propose an efficient two-stage strategy where exploration paths are first generated for quick coverage, and then enhanced by adding informative and distance-efficient loop-closing actions, called loop edges, along the paths for reliable pose estimation. We formulate the latter problem as a non-monotone submodular maximization problem by relating SLAM uncertainty with pose graph topology, which (1) facilitates more efficient evaluation of SLAM uncertainty than covariance inference, and (2) allows the application of approximation algorithms in submodular optimization to provide optimality guarantees. We further introduce the ordering heuristics to improve objective values while preserving the optimality bound. Simulation experiments over randomly generated graph environments verify the efficiency of our methods in finding paths for quick coverage and enhanced pose graph reliability, and benchmark the performance of the approximation algorithms and the greedy-based algorithm in the loop edge selection problem. Our implementations will be open-source at https://github.com/bairuofei/CGE.

* 9 pages, 13 figures, accepted by IEEE/RSJ IROS(2024)

Via

Access Paper or Ask Questions