Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xuesu Xiao

George Mason University, Fairfax, VA, USA

Designing Multi-Robot Ground Video Sensemaking with Public Safety Professionals

Feb 09, 2026

Puqi Zhou, Ali Asgarov, Aafiya Hussain, Wonjoon Park, Amit Paudyal, Sameep Shrestha, Chia-wei Tang, Michael F. Lighthiser, Michael R. Hieb, Xuesu Xiao(+2 more)

Abstract:Videos from fleets of ground robots can advance public safety by providing scalable situational awareness and reducing professionals' burden. Yet little is known about how to design and integrate multi-robot videos into public safety workflows. Collaborating with six police agencies, we examined how such videos could be made practical. In Study 1, we presented the first testbed for multi-robot ground video sensemaking. The testbed includes 38 events-of-interest (EoI) relevant to public safety, a dataset of 20 robot patrol videos (10 day/night pairs) covering EoI types, and 6 design requirements aimed at improving current video sensemaking practices. In Study 2, we built MRVS, a tool that augments multi-robot patrol video streams with a prompt-engineered video understanding model. Participants reported reduced manual workload and greater confidence with LLM-based explanations, while noting concerns about false alarms and privacy. We conclude with implications for designing future multi-robot video sensemaking tools. The testbed is available at https://github.com/Puqi7/MRVS\_VideoSensemaking

Via

Access Paper or Ask Questions

Applying Ground Robot Fleets in Urban Search: Understanding Professionals' Operational Challenges and Design Opportunities

Feb 04, 2026

Puqi Zhou, Charles R. Twardy, Cynthia Lum, Myeong Lee, David J. Porfirio, Michael R. Hieb, Chris Thomas, Xuesu Xiao, Sungsoo Ray Hong

Abstract:Urban searches demand rapid, defensible decisions and sustained physical effort under high cognitive and situational load. Incident commanders must plan, coordinate, and document time-critical operations, while field searchers execute evolving tasks in uncertain environments. With recent advances in technology, ground-robot fleets paired with computer-vision-based situational awareness and LLM-powered interfaces offer the potential to ease these operational burdens. However, no dedicated studies have examined how public safety professionals perceive such technologies or envision their integration into existing practices, risking building technically sophisticated yet impractical solutions. To address this gap, we conducted focus-group sessions with eight police officers across five local departments in Virginia. Our findings show that ground robots could reduce professionals' reliance on paper references, mental calculations, and ad-hoc coordination, alleviating cognitive and physical strain in four key challenge areas: (1) partitioning the workforce across multiple search hypotheses, (2) retaining group awareness and situational awareness, (3) building route planning that fits the lost-person profile, and (4) managing cognitive and physical fatigue under uncertainty. We further identify four design opportunities and requirements for future ground-robot fleet integration in public-safety operations: (1) scalable multi-robot planning and control interfaces, (2) agency-specific route optimization, (3) real-time replanning informed by debrief updates, and (4) vision-assisted cueing that preserves operational trust while reducing cognitive workload. We conclude with design implications for deployable, accountable, and human-centered urban-search support systems

* Under review

Via

Access Paper or Ask Questions

MUSON: A Reasoning-oriented Multimodal Dataset for Socially Compliant Navigation in Urban Environments

Dec 28, 2025

Zhuonan Liu, Xinyu Zhang, Zishuo Wang, Tomohito Kawabata, Xuesu Xiao, Ling Xiao

Abstract:Socially compliant navigation requires structured reasoning over dynamic pedestrians and physical constraints to ensure safe and interpretable decisions. However, existing social navigation datasets often lack explicit reasoning supervision and exhibit highly long-tailed action distributions, limiting models' ability to learn safety-critical behaviors. To address these issues, we introduce MUSON, a multimodal dataset for short-horizon social navigation collected across diverse indoor and outdoor campus scenes. MUSON adopts a structured five-step Chain-of-Thought annotation consisting of perception, prediction, reasoning, action, and explanation, with explicit modeling of static physical constraints and a rationally balanced discrete action space. Compared to SNEI, MUSON provides consistent reasoning, action, and explanation. Benchmarking multiple state-of-the-art Small Vision Language Models on MUSON shows that Qwen2.5-VL-3B achieves the highest decision accuracy of 0.8625, demonstrating that MUSON serves as an effective and reusable benchmark for socially compliant navigation. The dataset is publicly available at https://huggingface.co/datasets/MARSLab/MUSON

Via

Access Paper or Ask Questions

MAction-SocialNav: Multi-Action Socially Compliant Navigation via Reasoning-enhanced Prompt Tuning

Dec 25, 2025

Zishuo Wang, Xinyu Zhang, Zhuonan Liu, Tomohito Kawabata, Daeun Song, Xuesu Xiao, Ling Xiao

Figure 1 for MAction-SocialNav: Multi-Action Socially Compliant Navigation via Reasoning-enhanced Prompt Tuning

Figure 2 for MAction-SocialNav: Multi-Action Socially Compliant Navigation via Reasoning-enhanced Prompt Tuning

Figure 3 for MAction-SocialNav: Multi-Action Socially Compliant Navigation via Reasoning-enhanced Prompt Tuning

Figure 4 for MAction-SocialNav: Multi-Action Socially Compliant Navigation via Reasoning-enhanced Prompt Tuning

Abstract:Socially compliant navigation requires robots to move safely and appropriately in human-centered environments by respecting social norms. However, social norms are often ambiguous, and in a single scenario, multiple actions may be equally acceptable. Most existing methods simplify this problem by assuming a single correct action, which limits their ability to handle real-world social uncertainty. In this work, we propose MAction-SocialNav, an efficient vision language model for socially compliant navigation that explicitly addresses action ambiguity, enabling generating multiple plausible actions within one scenario. To enhance the model's reasoning capability, we introduce a novel meta-cognitive prompt (MCP) method. Furthermore, to evaluate the proposed method, we curate a multi-action socially compliant navigation dataset that accounts for diverse conditions, including crowd density, indoor and outdoor environments, and dual human annotations. The dataset contains 789 samples, each with three-turn conversation, split into 710 training samples and 79 test samples through random selection. We also design five evaluation metrics to assess high-level decision precision, safety, and diversity. Extensive experiments demonstrate that the proposed MAction-SocialNav achieves strong social reasoning performance while maintaining high efficiency, highlighting its potential for real-world human robot navigation. Compared with zero-shot GPT-4o and Claude, our model achieves substantially higher decision quality (APG: 0.595 vs. 0.000/0.025) and safety alignment (ER: 0.264 vs. 0.642/0.668), while maintaining real-time efficiency (1.524 FPS, over 3x faster).

Via

Access Paper or Ask Questions

Learning from Hallucinating Critical Points for Navigation in Dynamic Environments

Sep 30, 2025

Saad Abdul Ghani, Kameron Lee, Xuesu Xiao

Abstract:Generating large and diverse obstacle datasets to learn motion planning in environments with dynamic obstacles is challenging due to the vast space of possible obstacle trajectories. Inspired by hallucination-based data synthesis approaches, we propose Learning from Hallucinating Critical Points (LfH-CP), a self-supervised framework for creating rich dynamic obstacle datasets based on existing optimal motion plans without requiring expensive expert demonstrations or trial-and-error exploration. LfH-CP factorizes hallucination into two stages: first identifying when and where obstacles must appear in order to result in an optimal motion plan, i.e., the critical points, and then procedurally generating diverse trajectories that pass through these points while avoiding collisions. This factorization avoids generative failures such as mode collapse and ensures coverage of diverse dynamic behaviors. We further introduce a diversity metric to quantify dataset richness and show that LfH-CP produces substantially more varied training data than existing baselines. Experiments in simulation demonstrate that planners trained on LfH-CP datasets achieves higher success rates compared to a prior hallucination method.

Via

Access Paper or Ask Questions

Efficient Multi-Agent Coordination via Dynamic Joint-State Graph Construction

Sep 08, 2025

Yanlin Zhou, Manshi Limbu, Xuesu Xiao

Figure 1 for Efficient Multi-Agent Coordination via Dynamic Joint-State Graph Construction

Figure 2 for Efficient Multi-Agent Coordination via Dynamic Joint-State Graph Construction

Figure 3 for Efficient Multi-Agent Coordination via Dynamic Joint-State Graph Construction

Figure 4 for Efficient Multi-Agent Coordination via Dynamic Joint-State Graph Construction

Abstract:Multi-agent pathfinding (MAPF) traditionally focuses on collision avoidance, but many real-world applications require active coordination between agents to improve team performance. This paper introduces Team Coordination on Graphs with Risky Edges (TCGRE), where agents collaborate to reduce traversal costs on high-risk edges via support from teammates. We reformulate TCGRE as a 3D matching problem-mapping robot pairs, support pairs, and time steps-and rigorously prove its NP-hardness via reduction from Minimum 3D Matching. To address this complexity, (in the conference version) we proposed efficient decomposition methods, reducing the problem to tractable subproblems: Joint-State Graph (JSG): Encodes coordination as a single-agent shortest-path problem. Coordination-Exhaustive Search (CES): Optimizes support assignments via exhaustive pairing. Receding-Horizon Optimistic Cooperative A* (RHOCA*): Balances optimality and scalability via horizon-limited planning. Further in this extension, we introduce a dynamic graph construction method (Dynamic-HJSG), leveraging agent homogeneity to prune redundant states and reduce computational overhead by constructing the joint-state graph dynamically. Theoretical analysis shows Dynamic-HJSG preserves optimality while lowering complexity from exponential to polynomial in key cases. Empirical results validate scalability for large teams and graphs, with HJSG outperforming baselines greatly in runtime in different sizes and types of graphs. This work bridges combinatorial optimization and multi-agent planning, offering a principled framework for collaborative pathfinding with provable guarantees, and the key idea of the solution can be widely extended to many other collaborative optimization problems, such as MAPF.

Via

Access Paper or Ask Questions

Verti-Arena: A Controllable and Standardized Indoor Testbed for Multi-Terrain Off-Road Autonomy

Aug 11, 2025

Haiyue Chen, Aniket Datar, Tong Xu, Francesco Cancelliere, Harsh Rangwala, Madhan Balaji Rao, Daeun Song, David Eichinger, Xuesu Xiao

Figure 1 for Verti-Arena: A Controllable and Standardized Indoor Testbed for Multi-Terrain Off-Road Autonomy

Figure 2 for Verti-Arena: A Controllable and Standardized Indoor Testbed for Multi-Terrain Off-Road Autonomy

Figure 3 for Verti-Arena: A Controllable and Standardized Indoor Testbed for Multi-Terrain Off-Road Autonomy

Figure 4 for Verti-Arena: A Controllable and Standardized Indoor Testbed for Multi-Terrain Off-Road Autonomy

Abstract:Off-road navigation is an important capability for mobile robots deployed in environments that are inaccessible or dangerous to humans, such as disaster response or planetary exploration. Progress is limited due to the lack of a controllable and standardized real-world testbed for systematic data collection and validation. To fill this gap, we introduce Verti-Arena, a reconfigurable indoor facility designed specifically for off-road autonomy. By providing a repeatable benchmark environment, Verti-Arena supports reproducible experiments across a variety of vertically challenging terrains and provides precise ground truth measurements through onboard sensors and a motion capture system. Verti-Arena also supports consistent data collection and comparative evaluation of algorithms in off-road autonomy research. We also develop a web-based interface that enables research groups worldwide to remotely conduct standardized off-road autonomy experiments on Verti-Arena.

* 6 pages

Via

Access Paper or Ask Questions

Narrate2Nav: Real-Time Visual Navigation with Implicit Language Reasoning in Human-Centric Environments

Jun 17, 2025

Amirreza Payandeh, Anuj Pokhrel, Daeun Song, Marcos Zampieri, Xuesu Xiao

Abstract:Large Vision-Language Models (VLMs) have demonstrated potential in enhancing mobile robot navigation in human-centric environments by understanding contextual cues, human intentions, and social dynamics while exhibiting reasoning capabilities. However, their computational complexity and limited sensitivity to continuous numerical data impede real-time performance and precise motion control. To this end, we propose Narrate2Nav, a novel real-time vision-action model that leverages a novel self-supervised learning framework based on the Barlow Twins redundancy reduction loss to embed implicit natural language reasoning, social cues, and human intentions within a visual encoder-enabling reasoning in the model's latent space rather than token space. The model combines RGB inputs, motion commands, and textual signals of scene context during training to bridge from robot observations to low-level motion commands for short-horizon point-goal navigation during deployment. Extensive evaluation of Narrate2Nav across various challenging scenarios in both offline unseen dataset and real-world experiments demonstrates an overall improvement of 52.94 percent and 41.67 percent, respectively, over the next best baseline. Additionally, qualitative comparative analysis of Narrate2Nav's visual encoder attention map against four other baselines demonstrates enhanced attention to navigation-critical scene elements, underscoring its effectiveness in human-centric navigation tasks.

Via

Access Paper or Ask Questions

CARoL: Context-aware Adaptation for Robot Learning

Jun 08, 2025

Zechen Hu, Tong Xu, Xuesu Xiao, Xuan Wang

Figure 1 for CARoL: Context-aware Adaptation for Robot Learning

Figure 2 for CARoL: Context-aware Adaptation for Robot Learning

Figure 3 for CARoL: Context-aware Adaptation for Robot Learning

Figure 4 for CARoL: Context-aware Adaptation for Robot Learning

Abstract:Using Reinforcement Learning (RL) to learn new robotic tasks from scratch is often inefficient. Leveraging prior knowledge has the potential to significantly enhance learning efficiency, which, however, raises two critical challenges: how to determine the relevancy of existing knowledge and how to adaptively integrate them into learning a new task. In this paper, we propose Context-aware Adaptation for Robot Learning (CARoL), a novel framework to efficiently learn a similar but distinct new task from prior knowledge. CARoL incorporates context awareness by analyzing state transitions in system dynamics to identify similarities between the new task and prior knowledge. It then utilizes these identified similarities to prioritize and adapt specific knowledge pieces for the new task. Additionally, CARoL has a broad applicability spanning policy-based, value-based, and actor-critic RL algorithms. We validate the efficiency and generalizability of CARoL on both simulated robotic platforms and physical ground vehicles. The simulations include CarRacing and LunarLander environments, where CARoL demonstrates faster convergence and higher rewards when learning policies for new tasks. In real-world experiments, we show that CARoL enables a ground vehicle to quickly and efficiently adapt policies learned in simulation to smoothly traverse real-world off-road terrain.

Via

Access Paper or Ask Questions

McARL:Morphology-Control-Aware Reinforcement Learning for Generalizable Quadrupedal Locomotion

May 23, 2025

Prakhar Mishra, Amir Hossain Raj, Xuesu Xiao, Dinesh Manocha

Abstract:We present Morphology-Control-Aware Reinforcement Learning (McARL), a new approach to overcome challenges of hyperparameter tuning and transfer loss, enabling generalizable locomotion across robot morphologies. We use a morphology-conditioned policy by incorporating a randomized morphology vector, sampled from a defined morphology range, into both the actor and critic networks. This allows the policy to learn parameters that generalize to robots with similar characteristics. We demonstrate that a single policy trained on a Unitree Go1 robot using McARL can be transferred to a different morphology (e.g., Unitree Go2 robot) and can achieve zero-shot transfer velocity of up to 3.5 m/s without retraining or fine-tuning. Moreover, it achieves 6.0 m/s on the training Go1 robot and generalizes to other morphologies like A1 and Mini Cheetah. We also analyze the impact of morphology distance on transfer performance and highlight McARL's advantages over prior approaches. McARL achieves 44-150% higher transfer performance on Go2, Mini Cheetah, and A1 compared to PPO variants.

Via

Access Paper or Ask Questions