Abstract:Underwater salient instance segmentation (USIS) is crucial for marine robotic systems, as it enables both underwater salient object detection and instance-level mask prediction for visual scene understanding. Compared with its terrestrial counterpart, USIS is more challenging due to the underwater image degradation. To address this issue, this paper proposes USIS-PGM, a single-stage framework for USIS. Specifically, the encoder enhances boundary cues through a frequency-aware module and performs content-adaptive feature reweighting via a dynamic weighting module. The decoder incorporates a Transformer-based instance activation module to better distinguish salient instances. In addition, USIS-PGM employs multi-scale Gaussian heatmaps generated from ground-truth masks through Photometric Gaussian Mixture (PGM) to supervise intermediate decoder features, thereby improving salient instance localization and producing more structurally coherent mask predictions. Experimental results demonstrate the superiority and practical applicability of the proposed USIS-PGM model.
Abstract:Pose estimation is essential for robotic manipulation, particularly when visual perception is occluded during gripper-object interactions. Existing tactile-based methods generally rely on tactile simulation or pre-trained models, which limits their generalizability and efficiency. In this study, we propose TacLoc, a novel tactile localization framework that formulates the problem as a one-shot point cloud registration task. TacLoc introduces a graph-theoretic partial-to-full registration method, leveraging dense point clouds and surface normals from tactile sensing for efficient and accurate pose estimation. Without requiring rendered data or pre-trained models, TacLoc achieves improved performance through normal-guided graph pruning and a hypothesis-and-verification pipeline. TacLoc is evaluated extensively on the YCB dataset. We further demonstrate TacLoc on real-world objects across two different visual-tactile sensors.
Abstract:In flood disaster rescue scenarios, partially submerged buildings prevent aerial robots from accessing lower levels, limiting mission effectiveness. To address this challenge, this paper presents Wukong-Omni, a novel multimode robot capable of operating across land, air, and underwater using a unified propulsion system. The system is enabled by an innovative mechanical design that allows motor reuse and improves thrust generation. Efficiency and peak thrust are enhanced through simulation and tank-based optimization. Experimental results show a 100 percent improvement in propulsion efficiency and a 150 percent increase in maximum thrust compared with direct installation methods. Dynamic models for the three operating domains are developed, and a unified cross-domain control framework is proposed. Comprehensive experiments validate stable locomotion and smooth transition across domains. Outdoor experiments further demonstrate robustness and adaptability in real-world environments.
Abstract:The ability to achieve and maintain inverted poses is essential for unlocking the full agility of miniature blimp robots (MBRs). However, developing reliable control methods for MBRs remains challenging due to their complex and underactuated dynamics. To address this challenge, we propose a novel framework that enables robust control policy learning for inverted pose on MBRs. The proposed framework operates through three core stages: First, a high-fidelity three-dimensional (3D) simulation environment was constructed, which was calibrated against real-world MBR motion data to ensure accurate replication of inverted-state dynamics. Second, a robust policy for MBR inverted control was trained within the simulation environment via a domain randomization strategy and a modified Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. Third, a mapping layer was designed to bridge the sim-to-real gap for the learned policy deployment. Comprehensive evaluations in the simulation environment demonstrate that the learned policy achieves a higher success rate compared to the energy-shaping controller. Furthermore, experimental results confirm that the learned policy with a mapping layer enables an MBR to achieve and maintain a fully upside-down pose in real-world settings.




Abstract:External collisions to robot actuators typically pose risks to grasping circular objects. This work presents a vision-based sensing module capable of detecting collisions to maintain stable grasping with a soft gripper system. The system employs an eye-in-palm camera with a broad field of view to simultaneously monitor the motion of fingers and the grasped object. Furthermore, we have developed a collision-rich grasping strategy to ensure the stability and security of the entire dynamic grasping process. A physical soft gripper was manufactured and affixed to a collaborative robotic arm to evaluate the performance of the collision detection mechanism. An experiment regarding testing the response time of the mechanism confirmed the system has the capability to react to the collision instantaneously. A dodging test was conducted to demonstrate the gripper can detect the direction and scale of external collisions precisely.




Abstract:In this study, we present a novel simultaneous localization and mapping (SLAM) system, VIMS, designed for underwater navigation. Conventional visual-inertial state estimators encounter significant practical challenges in perceptually degraded underwater environments, particularly in scale estimation and loop closing. To address these issues, we first propose leveraging a low-cost single-beam sonar to improve scale estimation. Then, VIMS integrates a high-sampling-rate magnetometer for place recognition by utilizing magnetic signatures generated by an economical magnetic field coil. Building on this, a hierarchical scheme is developed for visual-magnetic place recognition, enabling robust loop closure. Furthermore, VIMS achieves a balance between local feature tracking and descriptor-based loop closing, avoiding additional computational burden on the front end. Experimental results highlight the efficacy of the proposed VIMS, demonstrating significant improvements in both the robustness and accuracy of state estimation within underwater environments.
Abstract:Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm and enhancing overall formation stability. Only the leader blimp is manually controlled by a human operator, while follower blimps use onboard monocular cameras and a laser altimeter for relative position and altitude estimation. A leader-switching scheme is proposed to assist the human operator to maintain stability of the swarm, especially when a sharp turn is performed. Experimental results confirm that the leader-switching mechanism effectively maintains stable formations and adapts to dynamic indoor environments while assisting human operator.




Abstract:Existing indoor SLAM datasets primarily focus on robot sensing, often lacking building architectures. To address this gap, we design and construct the first dataset to couple the SLAM and BIM, named SLABIM. This dataset provides BIM and SLAM-oriented sensor data, both modeling a university building at HKUST. The as-designed BIM is decomposed and converted for ease of use. We employ a multi-sensor suite for multi-session data collection and mapping to obtain the as-built model. All the related data are timestamped and organized, enabling users to deploy and test effectively. Furthermore, we deploy advanced methods and report the experimental results on three tasks: registration, localization and semantic mapping, demonstrating the effectiveness and practicality of SLABIM. We make our dataset open-source at https://github.com/HKUST-Aerial-Robotics/SLABIM.




Abstract:Image-goal navigation enables a robot to reach the location where a target image was captured, using visual cues for guidance. However, current methods either rely heavily on data and computationally expensive learning-based approaches or lack efficiency in complex environments due to insufficient exploration strategies. To address these limitations, we propose Bayesian Embodied Image-goal Navigation Using Gaussian Splatting, a novel method that formulates ImageNav as an optimal control problem within a model predictive control framework. BEINGS leverages 3D Gaussian Splatting as a scene prior to predict future observations, enabling efficient, real-time navigation decisions grounded in the robot's sensory experiences. By integrating Bayesian updates, our method dynamically refines the robot's strategy without requiring extensive prior experience or data. Our algorithm is validated through extensive simulations and physical experiments, showcasing its potential for embodied robot systems in visually complex scenarios.




Abstract:This paper introduces a novel solution to the manual control challenge for indoor blimps. The problem's complexity arises from the conflicting demands of executing human commands while maintaining stability through automatic control for underactuated robots. To tackle this challenge, we introduced an assisted piloting hybrid controller with a preemptive mechanism, that seamlessly switches between executing human commands and activating automatic stabilization control. Our algorithm ensures that the automatic stabilization controller operates within the time delay between human observation and perception, providing assistance to the driver in a way that remains imperceptible.