Abstract:3D Gaussian splatting (3DGS) is an innovative rendering technique that surpasses the neural radiance field (NeRF) in both rendering speed and visual quality by leveraging an explicit 3D scene representation. Existing 3DGS approaches require a large number of calibrated views to generate a consistent and complete scene representation. When input views are limited, 3DGS tends to overfit the training views, leading to noticeable degradation in rendering quality. To address this limitation, we propose a Point-wise Feature-Aware Gaussian Splatting framework that enables real-time, high-quality rendering from sparse training views. Specifically, we first employ the latest stereo foundation model to estimate accurate camera poses and reconstruct a dense point cloud for Gaussian initialization. We then encode the colour attributes of each 3D Gaussian by sampling and aggregating multiscale 2D appearance features from sparse inputs. To enhance point-wise appearance representation, we design a point interaction network based on a self-attention mechanism, allowing each Gaussian point to interact with its nearest neighbors. These enriched features are subsequently decoded into Gaussian parameters through two lightweight multi-layer perceptrons (MLPs) for final rendering. Extensive experiments on diverse benchmarks demonstrate that our method significantly outperforms NeRF-based approaches and achieves competitive performance under few-shot settings compared to the state-of-the-art 3DGS methods.
Abstract:Neural implicit surface reconstruction using volume rendering techniques has recently achieved significant advancements in creating high-fidelity surfaces from multiple 2D images. However, current methods primarily target scenes with consistent illumination and struggle to accurately reconstruct 3D geometry in uncontrolled environments with transient occlusions or varying appearances. While some neural radiance field (NeRF)-based variants can better manage photometric variations and transient objects in complex scenes, they are designed for novel view synthesis rather than precise surface reconstruction due to limited surface constraints. To overcome this limitation, we introduce a novel approach that applies multiple geometric constraints to the implicit surface optimization process, enabling more accurate reconstructions from unconstrained image collections. First, we utilize sparse 3D points from structure-from-motion (SfM) to refine the signed distance function estimation for the reconstructed surface, with a displacement compensation to accommodate noise in the sparse points. Additionally, we employ robust normal priors derived from a normal predictor, enhanced by edge prior filtering and multi-view consistency constraints, to improve alignment with the actual surface geometry. Extensive testing on the Heritage-Recon benchmark and other datasets has shown that the proposed method can accurately reconstruct surfaces from in-the-wild images, yielding geometries with superior accuracy and granularity compared to existing techniques. Our approach enables high-quality 3D reconstruction of various landmarks, making it applicable to diverse scenarios such as digital preservation of cultural heritage sites.
Abstract:The spatial attention mechanism captures long-range dependencies by aggregating global contextual information to each query location, which is beneficial for semantic segmentation. In this paper, we present a sparse spatial attention network (SSANet) to improve the efficiency of the spatial attention mechanism without sacrificing the performance. Specifically, a sparse non-local (SNL) block is proposed to sample a subset of key and value elements for each query element to capture long-range relations adaptively and generate a sparse affinity matrix to aggregate contextual information efficiently. Experimental results show that the proposed approach outperforms other context aggregation methods and achieves state-of-the-art performance on the Cityscapes, PASCAL Context and ADE20K datasets.
Abstract:Graph convolutional networks (GCNs) have achieved great success in dealing with data of non-Euclidean structures. Their success directly attributes to fitting graph structures effectively to data such as in social media and knowledge databases. For image processing applications, the use of graph structures and GCNs have not been fully explored. In this paper, we propose a novel encoder-decoder network with added graph convolutions by converting feature maps to vertexes of a pre-generated graph to synthetically construct graph-structured data. By doing this, we inexplicitly apply graph Laplacian regularization to the feature maps, making them more structured. The experiments show that it significantly boosts performance for image restoration tasks, including deblurring and super-resolution. We believe it opens up opportunities for GCN-based approaches in more applications.
Abstract:In recent years, Unmanned Surface Vehicles (USV) have been extensively deployed for maritime applications. However, USV has a limited detection range with sensor installed at the same elevation with the targets. In this research, we propose a cooperative Unmanned Aerial Vehicle - Unmanned Surface Vehicle (UAV-USV) platform to improve the detection range of USV. A floatable and waterproof UAV is designed and 3D printed, which allows it to land on the sea. A catamaran USV and landing platform are also developed. To land UAV on the USV precisely in various lighting conditions, IR beacon detector and IR beacon are implemented on the UAV and USV, respectively. Finally, a two-phase UAV precise landing method, USV control algorithm and USV path following algorithm are proposed and tested.
Abstract:This paper presents the development of a control system for vision-guided pick-and-place tasks using a robot arm equipped with a 3D camera. The main steps include camera intrinsic and extrinsic calibration, hand-eye calibration, initial object pose registration, objects pose alignment algorithm, and pick-and-place execution. The proposed system allows the robot be able to to pick and place object with limited times of registering a new object and the developed software can be applied for new object scenario quickly. The integrated system was tested using the hardware combination of kuka iiwa, Robotiq grippers (two finger gripper and three finger gripper) and 3D cameras (Intel realsense D415 camera, Intel realsense D435 camera, Microsoft Kinect V2). The whole system can also be modified for the combination of other robotic arm, gripper and 3D camera.
Abstract:This paper presents a sensor-level mapless collision avoidance algorithm for use in mobile robots that map raw sensor data to linear and angular velocities and navigate in an unknown environment without a map. An efficient training strategy is proposed to allow a robot to learn from both human experience data and self-exploratory data. A game format simulation framework is designed to allow the human player to tele-operate the mobile robot to a goal and human action is also scored using the reward function. Both human player data and self-playing data are sampled using prioritized experience replay algorithm. The proposed algorithm and training strategy have been evaluated in two different experimental configurations: \textit{Environment 1}, a simulated cluttered environment, and \textit{Environment 2}, a simulated corridor environment, to investigate the performance. It was demonstrated that the proposed method achieved the same level of reward using only 16\% of the training steps required by the standard Deep Deterministic Policy Gradient (DDPG) method in Environment 1 and 20\% of that in Environment 2. In the evaluation of 20 random missions, the proposed method achieved no collision in less than 2~h and 2.5~h of training time in the two Gazebo environments respectively. The method also generated smoother trajectories than DDPG. The proposed method has also been implemented on a real robot in the real-world environment for performance evaluation. We can confirm that the trained model with the simulation software can be directly applied into the real-world scenario without further fine-tuning, further demonstrating its higher robustness than DDPG. The video and code are available: https://youtu.be/BmwxevgsdGc https://github.com/hanlinniu/turtlebot3_ddpg_collision_avoidance
Abstract:Timely disaster risk management requires accurate road maps and prompt damage assessment. Currently, this is done by volunteers manually marking satellite imagery of affected areas but this process is slow and often error-prone. Segmentation algorithms can be applied to satellite images to detect road networks. However, existing methods are unsuitable for disaster-struck areas as they make assumptions about the road network topology which may no longer be valid in these scenarios. Herein, we propose a CNN-based framework for identifying accessible roads in post-disaster imagery by detecting changes from pre-disaster imagery. Graph theory is combined with the CNN output for detecting semantic changes in road networks with OpenStreetMap data. Our results are validated with data of a tsunami-affected region in Palu, Indonesia acquired from DigitalGlobe.
Abstract:Satellite images are an extremely valuable resource in the aftermath of natural disasters such as hurricanes and tsunamis where they can be used for risk assessment and disaster management. In order to provide timely and actionable information for disaster response, in this paper a framework utilising segmentation neural networks is proposed to identify impacted areas and accessible roads in post-disaster scenarios. The effectiveness of pretraining with ImageNet on the task of aerial image segmentation has been analysed and performances of popular segmentation models compared. Experimental results show that pretraining on ImageNet usually improves the segmentation performance for a number of models. Open data available from OpenStreetMap (OSM) is used for training, forgoing the need for time-consuming manual annotation. The method also makes use of graph theory to update road network data available from OSM and to detect the changes caused by a natural disaster. Extensive experiments on data from the 2018 tsunami that struck Palu, Indonesia show the effectiveness of the proposed framework. ENetSeparable, with 30% fewer parameters compared to ENet, achieved comparable segmentation results to that of the state-of-the-art networks.
Abstract:LiDAR provides highly accurate 3D point clouds. However, data needs to be manually labelled in order to provide subsequent useful information. Manual annotation of such data is time consuming, tedious and error prone, and hence in this paper we present three automatic methods for annotating trees in LiDAR data. The first method requires high density point clouds and uses certain LiDAR data attributes for the purpose of tree identification, achieving almost 90% accuracy. The second method uses a voxel-based 3D Convolutional Neural Network on low density LiDAR datasets and is able to identify most large trees accurately but struggles with smaller ones due to the voxelisation process. The third method is a scaled version of the PointNet++ method and works directly on outdoor point clouds and achieves an F_score of 82.1% on the ISPRS benchmark dataset, comparable to the state-of-the-art methods but with increased efficiency.