Multiview registration is used to estimate Rigid Body Transformations (RBTs) from multiple frames and reconstruct a scene with corresponding scans. Despite the success of pairwise registration and pose synchronization, Bundle Adjustment (BA) has been shown to better maintain global consistency. In this work, we make multiview point-cloud registration more tractable by approaching range-based BA from a different perspective. Based on this analysis, we propose an objective function that takes both measurement noise and computational cost into account. For the feature parameter update, instead of calculating the global distribution parameters from the raw measurements, we aggregate the local distributions upon the pose update at each iteration. The computational cost of the feature update then depends only on the number of scans. Finally, we develop a multiview registration system using voxel-based quantization that can be applied in real-world scenarios. The experimental results demonstrate that our method outperforms the baselines in terms of both accuracy and speed. Moreover, the results show that our average positioning errors reach the centimeter level.
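The aggregation of local distributions described in the abstract above can be illustrated with a minimal sketch. It assumes each scan stores a point count, a mean, and a covariance for a feature in its own frame; the function names (`local_stats`, `transform_stats`, `merge_stats`) and this Gaussian parameterization are illustrative assumptions, not the paper's actual formulation. The sketch shows that the global statistics can be recomputed after a pose update from one summary per scan, without touching the raw points.

```python
import numpy as np

def local_stats(points):
    """Count, mean, and (biased) covariance of an (N, 3) array of points."""
    mean = points.mean(axis=0)
    centered = points - mean
    return len(points), mean, centered.T @ centered / len(points)

def transform_stats(stats, R, t):
    """Express scan-frame statistics in the world frame for the pose (R, t)."""
    n, mean, cov = stats
    return n, R @ mean + t, R @ cov @ R.T

def merge_stats(stats_list):
    """Combine per-scan world-frame statistics into one global distribution."""
    total = sum(n for n, _, _ in stats_list)
    mean = sum(n * m for n, m, _ in stats_list) / total
    # Each scan contributes its own spread plus the offset of its mean
    # from the global mean.
    cov = sum(n * (c + np.outer(m - mean, m - mean)) for n, m, c in stats_list) / total
    return total, mean, cov
```

In this sketch the merge step loops over scans only, so its cost grows with the number of scans rather than the number of points, which matches the property claimed in the abstract.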
Traditional approaches for Visual Simultaneous Localization and Mapping (VSLAM) rely on low-level vision information for state estimation, such as handcrafted local features or the image gradient. While significant progress has been made along this track, the performance of state-of-the-art monocular VSLAM systems generally degrades under more challenging configurations, e.g., varying illumination. As a consequence, the robustness and accuracy of monocular VSLAM remain open concerns. This paper presents a monocular VSLAM system that fully exploits learnt features for better state estimation. The proposed system leverages both learnt local features and global embeddings in different modules of the system: direct camera pose estimation, inter-frame feature association, and loop closure detection. With a probabilistic explanation of keypoint prediction, we formulate the camera pose tracking in a direct manner and parameterize local features with uncertainty taken into account. To alleviate the quantization effect, we adapt the mapping module to generate better 3D landmarks and thereby guarantee the system's robustness. Detecting temporal loop closure via deep global embeddings further improves the robustness and accuracy of the proposed system. The proposed system is extensively evaluated on public datasets (Tsukuba, EuRoC, and KITTI) and compared against state-of-the-art methods. The competitive performance of camera pose estimation confirms the effectiveness of our method.
Freespace detection is a fundamental component of autonomous driving perception. Recently, deep convolutional neural networks (DCNNs) have achieved impressive performance for this task. In particular, SNE-RoadSeg, our previously proposed method based on a surface normal estimator (SNE) and a data-fusion DCNN (RoadSeg), has performed strongly in freespace detection. However, SNE-RoadSeg is computationally intensive and difficult to execute in real time. To address this problem, we introduce SNE-RoadSeg+, an upgraded version of SNE-RoadSeg. SNE-RoadSeg+ consists of 1) SNE+, a module for more accurate surface normal estimation, and 2) RoadSeg+, a data-fusion DCNN that greatly improves the trade-off between accuracy and efficiency through the use of deep supervision. Extensive experimental results demonstrate the effectiveness of our SNE+ for surface normal estimation and the superior performance of our SNE-RoadSeg+ over all other freespace detection approaches. Specifically, our SNE-RoadSeg+ runs in real time while achieving state-of-the-art performance on the KITTI road benchmark. Our project page is at https://www.sne-roadseg.site/sne-roadseg-plus.
Federated Learning aims to learn machine learning models from multiple decentralized edge devices (e.g., mobile phones) or servers without sacrificing local data privacy. Recent Natural Language Processing (NLP) techniques rely on deep learning and large pre-trained language models. However, both large deep neural networks and language models are trained with huge amounts of data, which often resides on the server side. Since text data largely originates from end users, in this work we look into recent NLP models and techniques that use federated learning as the learning framework. Our survey discusses major challenges in federated natural language processing, including algorithmic challenges, system challenges, and privacy issues. We also provide a critical review of the existing Federated NLP evaluation methods and tools. Finally, we highlight the current research gaps and future directions.
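As one concrete instance of the learning framework discussed in the abstract above, the following sketch shows the server-side step of weighted federated averaging (commonly known as FedAvg). It is not taken from the survey itself; the function name `federated_average` and the flat-list representation of model parameters are assumptions made for illustration. Only parameter vectors and sample counts reach the server, never the clients' raw text.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

For example, `federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])` weights the second client three times as heavily as the first.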
Road curb detection is important for autonomous driving. It can be used to determine road boundaries and keep vehicles on the road, so that potential accidents can be avoided. Most current methods detect road curbs online using vehicle-mounted sensors, such as cameras or 3-D LiDARs. However, these methods usually suffer from severe occlusion issues, especially in highly dynamic traffic environments, where most of the field of view is occupied by dynamic objects. To alleviate this issue, in this paper we detect road curbs offline using high-resolution aerial images. Moreover, the detected road curbs can be used to create high-definition (HD) maps for autonomous vehicles. Specifically, we first predict the pixel-wise segmentation map of road curbs, and then conduct a series of post-processing steps to extract the graph structure of road curbs. To tackle the disconnectivity issue in the segmentation maps, we propose an innovative connectivity-preserving loss (CP-loss) to improve the segmentation performance. The experimental results on a public dataset demonstrate the effectiveness of our proposed loss function. This paper is accompanied by a demonstration video and a supplementary document, which are available at https://sites.google.com/view/cp-loss.
In this paper, we propose a novel training strategy for convolutional neural networks (CNNs), named Feature Mining, that aims to strengthen the network's learning of local features. Through experiments, we find that the semantics contained in different parts of a feature differ, and that the network inevitably loses local information during feedforward propagation. To enhance the learning of local features, Feature Mining divides the complete feature into two complementary parts and reuses these divided features so that the network learns more local information. We call these two steps feature segmentation and feature reusing. Feature Mining is a parameter-free, plug-and-play method that can be applied to any CNN model. Extensive experiments demonstrate the wide applicability, versatility, and compatibility of our method.
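The feature segmentation step described in the abstract above can be pictured with a small sketch. The abstract does not say how the split is chosen, so the rectangular spatial mask used here, the function name `split_feature`, and its parameters are all assumptions for illustration. The sketch only demonstrates what "two complementary parts" means: every activation lands in exactly one part, and the two parts sum back to the complete feature.

```python
import numpy as np

def split_feature(feature, top, left, height, width):
    """Split a (C, H, W) feature map into two complementary parts.

    The first part keeps the activations inside the given rectangle and
    zeroes the rest; the second part keeps everything outside it.
    """
    mask = np.zeros(feature.shape[-2:], dtype=feature.dtype)
    mask[top:top + height, left:left + width] = 1
    return feature * mask, feature * (1 - mask)
```

Under this assumption, feature reusing would amount to feeding each part through the remaining layers with its own training loss, though the abstract leaves that detail open.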
Autonomous car racing is a challenging task in the robotic control area. Traditional modular methods require accurate mapping, localization and planning, which makes them computationally inefficient and sensitive to environmental changes. Recently, deep-learning-based end-to-end systems have shown promising results for autonomous driving/racing. However, they are commonly implemented by supervised imitation learning (IL), which suffers from the distribution mismatch problem, or by reinforcement learning (RL), which requires a huge amount of risky interaction data. In this work, we present a general deep imitative reinforcement learning approach (DIRL), which successfully achieves agile autonomous racing using visual inputs. The driving knowledge is acquired from both IL and model-based RL, where the agent can learn from human teachers as well as perform self-improvement by safely interacting with an offline world model. We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation. The evaluation results demonstrate that our method outperforms previous IL and RL methods in terms of sample efficiency and task performance. Demonstration videos are available at https://caipeide.github.io/autorace-dirl/
Convolutional neural network (CNN)-based stereo matching approaches generally require a dense cost volume (DCV) for disparity estimation. However, generating such cost volumes is computationally-intensive and memory-consuming, hindering CNN training and inference efficiency. To address this problem, we propose SCV-Stereo, a novel CNN architecture, capable of learning dense stereo matching from sparse cost volume (SCV) representations. Our inspiration is derived from the fact that DCV representations are somewhat redundant and can be replaced with SCV representations. Benefiting from these SCV representations, our SCV-Stereo can update disparity estimations in an iterative fashion for accurate and efficient stereo matching. Extensive experiments carried out on the KITTI Stereo benchmarks demonstrate that our SCV-Stereo can significantly minimize the trade-off between accuracy and efficiency for stereo matching. Our project page is https://sites.google.com/view/scv-stereo.
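The idea that a dense cost volume is redundant can be illustrated with a minimal sketch. It assumes the sparse representation keeps, for each pixel, only the k best-matching disparity candidates and their scores, with higher scores meaning better matches. This selection rule, the function name `sparse_cost_volume`, and the (H, W, D) layout are assumptions for illustration and may differ from the construction used in SCV-Stereo.

```python
import numpy as np

def sparse_cost_volume(dense, k):
    """Keep the k highest matching scores per pixel of an (H, W, D) volume.

    Returns (indices, values), each of shape (H, W, k), ordered from the
    best-scoring disparity candidate to the k-th best.
    """
    idx = np.argsort(-dense, axis=-1)[..., :k]
    vals = np.take_along_axis(dense, idx, axis=-1)
    return idx, vals
```

Storage then scales with k rather than with the full disparity range D, which is the kind of memory saving the abstract refers to.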
Stereo matching is a key component of autonomous driving perception. Recent unsupervised stereo matching approaches have received adequate attention due to their advantage of not requiring disparity ground truth. These approaches, however, perform poorly near occlusions. To overcome this drawback, in this paper, we propose CoT-Stereo, a novel unsupervised stereo matching approach. Specifically, we adopt a co-teaching framework where two networks interactively teach each other about the occlusions in an unsupervised fashion, which greatly improves the robustness of unsupervised stereo matching. Extensive experiments on the KITTI Stereo benchmarks demonstrate the superior performance of CoT-Stereo over all other state-of-the-art unsupervised stereo matching approaches in terms of both accuracy and speed. Our project webpage is https://sites.google.com/view/cot-stereo.
Regularization and data augmentation methods are widely used and have become increasingly indispensable in deep learning training. Researchers working in this area have considered various possibilities, but so far there has been little discussion about regularizing the outputs of the model. This paper begins with the empirical observation that better performance is significantly associated with output distributions that have smaller average values and variances. Boldly assuming that causality is involved, we propose a novel regularization term, called Output Decay, that forces the model to assign smaller and similar output values to each class. Though counter-intuitive, such a small modification results in a remarkable improvement in performance. Extensive experiments demonstrate the wide applicability, versatility, and compatibility of Output Decay.
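A regularizer on model outputs of the kind described in the abstract above might look like the following sketch. The abstract does not give the exact form of Output Decay, so the penalty used here (the mean squared logit), the names `output_penalty` and `regularized_loss`, and the default `weight` are assumptions for illustration only. The mean squared value equals the squared mean plus the variance, so this particular penalty shrinks both quantities the abstract mentions.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample computed from raw logits (stable form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def output_penalty(logits):
    """Assumed penalty: mean squared logit (squared mean plus variance)."""
    return sum(z * z for z in logits) / len(logits)

def regularized_loss(logits, target, weight=0.01):
    """Task loss plus the weighted output penalty."""
    return cross_entropy(logits, target) + weight * output_penalty(logits)
```

The `weight` parameter plays the same role as the coefficient in weight decay: setting it to zero recovers the plain cross-entropy loss.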