Despite great improvements in semantic segmentation, challenges persist because of the lack of local/global contexts and the relationship between them. In this paper, we propose Contextrast, a contrastive learning-based semantic segmentation method that allows to capture local/global contexts and comprehend their relationships. Our proposed method comprises two parts: a) contextual contrastive learning (CCL) and b) boundary-aware negative (BANE) sampling. Contextual contrastive learning obtains local/global context from multi-scale feature aggregation and inter/intra-relationship of features for better discrimination capabilities. Meanwhile, BANE sampling selects embedding features along the boundaries of incorrectly predicted regions to employ them as harder negative samples on our contrastive learning, resolving segmentation issues along the boundary region by exploiting fine-grained details. We demonstrate that our Contextrast substantially enhances the performance of semantic segmentation networks, outperforming state-of-the-art contrastive learning approaches on diverse public datasets, e.g. Cityscapes, CamVid, PASCAL-C, COCO-Stuff, and ADE20K, without an increase in computational cost during inference.
With the increasing demand for mobile robots and autonomous vehicles, several approaches for long-term robot navigation have been proposed. Among these techniques, ground segmentation and traversability estimation play important roles in perception and path planning, respectively. Even though these two techniques appear similar, their objectives are different. Ground segmentation divides data into ground and non-ground elements; thus, it is used as a preprocessing stage to extract objects of interest by rejecting ground points. In contrast, traversability estimation identifies and comprehends areas in which robots can move safely. Nevertheless, some researchers use these terms without clear distinction, leading to misunderstanding the two concepts. Therefore, in this study, we survey related literature and clearly distinguish ground and traversable regions considering four aspects: a) maneuverability of robot platforms, b) position of a robot in the surroundings, c) subset relation of negative obstacles, and d) subset relation of deformable objects.
Single-domain generalization (S-DG) aims to generalize a model to unseen environments with a single-source domain. However, most S-DG approaches have been conducted in the field of classification. When these approaches are applied to object detection, the semantic features of some objects can be damaged, which can lead to imprecise object localization and misclassification. To address these problems, we propose an object-aware domain generalization (OA-DG) method for single-domain generalization in object detection. Our method consists of data augmentation and training strategy, which are called OA-Mix and OA-Loss, respectively. OA-Mix generates multi-domain data with multi-level transformation and object-aware mixing strategy. OA-Loss enables models to learn domain-invariant representations for objects and backgrounds from the original and OA-Mixed images. Our proposed method outperforms state-of-the-art works on standard benchmarks. Our code is available at https://github.com/WoojuLee24/OA-DG.
Global registration is a fundamental task that estimates the relative pose between two viewpoints of 3D point clouds. However, there are two issues that degrade the performance of global registration in LiDAR SLAM: one is the sparsity issue and the other is degeneracy. The sparsity issue is caused by the sparse characteristics of the 3D point cloud measurements in a mechanically spinning LiDAR sensor. The degeneracy issue sometimes occurs because the outlier-rejection methods reject too many correspondences, leaving less than three inliers. These two issues have become more severe as the pose discrepancy between the two viewpoints of 3D point clouds becomes greater. To tackle these problems, we propose a robust global registration framework, called \textit{Quatro++}. Extending our previous work that solely focused on the global registration itself, we address the robust global registration in terms of the loop closing in LiDAR SLAM. To this end, ground segmentation is exploited to achieve robust global registration. Through the experiments, we demonstrate that our proposed method shows a higher success rate than the state-of-the-art global registration methods, overcoming the sparsity and degeneracy issues. In addition, we show that ground segmentation significantly helps to increase the success rate for the ground vehicles. Finally, we apply our proposed method to the loop closing module in LiDAR SLAM and confirm that the quality of the loop constraints is improved, showing more precise mapping results. Therefore, the experimental evidence corroborated the suitability of our method as an initial alignment in the loop closing. Our code is available at https://quatro-plusplus.github.io.
Quadrupedal robots have emerged as a cutting-edge platform for assisting humans, finding applications in tasks related to inspection and exploration in remote areas. Nevertheless, their floating base structure renders them susceptible to fall in cluttered environments, where manual recovery by a human operator may not always be feasible. Several recent studies have presented recovery controllers employing deep reinforcement learning algorithms. However, these controllers are not specifically designed to operate effectively in cluttered environments, such as stairs and slopes, which restricts their applicability. In this study, we propose a robust all-terrain recovery policy to facilitate rapid and secure recovery in cluttered environments. We substantiate the superiority of our proposed approach through simulations and real-world tests encompassing various terrain types.
In recent years, the demand for mapping construction sites or buildings using light detection and ranging~(LiDAR) sensors has been increased to model environments for efficient site management. However, it is observed that sometimes LiDAR-based approaches diverge in narrow and confined environments, such as spiral stairs and corridors, caused by fixed parameters regardless of the changes in the environments. That is, the parameters of LiDAR (-inertial) odometry are mostly set for open space; thus, if the same parameters suitable for the open space are applied in a corridor-like scene, it results in divergence of odometry methods, which is referred to as \textit{degeneracy}. To tackle this degeneracy problem, we propose a robust LiDAR inertial odometry called \textit{AdaLIO}, which employs an adaptive parameter setting strategy. To this end, we first check the degeneracy by checking whether the surroundings are corridor-like environments. If so, the parameters relevant to voxelization and normal vector estimation are adaptively changed to increase the number of correspondences. As verified in a public dataset, our proposed method showed promising performance in narrow and cramped environments, avoiding the degeneracy problem.
Localization has been a challenging task for autonomous navigation. A loop detection algorithm must overcome environmental changes for the place recognition and re-localization of robots. Therefore, deep learning has been extensively studied for the consistent transformation of measurements into localization descriptors. Street view images are easily accessible; however, images are vulnerable to appearance changes. LiDAR can robustly provide precise structural information. However, constructing a point cloud database is expensive, and point clouds exist only in limited places. Different from previous works that train networks to produce shared embedding directly between the 2D image and 3D point cloud, we transform both data into 2.5D depth images for matching. In this work, we propose a novel cross-matching method, called (LC)$^2$, for achieving LiDAR localization without a prior point cloud map. To this end, LiDAR measurements are expressed in the form of range images before matching them to reduce the modality discrepancy. Subsequently, the network is trained to extract localization descriptors from disparity and range images. Next, the best matches are employed as a loop factor in a pose graph. Using public datasets that include multiple sessions in significantly different lighting conditions, we demonstrated that LiDAR-based navigation systems could be optimized from image databases and vice versa.
Radar sensors are emerging as solutions for perceiving surroundings and estimating ego-motion in extreme weather conditions. Unfortunately, radar measurements are noisy and suffer from mutual interference, which degrades the performance of feature extraction and matching, triggering imprecise matching pairs, which are referred to as outliers. To tackle the effect of outliers on radar odometry, a novel outlier-robust method called \textit{ORORA} is proposed, which is an abbreviation of \textit{Outlier-RObust RAdar odometry}. To this end, a novel decoupling-based method is proposed, which consists of graduated non-convexity~(GNC)-based rotation estimation and anisotropic component-wise translation estimation~(A-COTE). Furthermore, our method leverages the anisotropic characteristics of radar measurements, each of whose uncertainty along the azimuthal direction is somewhat larger than that along the radial direction. As verified in the public dataset, it was demonstrated that our proposed method yields robust ego-motion estimation performance compared with other state-of-the-art methods. Our code is available at https://github.com/url-kaist/outlier-robust-radar-odometry.
In robotics and computer vision communities, extensive studies have been widely conducted regarding surveillance tasks, including human detection, tracking, and motion recognition with a camera. Additionally, deep learning algorithms are widely utilized in the aforementioned tasks as in other computer vision tasks. Existing public datasets are insufficient to develop learning-based methods that handle various surveillance for outdoor and extreme situations such as harsh weather and low illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS) containing more than 500,000 image pairs and the first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g. an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). This is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks to the best of our knowledge. We present an overview of the proposed dataset with statistics and present methods of exploiting our dataset with deep learning-based algorithms. The latest information on the dataset and our study are available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.
Visual inertial odometry and SLAM algorithms are widely used in various fields, such as service robots, drones, and autonomous vehicles. Most of the SLAM algorithms are based on assumption that landmarks are static. However, in the real-world, various dynamic objects exist, and they degrade the pose estimation accuracy. In addition, temporarily static objects, which are static during observation but move when they are out of sight, trigger false positive loop closings. To overcome these problems, we propose a novel visual-inertial SLAM framework, called DynaVINS, which is robust against both dynamic objects and temporarily static objects. In our framework, we first present a robust bundle adjustment that could reject the features from dynamic objects by leveraging pose priors estimated by the IMU preintegration. Then, a keyframe grouping and a multi-hypothesis-based constraints grouping methods are proposed to reduce the effect of temporarily static objects in the loop closing. Subsequently, we evaluated our method in a public dataset that contains numerous dynamic objects. Finally, the experimental results corroborate that our DynaVINS has promising performance compared with other state-of-the-art methods by successfully rejecting the effect of dynamic and temporarily static objects. Our code is available at https://github.com/url-kaist/dynaVINS.