Abstract:In commercial autonomous service robots with several form factors, simultaneous localization and mapping (SLAM) is an essential technology for providing proper services such as cleaning and guidance. Such robots require SLAM algorithms suitable for specific applications and environments. Hence, several SLAM frameworks have been proposed to address various requirements in the past decade. However, we have encountered challenges in implementing recent innovative frameworks when handling service robots with low-end processors and insufficient sensor data, such as low-resolution 2D LiDAR sensors. Specifically, regarding commercial robots, consistent performance in different hardware configurations and environments is more crucial than the performance dedicated to specific sensors or environments. Therefore, we propose a) a multi-stage approach for global pose estimation in embedded systems; b) a graph generation method with zero constraints for synchronized sensors; and c) a robust and memory-efficient method for long-term pose-graph optimization. As verified in in-home and large-scale indoor environments, the proposed method yields consistent global pose estimation for services in commercial fields. Furthermore, the proposed method exhibits potential commercial viability considering the consistent performance verified via mass production and long-term (> 5 years) operation.
Abstract:Recognizing traversable terrain from 3D point cloud data is critical, as it directly impacts the performance of autonomous navigation in off-road environments. However, existing segmentation algorithms often struggle with challenges related to changes in data distribution, environmental specificity, and sensor variations. Moreover, when encountering sunken areas, their performance is frequently compromised, and they may even fail to recognize them. To address these challenges, we introduce B-TMS, a novel approach that performs map-wise terrain modeling and segmentation by utilizing a Bayesian generalized kernel (BGK) within the graph structure known as the tri-grid field (TGF). Our experiments encompass various data distributions, ranging from single scans to partial maps, utilizing both public datasets representing urban scenes and off-road environments, and our own dataset acquired from extremely bumpy terrains. Our results demonstrate notable contributions, particularly in terms of robustness to data distribution variations, adaptability to diverse environmental conditions, and resilience against the challenges associated with parameter changes.
Abstract:With the rapid development of autonomous driving and SLAM technology, the performance of autonomous systems using multimodal sensors highly relies on accurate extrinsic calibration. Addressing the need for a convenient, maintenance-friendly calibration process, this paper introduces Galibr, a fully automatic targetless LiDAR-camera extrinsic calibration tool designed for ground vehicle platforms in any natural setting. The method utilizes the ground planes and edge information from both LiDAR and camera inputs, streamlining the calibration process. It encompasses two main steps: an initial pose estimation algorithm based on ground planes (GP-init), and a refinement phase through edge extraction and matching. Our approach significantly enhances calibration performance, primarily attributed to our novel initial pose estimation method, as demonstrated in unstructured natural environments, including those in the KITTI dataset and the KAIST quadruped dataset.
Abstract:Despite great improvements in semantic segmentation, challenges persist because of the lack of local/global contexts and the relationship between them. In this paper, we propose Contextrast, a contrastive learning-based semantic segmentation method that captures local/global contexts and comprehends their relationships. Our proposed method comprises two parts: a) contextual contrastive learning (CCL) and b) boundary-aware negative (BANE) sampling. Contextual contrastive learning obtains local/global context from multi-scale feature aggregation and inter/intra-relationship of features for better discrimination capabilities. Meanwhile, BANE sampling selects embedding features along the boundaries of incorrectly predicted regions and employs them as harder negative samples in our contrastive learning, resolving segmentation issues along the boundary region by exploiting fine-grained details. We demonstrate that Contextrast substantially enhances the performance of semantic segmentation networks, outperforming state-of-the-art contrastive learning approaches on diverse public datasets, e.g., Cityscapes, CamVid, PASCAL-C, COCO-Stuff, and ADE20K, without an increase in computational cost during inference.
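To illustrate why harder negatives matter in contrastive learning, the following is a minimal sketch of a generic InfoNCE-style loss, not the loss used by Contextrast. The function names, the toy 2D embeddings, and the temperature value are all assumptions made for illustration. It shows that a negative lying close to the anchor yields a larger loss, and therefore a stronger training signal, than a negative lying far away.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (illustrative, not the paper's loss).

    loss = -log( exp(s_pos / t) / (exp(s_pos / t) + sum_n exp(s_n / t)) )
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    neg_logits = [cosine_similarity(anchor, n) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - pos_logit


# Hypothetical toy embeddings: the hard negative points almost the same way
# as the anchor, the easy negative points in the opposite direction.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negative = [-1.0, 0.0]
hard_negative = [0.8, 0.6]

loss_easy = info_nce_loss(anchor, positive, [easy_negative])
loss_hard = info_nce_loss(anchor, positive, [hard_negative])
```

In this toy setting `loss_hard` exceeds `loss_easy`, which is the general motivation for sampling negatives that are difficult to separate from the anchor.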
Abstract:Deep neural networks have shown remarkable performance in image classification. However, their performance significantly deteriorates with corrupted input data. Domain generalization methods have been proposed to train robust models against out-of-distribution data. Data augmentation in the frequency domain is one such approach that enables a model to learn phase features to establish domain-invariant representations. This approach changes the amplitudes of the input data while preserving the phases. However, using fixed phases leads to susceptibility to phase fluctuations because both amplitude and phase fluctuations commonly occur in out-of-distribution data. In this study, to address this problem, we introduce an approach using finite variation of the phases of input data rather than maintaining fixed phases. Based on the assumption that the degree of domain-invariant features varies for each phase, we propose a method to distinguish phases based on this degree. In addition, we propose a method called vital phase augmentation (VIPAug) that applies the variation to the phases differently according to the degree of domain-invariant features of given phases. The model depends more on the vital phases that contain more domain-invariant features for attaining robustness to amplitude and phase fluctuations. We present experimental evaluations of our proposed approach, which exhibited improved performance for both clean and corrupted data. VIPAug achieved SOTA performance on the benchmark CIFAR-10 and CIFAR-100 datasets, as well as near-SOTA performance on the ImageNet-100 and ImageNet datasets. Our code is available at https://github.com/excitedkid/vipaug.
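The baseline idea mentioned above, changing amplitudes while preserving phases, can be sketched as follows. This is a minimal illustration of amplitude-only frequency-domain augmentation on a 1D signal using a naive DFT from the standard library; it is not VIPAug, and the function names, the `strength` parameter, and the uniform scaling rule are assumptions. For images, the same steps would typically be applied with a 2D FFT.

```python
import cmath
import math
import random


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def amplitude_augment(signal, strength, rng):
    """Randomly rescale each frequency amplitude while keeping its phase.

    `strength` is a hypothetical knob in [0, 1): 0 leaves the signal
    unchanged, larger values perturb the amplitudes more strongly.
    """
    augmented = []
    for coeff in dft(signal):
        amplitude, phase = abs(coeff), cmath.phase(coeff)
        scale = 1.0 + strength * rng.uniform(-1.0, 1.0)  # stays positive
        augmented.append(cmath.rect(amplitude * scale, phase))
    # The perturbed spectrum is generally not conjugate-symmetric, so keep
    # only the real part of the reconstruction.
    return [value.real for value in idft(augmented)]


rng = random.Random(0)
signal = [0.0, 1.0, 0.5, -0.25, 0.75, 0.1, -0.6, 0.3]
unchanged = amplitude_augment(signal, 0.0, rng)
perturbed = amplitude_augment(signal, 0.5, rng)
```

Because every phase is held fixed here, a model trained only on such samples never sees phase variation, which is the limitation the abstract sets out to address.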
Abstract:With the increasing demand for mobile robots and autonomous vehicles, several approaches for long-term robot navigation have been proposed. Among these techniques, ground segmentation and traversability estimation play important roles in perception and path planning, respectively. Even though these two techniques appear similar, their objectives are different. Ground segmentation divides data into ground and non-ground elements; thus, it is used as a preprocessing stage to extract objects of interest by rejecting ground points. In contrast, traversability estimation identifies and comprehends areas in which robots can move safely. Nevertheless, some researchers use these terms without clear distinction, leading to confusion between the two concepts. Therefore, in this study, we survey related literature and clearly distinguish ground and traversable regions considering four aspects: a) maneuverability of robot platforms, b) position of a robot in the surroundings, c) subset relation of negative obstacles, and d) subset relation of deformable objects.
Abstract:Single-domain generalization (S-DG) aims to generalize a model to unseen environments with a single-source domain. However, most S-DG approaches have been conducted in the field of classification. When these approaches are applied to object detection, the semantic features of some objects can be damaged, which can lead to imprecise object localization and misclassification. To address these problems, we propose an object-aware domain generalization (OA-DG) method for single-domain generalization in object detection. Our method consists of a data augmentation method and a training strategy, which are called OA-Mix and OA-Loss, respectively. OA-Mix generates multi-domain data with multi-level transformation and an object-aware mixing strategy. OA-Loss enables models to learn domain-invariant representations for objects and backgrounds from the original and OA-Mixed images. Our proposed method outperforms state-of-the-art works on standard benchmarks. Our code is available at https://github.com/WoojuLee24/OA-DG.
Abstract:Global registration is a fundamental task that estimates the relative pose between two viewpoints of 3D point clouds. However, two issues degrade the performance of global registration in LiDAR SLAM: sparsity and degeneracy. The sparsity issue is caused by the sparse characteristics of the 3D point cloud measurements of a mechanically spinning LiDAR sensor. The degeneracy issue sometimes occurs because the outlier-rejection methods reject too many correspondences, leaving fewer than three inliers. These two issues become more severe as the pose discrepancy between the two viewpoints of 3D point clouds becomes greater. To tackle these problems, we propose a robust global registration framework called Quatro++. Extending our previous work, which focused solely on global registration itself, we address robust global registration in terms of loop closing in LiDAR SLAM. To this end, ground segmentation is exploited to achieve robust global registration. Through experiments, we demonstrate that our proposed method shows a higher success rate than state-of-the-art global registration methods, overcoming the sparsity and degeneracy issues. In addition, we show that ground segmentation significantly helps to increase the success rate for ground vehicles. Finally, we apply our proposed method to the loop closing module in LiDAR SLAM and confirm that the quality of the loop constraints is improved, showing more precise mapping results. Therefore, the experimental evidence corroborates the suitability of our method as an initial alignment in loop closing. Our code is available at https://quatro-plusplus.github.io.
Abstract:Quadrupedal robots have emerged as a cutting-edge platform for assisting humans, finding applications in tasks related to inspection and exploration in remote areas. Nevertheless, their floating base structure renders them susceptible to falling in cluttered environments, where manual recovery by a human operator may not always be feasible. Several recent studies have presented recovery controllers employing deep reinforcement learning algorithms. However, these controllers are not specifically designed to operate effectively in cluttered environments, such as stairs and slopes, which restricts their applicability. In this study, we propose a robust all-terrain recovery policy to facilitate rapid and secure recovery in cluttered environments. We substantiate the superiority of our proposed approach through simulations and real-world tests encompassing various terrain types.
Abstract:In recent years, the demand for mapping construction sites or buildings using light detection and ranging (LiDAR) sensors has increased to model environments for efficient site management. However, LiDAR-based approaches sometimes diverge in narrow and confined environments, such as spiral stairs and corridors, because their parameters remain fixed regardless of changes in the environment. That is, the parameters of LiDAR(-inertial) odometry are mostly set for open space; thus, if the same parameters are applied in a corridor-like scene, the odometry methods diverge, which is referred to as degeneracy. To tackle this degeneracy problem, we propose a robust LiDAR inertial odometry called AdaLIO, which employs an adaptive parameter setting strategy. To this end, we first detect degeneracy by checking whether the surroundings are corridor-like. If so, the parameters relevant to voxelization and normal vector estimation are adaptively changed to increase the number of correspondences. As verified on a public dataset, our proposed method showed promising performance in narrow and cramped environments, avoiding the degeneracy problem.
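The adaptive parameter strategy described above can be sketched as a simple switch between two parameter sets. This is only an illustration under stated assumptions and not the AdaLIO implementation: the corridor test (median horizontal range below a threshold), the parameter names, and all numeric values are hypothetical, since the abstract does not specify them. Finer voxels are used here as one plausible way to retain more points, and hence more candidate correspondences, in tight spaces.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class OdometryParams:
    """Hypothetical parameter set; names and values are illustrative only."""
    voxel_size_m: float            # leaf size used for downsampling
    normal_search_radius_m: float  # neighborhood used to fit surface normals


# Assumed presets: coarse settings for open space, finer ones for narrow scenes.
OPEN_SPACE_PARAMS = OdometryParams(voxel_size_m=0.5, normal_search_radius_m=1.0)
NARROW_SPACE_PARAMS = OdometryParams(voxel_size_m=0.2, normal_search_radius_m=0.5)


def is_corridor_like(points, range_threshold_m=3.0):
    """Assumed heuristic: the scene is narrow if the median horizontal
    distance from the sensor to the points is below the threshold."""
    if not points:
        return False
    ranges = [math.hypot(x, y) for x, y, _z in points]
    return statistics.median(ranges) < range_threshold_m


def select_params(points):
    """Pick the parameter set for the current scan."""
    return NARROW_SPACE_PARAMS if is_corridor_like(points) else OPEN_SPACE_PARAMS


# Toy scans given as (x, y, z) tuples in meters.
corridor_scan = [(1.0, 0.2, 0.0), (0.8, -0.3, 0.5), (1.2, 0.1, 1.0)]
open_scan = [(10.0, 2.0, 0.0), (8.0, -6.0, 0.5), (15.0, 1.0, 1.0)]

corridor_params = select_params(corridor_scan)
open_params = select_params(open_scan)
```

A per-scan switch like this keeps the open-space behavior untouched and only changes the settings when the surroundings look confined.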