Generating safe and non-conservative behaviors in dense, dynamic environments remains challenging for automated vehicles due to the stochastic nature of traffic participants' behaviors and their implicit interaction with the ego vehicle. This paper presents a novel planning framework, Multipolicy And Risk-aware Contingency planning (MARC), that systematically addresses these challenges by enhancing the multipolicy-based pipelines from both behavior and motion planning aspects. Specifically, MARC realizes a critical scenario set that reflects multiple possible futures conditioned on each semantic-level ego policy. Then, the generated policy-conditioned scenarios are further formulated into a tree-structured representation with a dynamic branchpoint based on the scene-level divergence. Moreover, to generate diverse driving maneuvers, we introduce risk-aware contingency planning, a bi-level optimization algorithm that simultaneously considers multiple future scenarios and user-defined risk tolerance levels. Owing to the more unified combination of behavior and motion planning layers, our framework achieves efficient decision-making and human-like driving maneuvers. Comprehensive experimental results demonstrate superior performance to other strong baselines in various environments.
Predicting the future behavior of agents is a fundamental task in autonomous vehicle domains. Accurate prediction relies on comprehending the surrounding map, which significantly regularizes agent behaviors. However, existing methods have limitations in exploiting the map and exhibit a strong dependence on historical trajectories, which yield unsatisfactory prediction performance and robustness. Additionally, their heavy network architectures impede real-time applications. To tackle these problems, we propose Map-Agent Coupled Transformer (MacFormer) for real-time and robust trajectory prediction. Our framework explicitly incorporates map constraints into the network via two carefully designed modules named coupled map and reference extractor. A novel multi-task optimization strategy (MTOS) is presented to enhance learning of topology and rule constraints. We also devise bilateral query scheme in context fusion for a more efficient and lightweight network. We evaluated our approach on Argoverse 1, Argoverse 2, and nuScenes real-world benchmarks, where it all achieved state-of-the-art performance with the lowest inference latency and smallest model size. Experiments also demonstrate that our framework is resilient to imperfect tracklet inputs. Furthermore, we show that by combining with our proposed strategies, classical models outperform their baselines, further validating the versatility of our framework.
This study introduces a novel framework, G3Reg, for fast and robust global registration of LiDAR point clouds. In contrast to conventional complex keypoints and descriptors, we extract fundamental geometric primitives including planes, clusters, and lines (PCL) from the raw point cloud to obtain low-level semantic segments. Each segment is formulated as a unified Gaussian Ellipsoid Model (GEM) by employing a probability ellipsoid to ensure the ground truth centers are encompassed with a certain degree of probability. Utilizing these GEMs, we then present a distrust-and-verify scheme based on a Pyramid Compatibility Graph for Global Registration (PAGOR). Specifically, we establish an upper bound, which can be traversed based on the confidence level for compatibility testing to construct the pyramid graph. Gradually, we solve multiple maximum cliques (MAC) for each level of the graph, generating numerous transformation candidates. In the verification phase, we adopt a precise and efficient metric for point cloud alignment quality, founded on geometric primitives, to identify the optimal candidate. The performance of the algorithm is extensively validated on three publicly available datasets and a self-collected multi-session dataset, without changing any parameter settings in the experimental evaluation. The results exhibit superior robustness and real-time performance of the G3Reg framework compared to state-of-the-art methods. Furthermore, we demonstrate the potential for integrating individual GEM and PAGOR components into other algorithmic frameworks to enhance their efficacy. To advance further research and promote community understanding, we have publicly shared the source code.
Global point cloud registration is essential in many robotics tasks like loop closing and relocalization. Unfortunately, the registration often suffers from the low overlap between point clouds, a frequent occurrence in practical applications due to occlusion and viewpoint change. In this paper, we propose a graph-theoretic framework to address the problem of global point cloud registration with low overlap. To this end, we construct a consistency graph to facilitate robust data association and employ graduated non-convexity (GNC) for reliable pose estimation, following the state-of-the-art (SoTA) methods. Unlike previous approaches, we use semantic cues to scale down the dense point clouds, thus reducing the problem size. Moreover, we address the ambiguity arising from the consistency threshold by constructing a pyramid graph with multi-level consistency thresholds. Then we propose a cascaded gradient ascend method to solve the resulting densest clique problem and obtain multiple pose candidates for every consistency threshold. Finally, fast geometric verification is employed to select the optimal estimation from multiple pose candidates. Our experiments, conducted on a self-collected indoor dataset and the public KITTI dataset, demonstrate that our method achieves the highest success rate despite the low overlap of point clouds and low semantic quality. We have open-sourced our code https://github.com/HKUST-Aerial-Robotics/Pagor for this project.
In this study, we introduce an online monocular lane mapping approach that solely relies on a single camera and odometry for generating spline-based maps. Our proposed technique models the lane association process as an assignment issue utilizing a bipartite graph, and assigns weights to the edges by incorporating Chamfer distance, pose uncertainty, and lateral sequence consistency. Furthermore, we meticulously design control point initialization, spline parameterization, and optimization to progressively create, expand, and refine splines. In contrast to prior research that assessed performance using self-constructed datasets, our experiments are conducted on the openly accessible OpenLane dataset. The experimental outcomes reveal that our suggested approach enhances lane association and odometry precision, as well as overall lane map quality. We have open-sourced our code1 for this project.
In this paper, we present a centralized framework for multi-session LiDAR mapping in urban environments, by utilizing lightweight line and plane map representations instead of widely used point clouds. The proposed framework achieves consistent mapping in a coarse-to-fine manner. Global place recognition is achieved by associating lines and planes on the Grassmannian manifold, followed by an outlier rejection-aided pose graph optimization for map merging. Then a novel bundle adjustment is also designed to improve the local consistency of lines and planes. In the experimental section, both public and self-collected datasets are used to demonstrate efficiency and effectiveness. Extensive results validate that our LiDAR mapping framework could merge multi-session maps globally, optimize maps incrementally, and is applicable for lightweight robot localization.
Predicting future trajectories of surrounding agents is essential for safety-critical autonomous driving. Most existing work focuses on predicting marginal trajectories for each agent independently. However, it has rarely been explored in predicting joint trajectories for interactive agents. In this work, we propose Bi-level Future Fusion (BiFF) to explicitly capture future interactions between interactive agents. Concretely, BiFF fuses the high-level future intentions followed by low-level future behaviors. Then the polyline-based coordinate is specifically designed for multi-agent prediction to ensure data efficiency, frame robustness, and prediction accuracy. Experiments show that BiFF achieves state-of-the-art performance on the interactive prediction benchmark of Waymo Open Motion Dataset.
Constructing a high-quality dense map in real-time is essential for robotics, AR/VR, and digital twins applications. As Neural Radiance Field (NeRF) greatly improves the mapping performance, in this paper, we propose a NeRF-based mapping method that enables higher-quality reconstruction and real-time capability even on edge computers. Specifically, we propose a novel hierarchical hybrid representation that leverages implicit multiresolution hash encoding aided by explicit octree SDF priors, describing the scene at different levels of detail. This representation allows for fast scene geometry initialization and makes scene geometry easier to learn. Besides, we present a coverage-maximizing keyframe selection strategy to address the forgetting issue and enhance mapping quality, particularly in marginal areas. To the best of our knowledge, our method is the first to achieve high-quality NeRF-based mapping on edge computers of handheld devices and quadrotors in real-time. Experiments demonstrate that our method outperforms existing NeRF-based mapping methods in geometry accuracy, texture realism, and time consumption. The code will be released at: https://github.com/SYSU-STAR/H2-Mapping