Micro-Aerial Vehicles (MAVs) have the advantage of moving freely in 3D space. However, creating compact and sparse map representations that can be efficiently used for planning for such robots is still an open problem. In this paper, we take maps built from noisy sensor data and construct a sparse graph containing topological information that can be used for 3D planning. We use a Euclidean Signed Distance Field, extract a 3D Generalized Voronoi Diagram (GVD), and obtain a thin skeleton diagram representing the topological structure of the environment. We then convert this skeleton diagram into a sparse graph, which we show is resistant to noise and changes in resolution. We demonstrate global planning over this graph, and the orders of magnitude speed-up it offers over other common planning methods. We validate our planning algorithm in real maps built onboard an MAV, using RGB-D sensing.
The growing popularity of autonomous systems creates a need for reliable and efficient metric pose retrieval algorithms. Currently used approaches tend to rely on nearest neighbor search of binary descriptors to perform the 2D-3D matching and guarantee realtime capabilities on mobile platforms. These methods struggle, however, with the growing size of the map, changes in viewpoint or appearance, and visual aliasing present in the environment. The rigidly defined descriptor patterns only capture a limited neighborhood of the keypoint and completely ignore the overall visual context. We propose LandmarkBoost - an approach that, in contrast to the conventional 2D-3D matching methods, casts the search problem as a landmark classification task. We use a boosted classifier to classify landmark observations and directly obtain correspondences as classifier scores. We also introduce a formulation of visual context that is flexible, efficient to compute, and can capture relationships in the entire image plane. The original binary descriptors are augmented with contextual information and informative features are selected by the boosting framework. Through detailed experiments, we evaluate the retrieval quality and performance of LandmarkBoost, demonstrating that it outperforms common state-of-the-art descriptor matching methods.
Rapid deployment and operation are key requirements in time critical application, such as Search and Rescue (SaR). Efficiently teleoperated ground robots can support first-responders in such situations. However, first-person view teleoperation is sub-optimal in difficult terrains, while a third-person perspective can drastically increase teleoperation performance. Here, we propose a Micro Aerial Vehicle (MAV)-based system that can autonomously provide third-person perspective to ground robots. While our approach is based on local visual servoing, it further leverages the global localization of several ground robots to seamlessly transfer between these ground robots in GPS-denied environments. Therewith one MAV can support multiple ground robots on a demand basis. Furthermore, our system enables different visual detection regimes, and enhanced operability, and return-home functionality. We evaluate our system in real-world SaR scenarios.
Visual localization and mapping is a crucial capability to address many challenges in mobile robotics. It constitutes a robust, accurate and cost-effective approach for local and global pose estimation within prior maps. Yet, in highly dynamic environments, like crowded city streets, problems arise as major parts of the image can be covered by dynamic objects. Consequently, visual odometry pipelines often diverge and the localization systems malfunction as detected features are not consistent with the precomputed 3D model. In this work, we present an approach to automatically detect dynamic object instances to improve the robustness of vision-based localization and mapping in crowded environments. By training a convolutional neural network model with a combination of synthetic and real-world data, dynamic object instance masks are learned in a semi-supervised way. The real-world data can be collected with a standard camera and requires minimal further post-processing. Our experiments show that a wide range of dynamic objects can be reliably detected using the presented method. Promising performance is demonstrated on our own and also publicly available datasets, which also shows the generalization capabilities of this approach.
Horticultural enterprises are becoming more sophisticated as the range of the crops they target expands. Requirements for enhanced efficiency and productivity have driven the demand for automating on-field operations. However, various problems remain yet to be solved for their reliable, safe deployment in real-world scenarios. This paper examines major research trends and current challenges in horticultural robotics. Specifically, our work focuses on sensing and perception in the three main horticultural procedures: pollination, yield estimation, and harvesting. For each task, we expose major issues arising from the unstructured, cluttered, and rugged nature of field environments, including variable lighting conditions and difficulties in fruit-specific detection, and highlight promising contemporary studies.
We propose a novel scoring concept for visual place recognition based on nearest neighbor descriptor voting and demonstrate how the algorithm naturally emerges from the problem formulation. Based on the observation that the number of votes for matching places can be evaluated using a binomial distribution model, loop closures can be detected with high precision. By casting the problem into a probabilistic framework, we not only remove the need for commonly employed heuristic parameters but also provide a powerful score to classify matching and non-matching places. We present methods for both a 2D-2D pose-graph vertex matching and a 2D-3D landmark matching based on the above scoring. The approach maintains accuracy while being efficient enough for online application through the use of compact (low dimensional) descriptors and fast nearest neighbor retrieval techniques. The proposed methods are evaluated on several challenging datasets in varied environments, showing state-of-the-art results with high precision and high recall.
This paper discusses a large-scale and long-term mapping and localization scenario using the maplab open-source framework. We present a brief overview of the specific algorithms in the system that enable building a consistent map from multiple sessions. We then demonstrate that such a map can be reused even a few months later for efficient 6-DoF localization and also new trajectories can be registered within the existing 3D model. The datasets presented in this paper are made publicly available.
As off-the-shelf (OTS) autopilots become more widely available and user-friendly and the drone market expands, safer, more efficient, and more complex motion planning and control will become necessary for fixed-wing aerial robotic platforms. Considering typical low-level attitude stabilization available on OTS flight controllers, this paper first develops an approach for modeling and identification of the control augmented dynamics for a small fixed-wing Unmanned Aerial Vehicle (UAV). A high-level Nonlinear Model Predictive Controller (NMPC) is subsequently formulated for simultaneous airspeed stabilization, path following, and soft constraint handling, using the identified model for horizon propagation. The approach is explored in several exemplary flight experiments including path following of helix and connected Dubins Aircraft segments in high winds as well as a motor failure scenario. The cost function, insights on its weighting, and additional soft constraints used throughout the experimentation are discussed.
When performing localization and mapping, working at the level of structure can be advantageous in terms of robustness to environmental changes and differences in illumination. This paper presents SegMap: a map representation solution to the localization and mapping problem based on the extraction of segments in 3D point clouds. In addition to facilitating the computationally intensive task of processing 3D point clouds, working at the level of segments addresses the data compression requirements of real-time single- and multi-robot systems. While current methods extract descriptors for the single task of localization, SegMap leverages a data-driven descriptor in order to extract meaningful features that can also be used for reconstructing a dense 3D map of the environment and for extracting semantic information. This is particularly interesting for navigation tasks and for providing visual feedback to end-users such as robot operators, for example in search and rescue scenarios. These capabilities are demonstrated in multiple urban driving and search and rescue experiments. Our method leads to an increase of area under the ROC curve of 28.3% over current state of the art using eigenvalue based features. We also obtain very similar reconstruction capabilities to a model specifically trained for this task. The SegMap implementation will be made available open-source along with easy to run demonstrations at www.github.com/ethz-asl/segmap. A video demonstration is available at https://youtu.be/CMk4w4eRobg.
In the absence of global positioning information, place recognition is a key capability for enabling localization, mapping and navigation in any environment. Most place recognition methods rely on images, point clouds, or a combination of both. In this work we leverage a segment extraction and matching approach to achieve place recognition in Light Detection and Ranging (LiDAR) based 3D point cloud maps. One challenge related to this approach is the recognition of segments despite changes in point of view or occlusion. We propose using a learning based method in order to reach a higher recall accuracy then previously proposed methods. Using Convolutional Neural Networks (CNNs), which are state-of-the-art classifiers, we propose a new approach to segment recognition based on learned descriptors. In this paper we compare the effectiveness of three different structures and training methods for CNNs. We demonstrate through several experiments on real-world data collected in an urban driving scenario that the proposed learning based methods outperform hand-crafted descriptors.