Sebastian Scherer

AirLoc: Object-based Indoor Relocalization

Apr 03, 2023

Aryan, Bowen Li, Sebastian Scherer, Yun-Jou Lin, Chen Wang

Abstract:Indoor relocalization is vital for both robotic tasks like autonomous exploration and civil applications such as navigation with a cell phone in a shopping mall. Some previous approaches adopt geometrical information such as key-point features or local textures to carry out indoor relocalization, but they either easily fail in an environment with visually similar scenes or require many database images. Inspired by the fact that humans often remember places by recognizing unique landmarks, we resort to objects, which are more informative than geometric elements. In this work, we propose a simple yet effective object-based indoor relocalization approach, dubbed AirLoc. To overcome the critical challenges of object reidentification and remembering object relationships, we extract object-wise appearance embedding and inter-object geometric relationships. The geometry and appearance features are integrated to generate cumulative scene features. This results in a robust, accurate, and portable indoor relocalization system, which outperforms the state-of-the-art methods in room-level relocalization by 9.5% in PR-AUC and 7% in accuracy. In addition to exhaustive evaluation, we also carry out real-world tests, where AirLoc shows robustness in challenges like severe occlusion, perceptual aliasing, viewpoint shift, and deformation.

Learning Risk-Aware Costmaps via Inverse Reinforcement Learning for Off-Road Navigation

Jan 31, 2023

Samuel Triest, Mateo Guaman Castro, Parv Maheshwari, Matthew Sivaprakasam, Wenshan Wang, Sebastian Scherer

Abstract:Designing costmaps for off-road driving is often a challenging and engineering-intensive task. Recent work in costmap design for off-road driving focuses on training deep neural networks to predict costmaps from sensory observations using corpora of expert driving data. However, such approaches are generally subject to over-confident mispredictions and are rarely evaluated in-the-loop on physical hardware. We present an inverse reinforcement learning-based method of efficiently training deep cost functions that are uncertainty-aware. We do so by leveraging recent advances in highly parallel model-predictive control and robotic risk estimation. In addition to demonstrating improvement at reproducing expert trajectories, we also evaluate the efficacy of these methods in challenging off-road navigation scenarios. We observe that our method significantly outperforms a geometric baseline, resulting in 44% improvement in expert path reconstruction and 57% fewer interventions in practice. We also observe that varying the risk tolerance of the vehicle results in qualitatively different navigation behaviors, especially with respect to higher-risk scenarios such as slopes and tall grass.

UAS Simulator for Modeling, Analysis and Control in Free Flight and Physical Interaction

Dec 06, 2022

Azarakhsh Keipour, Mohammadreza Mousaei, Dongwei Bai, Junyi Geng, Sebastian Scherer

Abstract:This paper presents the ARCAD simulator for the rapid development of Unmanned Aerial Systems (UAS), including underactuated and fully-actuated multirotors, fixed-wing aircraft, and Vertical Take-Off and Landing (VTOL) hybrid vehicles. The simulator is designed to accelerate these aircraft's modeling and control design. It provides various analyses of the design and operation, such as wrench-set computation, controller response, and flight optimization. In addition to simulating free flight, it can simulate the physical interaction of the aircraft with its environment. The simulator is written in MATLAB to allow rapid prototyping and is capable of generating graphical visualization of the aircraft and the environment in addition to generating the desired plots. It has been used to develop several real-world multirotor and VTOL applications. The source code is available at https://github.com/keipour/aircraft-simulator-matlab.

* Accepted to 2023 AIAA SciTech Forum. American Institute of Aeronautics and Astronautics

Attention-Enhanced Cross-modal Localization Between 360 Images and Point Clouds

Dec 06, 2022

Zhipeng Zhao, Huai Yu, Chenwei Lyv, Wen Yang, Sebastian Scherer

Abstract:Visual localization plays an important role for intelligent robots and autonomous driving, especially when the accuracy of GNSS is unreliable. Recently, camera localization in LiDAR maps has attracted more and more attention for its low cost and potential robustness to illumination and weather changes. However, the commonly used pinhole camera has a narrow Field-of-View, thus leading to limited information compared with the omni-directional LiDAR data. To overcome this limitation, we focus on correlating the information of 360 equirectangular images to point clouds, proposing an end-to-end learnable network to conduct cross-modal visual localization by establishing similarity in high-dimensional feature space. Inspired by the attention mechanism, we optimize the network to capture the salient feature for comparing images and point clouds. We construct several sequences containing 360 equirectangular images and corresponding point clouds based on the KITTI-360 dataset and conduct extensive experiments. The results demonstrate the effectiveness of our approach.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

Nov 21, 2022

Bowen Li, Ziyuan Huang, Junjie Ye, Yiming Li, Sebastian Scherer, Hang Zhao, Changhong Fu

Abstract:Visual object tracking is an essential capability of intelligent robots. Most existing approaches have ignored the online latency that can cause severe performance degradation during real-world processing. Especially for unmanned aerial vehicles, where robust tracking is more challenging and onboard computation is limited, the latency issue could be fatal. In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). PVT++ is capable of turning most leading-edge trackers into predictive trackers by appending an online predictor. Unlike existing solutions that use model-based approaches, our framework is learnable, such that it can take as input not only motion information but also visual cues, or a combination of both. Moreover, since PVT++ is end-to-end optimizable, it can further boost the latency-aware tracking performance by joint training. Additionally, this work presents an extended latency-aware evaluation benchmark for assessing an any-speed tracker in the online setting. Empirical results on a robotic platform from an aerial perspective show that PVT++ can achieve up to 60% performance gain on various trackers and exhibits better robustness than prior model-based solutions, largely mitigating the degradation brought by latency. Code and models will be made public.

* 18 pages, 10 figures

Challenges in Close-Proximity Safe and Seamless Operation of Manned and Unmanned Aircraft in Shared Airspace

Nov 13, 2022

Jay Patrikar, Joao P. A. Dantas, Sourish Ghosh, Parv Kapoor, Ian Higgins, Jasmine J. Aloor, Ingrid Navarro, Jimin Sun, Ben Stoler, Milad Hamidi (+4 more)

Abstract:We propose developing an integrated system to keep autonomous unmanned aircraft safely separated and behave as expected in conjunction with manned traffic. The main goal is to achieve safe manned-unmanned vehicle teaming to improve system performance, have each (robot/human) teammate learn from each other in various aircraft operations, and reduce the manning needs of manned aircraft. The proposed system anticipates and reacts to other aircraft using natural language instructions and can serve as a co-pilot or operate entirely autonomously. We point out the main technical challenges where improvements on current state-of-the-art are needed to enable Visual Flight Rules to fully autonomous aerial operations, bringing insights to these critical areas. Furthermore, we present an interactive demonstration in a prototypical scenario with one AI pilot and one human pilot sharing the same terminal airspace, interacting with each other using language, and landing safely on the same runway. We also show a demonstration of a vision-only aircraft detection system.

Pseudo-Label Noise Suppression Techniques for Semi-Supervised Semantic Segmentation

Oct 19, 2022

Sebastian Scherer, Robin Schön, Rainer Lienhart

Abstract:Semi-supervised learning (SSL) can reduce the need for large labelled datasets by incorporating unlabelled data into the training. This is particularly interesting for semantic segmentation, where labelling data is very costly and time-consuming. Current SSL approaches use an initially supervised trained model to generate predictions for unlabelled images, called pseudo-labels, which are subsequently used for training a new model from scratch. Since the predictions usually do not come from an error-free neural network, they are naturally full of errors. However, training with partially incorrect labels often reduces the final model performance. Thus, it is crucial to manage errors/noise of pseudo-labels wisely. In this work, we use three mechanisms to control pseudo-label noise and errors: (1) We construct a solid base framework by mixing images with cow-patterns on unlabelled images to reduce the negative impact of wrong pseudo-labels. Nevertheless, wrong pseudo-labels still have a negative impact on the performance. Therefore, (2) we propose a simple and effective loss weighting scheme for pseudo-labels defined by the feedback of the model trained on these pseudo-labels. This allows us to soft-weight the pseudo-label training examples based on their determined confidence score during training. (3) We also study the common practice of ignoring pseudo-labels with low confidence and empirically analyse the influence and effect of pseudo-labels with different confidence ranges on SSL and the contribution of pseudo-label filtering to the achievable performance gains. We show that our method outperforms state-of-the-art alternatives on various datasets. Furthermore, we show that our findings also transfer to other tasks such as human pose estimation. Our code is available at https://github.com/ChristmasFan/SSL_Denoising_Segmentation.

* Accepted to BMVC 2022

TartanCalib: Iterative Wide-Angle Lens Calibration using Adaptive SubPixel Refinement of AprilTags

Oct 05, 2022

Bardienus P Duisterhof, Yaoyu Hu, Si Heng Teng, Michael Kaess, Sebastian Scherer

Abstract:Wide-angle cameras are uniquely positioned for mobile robots, by virtue of the rich information they provide in a small, light, and cost-effective form factor. An accurate calibration of the intrinsics and extrinsics is a critical prerequisite for using the edge of a wide-angle lens for depth perception and odometry. Calibrating wide-angle lenses with current state-of-the-art techniques yields poor results due to extreme distortion at the edge, as most algorithms assume a lens with low to medium distortion closer to a pinhole projection. In this work we present our methodology for accurate wide-angle calibration. Our pipeline generates an intermediate model, and leverages it to iteratively improve feature detection and eventually the camera parameters. We test three key methods to utilize intermediate camera models: (1) undistorting the image into virtual pinhole cameras, (2) reprojecting the target into the image frame, and (3) adaptive subpixel refinement. Combining adaptive subpixel refinement and feature reprojection reduces reprojection errors by up to 26.59%, helps us detect up to 42.01% more features, and improves performance in the downstream task of dense depth mapping. Finally, TartanCalib is open-source and implemented in an easy-to-use calibration toolbox. We also provide a translation layer with other state-of-the-art works, which allows for regressing generic models with thousands of parameters or using a more robust solver. To this end, TartanCalib is the tool of choice for wide-angle calibration. Project website and code: http://tartancalib.com.

360FusionNeRF: Panoramic Neural Radiance Fields with Joint Guidance

Oct 03, 2022

Shreyas Kulkarni, Peng Yin, Sebastian Scherer

Abstract:We present a method to synthesize novel views from a single $360^\circ$ panorama image based on the neural radiance field (NeRF). Prior studies in a similar setting rely on the neighborhood interpolation capability of multi-layer perceptrons to complete missing regions caused by occlusion, which leads to artifacts in their predictions. We propose 360FusionNeRF, a semi-supervised learning framework where we introduce geometric supervision and semantic consistency to guide the progressive training process. Firstly, the input image is re-projected to $360^\circ$ images, and auxiliary depth maps are extracted at other camera positions. The depth supervision, in addition to the NeRF color guidance, improves the geometry of the synthesized views. Additionally, we introduce a semantic consistency loss that encourages realistic renderings of novel views. We extract these semantic features using a pre-trained visual encoder such as CLIP, a Vision Transformer trained on hundreds of millions of diverse 2D photographs mined from the web with natural language supervision. Experiments indicate that our proposed method can produce plausible completions of unobserved regions while preserving the features of the scene. When trained across various scenes, 360FusionNeRF consistently achieves state-of-the-art performance when transferring to the synthetic Structured3D dataset (PSNR~5%, SSIM~3%, LPIPS~13%), the real-world Matterport3D dataset (PSNR~3%, SSIM~3%, LPIPS~9%), and the Replica360 dataset (PSNR~8%, SSIM~2%, LPIPS~18%).

* 8 pages, Fig 3, Submitted to IEEE RAL. arXiv admin note: text overlap with arXiv:2106.10859, arXiv:2104.00677, arXiv:2203.09957, arXiv:2204.00928 by other authors

PyPose: A Library for Robot Learning with Physics-based Optimization

Sep 30, 2022

Chen Wang, Dasong Gao, Kuan Xu, Junyi Geng, Yaoyu Hu, Yuheng Qiu, Bowen Li, Fan Yang, Brady Moon, Abhinav Pandey (+23 more)

Abstract:Deep learning has had remarkable success in robotic perception, but its data-centric nature suffers when it comes to generalizing to ever-changing environments. By contrast, physics-based optimization generalizes better, but it does not perform as well in complicated tasks due to the lack of high-level semantic information and the reliance on manual parametric tuning. To take advantage of these two complementary worlds, we present PyPose: a robotics-oriented, PyTorch-based library that combines deep perceptual models with physics-based optimization techniques. Our design goal for PyPose is to make it user-friendly, efficient, and interpretable with a tidy and well-organized architecture. Using an imperative style interface, it can be easily integrated into real-world robotic applications. In addition, it supports parallel computing of gradients of any order for Lie groups and Lie algebras, as well as $2^{\text{nd}}$-order optimizers such as trust region methods. Experiments show that PyPose achieves 3-20$\times$ speedup in computation compared to state-of-the-art libraries. To boost future research, we provide concrete examples across several fields of robotics, including SLAM, inertial navigation, planning, and control.
