Purdue University
Abstract:3D Gaussian Splatting (3DGS) integrates the strengths of primitive-based representations and volumetric rendering techniques, enabling real-time, high-quality rendering. However, 3DGS models typically overfit to single-scene training and are highly sensitive to the initialization of Gaussian ellipsoids, heuristically derived from Structure from Motion (SfM) point clouds, which limits both generalization and practicality. To address these limitations, we propose GS-Net, a generalizable, plug-and-play 3DGS module that densifies Gaussian ellipsoids from sparse SfM point clouds, enhancing geometric structure representation. To the best of our knowledge, GS-Net is the first plug-and-play 3DGS module with cross-scene generalization capabilities. Additionally, we introduce the CARLA-NVS dataset, which incorporates additional camera viewpoints to thoroughly evaluate reconstruction and rendering quality. Extensive experiments demonstrate that applying GS-Net to 3DGS yields a PSNR improvement of 2.08 dB for conventional viewpoints and 1.86 dB for novel viewpoints, confirming the method's effectiveness and robustness.
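The PSNR gains quoted in the GS-Net abstract can be put in context with the standard definition, PSNR = 10 log10(MAX^2 / MSE). The sketch below is a generic illustration, not the authors' evaluation code; the function name `psnr` and the `max_value` parameter are assumptions made for this example.

```python
import numpy as np

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    max_value is the largest possible pixel value (1.0 for float images,
    255 for 8-bit images); it is a parameter of this sketch.
    """
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy example: a constant error of 0.1 on a [0, 1] image gives MSE = 0.01.
reference = np.zeros((4, 4))
rendered = np.full((4, 4), 0.1)
print(round(psnr(reference, rendered), 2))
```

Under this definition, and for a single image with a fixed peak value, a gain of 2.08 dB corresponds to multiplying the mean squared error by 10^(-0.208), roughly 0.62.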
Abstract:As quadrotors take on an increasingly diverse range of roles, researchers often need to develop new hardware platforms tailored for specific tasks, introducing significant engineering overhead. In this article, we introduce the UniQuad series, a unified and versatile quadrotor platform series that offers high flexibility to adapt to a wide range of common tasks, excellent customizability for advanced demands, and easy maintenance in case of crashes. This project is fully open-source at https://hkust-aerial-robotics.github.io/UniQuad.
Abstract:This paper introduces FALCON, a novel Fast Autonomous expLoration framework using COverage path guidaNce, which aims at setting a new performance benchmark in the field of autonomous aerial exploration. Despite recent advancements in the domain, existing exploration planners often suffer from inefficiencies such as frequent revisitations of previously explored regions. FALCON effectively harnesses the full potential of online generated coverage paths in enhancing exploration efficiency. The framework begins with an incremental connectivity-aware space decomposition and connectivity graph construction, which facilitate efficient coverage path planning. Subsequently, a hierarchical planner generates a coverage path spanning the entire unexplored space, serving as global guidance. Then, a local planner optimizes the frontier visitation order, minimizing traversal time while consciously incorporating the intention of the global guidance. Finally, minimum-time smooth and safe trajectories are produced to visit the frontier viewpoints. For fair and comprehensive benchmark experiments, we introduce a lightweight exploration planner evaluation environment that allows for comparing exploration planners across a variety of testing scenarios using an identical quadrotor simulator. Additionally, a set of VECO criteria is proposed for an in-depth analysis of FALCON's significant performance in comparison with state-of-the-art exploration planners. Extensive ablation studies demonstrate the effectiveness of each component in the proposed framework. Real-world experiments conducted fully onboard further validate FALCON's practical capability in complex and challenging environments. The source code of both the exploration planner FALCON and the exploration planner evaluation environment will be released to benefit the community.
Abstract:Satellite-based street-view information extraction by cross-view matching refers to a task that extracts the location and orientation information of a given street-view image query by using one or multiple geo-referenced satellite images. Recent work has initiated a new research direction to find accurate information within a local area covered by one satellite image centered at a location prior (e.g., from GPS). It can be used as a standalone solution or as a complementary step following a large-scale search with multiple satellite candidates. However, these existing works require an accurate initial orientation (angle) prior (e.g., from IMU) and/or do not efficiently search through all possible poses. To allow efficient search and to give accurate prediction regardless of the existence or the accuracy of the angle prior, we present PetalView extractors with multi-scale search. The PetalView extractors give semantically meaningful features that are equivalent across two drastically different views, and the multi-scale search strategy efficiently inspects the satellite image from coarse to fine granularity to provide sub-meter and sub-degree precision extraction. Moreover, when an angle prior is given, we propose a learnable prior angle mixer to utilize this information. Our method obtains the best performance on the VIGOR dataset and improves performance on the KITTI dataset test 1 set, raising recall within 1 meter (r@1m) for location estimation to 68.88% and recall within 1 degree (r@1d) to 21.10% when no angle prior is available; with an angle prior, it achieves stable estimates with r@1m and r@1d above 70% and 21%, respectively, up to a 40-degree noise level.
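The r@1m and r@1d figures in the PetalView abstract are threshold recalls: the fraction of queries whose location or orientation error falls within a given bound. The sketch below shows one plausible way to compute them. The helper names, the wrap-around handling of angles, and the use of "less than or equal" at the threshold are assumptions of this example rather than details confirmed by the abstract.

```python
import numpy as np

def angular_error_deg(pred_deg, true_deg):
    """Absolute angular difference in degrees, wrapped into [0, 180]."""
    diff = (np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float) + 180.0) % 360.0 - 180.0
    return np.abs(diff)

def recall_within(errors, threshold):
    """Fraction of queries whose error is at most `threshold`."""
    return float(np.mean(np.asarray(errors, dtype=float) <= threshold))

# Toy data (invented for illustration): four queries.
location_errors_m = [0.2, 0.9, 1.5, 3.0]
pred_angles = [359.5, 10.0, 90.0, 181.0]
true_angles = [0.2, 10.4, 95.0, 180.5]

r_at_1m = recall_within(location_errors_m, 1.0)
r_at_1d = recall_within(angular_error_deg(pred_angles, true_angles), 1.0)
print(r_at_1m, r_at_1d)
```

Wrapping the difference before taking the absolute value matters near the 0/360 boundary: a prediction of 359.5 degrees against a ground truth of 0.2 degrees is an error of about 0.7 degrees, not 359.3.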
Abstract:Various perception-aware planning approaches have attempted to enhance the state estimation accuracy during maneuvers, while the feature matchability among frames, a crucial factor influencing estimation accuracy, has often been overlooked. In this paper, we present APACE, an Agile and Perception-Aware trajeCtory gEneration framework for aggressive quadrotor flight that takes into account feature matchability during trajectory planning. We seek to generate a perception-aware trajectory that reduces the error of the visual-based estimator while satisfying the constraints on smoothness, safety, agility and the quadrotor dynamics. The perception objective is achieved by maximizing the number of covisible features while ensuring small enough parallax angles. Additionally, we propose a differentiable and accurate visibility model that allows decomposition of the trajectory planning problem for efficient optimization resolution. Through validations conducted in both a photorealistic simulator and real-world experiments, we demonstrate that the trajectories generated by our method significantly improve state estimation accuracy, with root mean square error (RMSE) reduced by up to an order of magnitude. The source code will be released to benefit the community.
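The parallax constraint mentioned in the APACE abstract can be illustrated with the usual geometric definition: the angle between the two rays that connect a feature to two camera positions. The sketch below uses that generic definition only. It is not the paper's visibility model, and the function names and the `max_parallax_deg` threshold are hypothetical.

```python
import numpy as np

def parallax_angle_deg(cam_a, cam_b, feature):
    """Angle in degrees between the rays from two camera positions to one feature."""
    ray_a = np.asarray(feature, dtype=float) - np.asarray(cam_a, dtype=float)
    ray_b = np.asarray(feature, dtype=float) - np.asarray(cam_b, dtype=float)
    cos_angle = ray_a @ ray_b / (np.linalg.norm(ray_a) * np.linalg.norm(ray_b))
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def count_small_parallax(cam_a, cam_b, features, max_parallax_deg):
    """Count features whose parallax between the two positions stays below a bound."""
    return sum(parallax_angle_deg(cam_a, cam_b, f) <= max_parallax_deg for f in features)

# Toy example: the camera moves 1 m along x; one feature is near, one is far.
cam_a, cam_b = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
features = [[0.0, 1.0, 0.0], [0.0, 20.0, 0.0]]
print([round(parallax_angle_deg(cam_a, cam_b, f), 2) for f in features])
print(count_small_parallax(cam_a, cam_b, features, max_parallax_deg=10.0))
```

In the toy example the nearby feature sees a much larger parallax than the distant one for the same camera motion, which is why fast motion close to features tends to make frame-to-frame matching harder.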
Abstract:Recent technological advances have led to contemporary applications that demand real-time processing and analysis of sequentially arriving tensor data. Traditional offline learning, involving the storage and utilization of all data in each computational iteration, becomes impractical for high-dimensional tensor data due to its voluminous size. Furthermore, existing low-rank tensor methods lack the capability for statistical inference in an online fashion, which is essential for real-time predictions and informed decision-making. This paper addresses these challenges by introducing a novel online inference framework for low-rank tensor learning. Our approach employs Stochastic Gradient Descent (SGD) to enable efficient real-time data processing without extensive memory requirements, thereby significantly reducing computational demands. We establish a non-asymptotic convergence result for the online low-rank SGD estimator that nearly matches the minimax optimal rate of estimation error in offline models that store all historical data. Building upon this foundation, we propose a simple yet powerful online debiasing approach for sequential statistical inference in low-rank tensor learning. The entire online procedure, covering both estimation and inference, eliminates the need for data splitting or storing historical data, making it suitable for on-the-fly hypothesis testing. Given the sequential nature of our data collection, traditional analyses relying on offline methods and sample splitting are inadequate. In our analysis, we control the sum of constructed super-martingales to ensure estimates along the entire solution path remain within the benign region. Additionally, a novel spectral representation tool is employed to address statistical dependencies among iterative estimates, establishing the desired asymptotic normality.
Abstract:Recently, reinforcement learning has gained prominence in modern statistics, with policy evaluation being a key component. Unlike traditional machine learning literature on this topic, our work places emphasis on statistical inference for the parameter estimates computed using reinforcement learning algorithms. While most existing analyses assume that random rewards follow standard distributions, limiting their applicability, we embrace the concept of robust statistics in reinforcement learning by simultaneously addressing issues of outlier contamination and heavy-tailed rewards within a unified framework. In this paper, we develop an online robust policy evaluation procedure, and establish the limiting distribution of our estimator, based on its Bahadur representation. Furthermore, we develop a fully-online procedure to efficiently conduct statistical inference based on the asymptotic distribution. This paper bridges the gap between robust statistics and statistical inference in reinforcement learning, offering a more versatile and reliable approach to policy evaluation. Finally, we validate the efficacy of our algorithm through numerical experiments in real-world reinforcement learning settings.
Abstract:Early detection of dysplasia of the cervix is critical for cervical cancer treatment. However, automatic cervical dysplasia diagnosis via visual inspection, which is more appropriate in low-resource settings, remains a challenging problem. Though promising results have been obtained by recent deep learning models, their performance is significantly hindered by the limited scale of the available cervix datasets. Distinct from previous methods that learn from a single dataset, we propose to leverage cross-domain cervical images that were collected in different but related clinical studies to improve the model's performance on the targeted cervix dataset. To robustly learn the transferable information across datasets, we propose a novel prototype-based knowledge filtering method to estimate the transferability of cross-domain samples. We further optimize the shared feature space by aligning the cross-domain image representations simultaneously at the domain level with early alignment and at the class level with supervised contrastive learning, which makes model training and knowledge transfer more robust. The empirical results on three real-world benchmark cervical image datasets show that our proposed method outperforms the state-of-the-art cervical dysplasia visual inspection by an absolute improvement of 4.7% in top-1 accuracy, 7.0% in precision, 1.4% in recall, 4.6% in F1 score, and 0.05 in ROC-AUC.
Abstract:Street-view imagery provides us with novel experiences to explore different places remotely. Carefully calibrated street-view images (e.g., Google Street View) can be used for different downstream tasks, e.g., navigation and map feature extraction. As personal high-quality cameras have become much more affordable and portable, an enormous amount of crowdsourced street-view images are uploaded to the internet, but commonly with missing or noisy sensor information. To prepare this hidden treasure for "ready-to-use" status, determining missing location information and camera orientation angles are two equally important tasks. Recent methods have achieved high performance on geo-localization of street-view images by cross-view matching with a pool of geo-referenced satellite imagery. However, most of the existing works focus more on geo-localization than on estimating the image orientation. In this work, we re-state the importance of finding fine-grained orientation for street-view images, formally define the problem and provide a set of evaluation metrics to assess the quality of the orientation estimation. We propose two methods to improve the granularity of the orientation estimation, achieving 82.4% and 72.3% accuracy for images with estimated angle errors below 2 degrees for the CVUSA and CVACT datasets, corresponding to 34.9% and 28.2% absolute improvement compared to previous works. Integrating fine-grained orientation estimation in training also improves the performance on geo-localization, giving top-1 recall of 95.5%/85.5% and 86.8%/80.4% for orientation known/unknown tests on the two datasets.
Abstract:Stochastic gradient descent with momentum (SGDM) has been widely used in many machine learning and statistical applications. Despite the observed empirical benefits of SGDM over traditional SGD, the theoretical understanding of the role of momentum for different learning rates in the optimization process remains largely open. We analyze the finite-sample convergence rate of SGDM in the strongly convex setting and show that, with a large batch size, the mini-batch SGDM converges faster than mini-batch SGD to a neighborhood of the optimal value. Furthermore, we analyze the Polyak-averaging version of the SGDM estimator, establish its asymptotic normality, and justify its asymptotic equivalence to the averaged SGD.
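The two estimators discussed in the SGDM abstract, the last iterate and its Polyak average, can be sketched on a toy strongly convex problem. The example below runs mini-batch heavy-ball SGDM on least squares and keeps a running mean of the iterates. The heavy-ball update form, the function name `sgdm_polyak`, and all step-size, momentum and batch-size values are assumptions chosen for illustration, not the configuration analyzed in the paper.

```python
import numpy as np

def sgdm_polyak(X, y, lr=0.05, momentum=0.9, batch_size=32, n_steps=2000, seed=0):
    """Mini-batch heavy-ball SGDM on least squares.

    Returns the last iterate and the Polyak (running) average of all iterates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    velocity = np.zeros(d)
    theta_bar = np.zeros(d)
    for t in range(1, n_steps + 1):
        idx = rng.integers(0, n, size=batch_size)         # sample a mini-batch
        residual = X[idx] @ theta - y[idx]
        grad = X[idx].T @ residual / batch_size           # gradient of 0.5 * mean squared residual
        velocity = momentum * velocity - lr * grad        # heavy-ball momentum update
        theta = theta + velocity
        theta_bar += (theta - theta_bar) / t              # running mean of theta_1..theta_t
    return theta, theta_bar

# Synthetic data (invented for illustration): linear model with Gaussian noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
theta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ theta_true + 0.1 * rng.normal(size=2000)

theta_last, theta_bar = sgdm_polyak(X, y)
print(np.linalg.norm(theta_last - theta_true), np.linalg.norm(theta_bar - theta_true))
```

The running mean needs no stored history, which is what makes the averaged estimator convenient in streaming settings. Setting `momentum=0.0` in the same function gives plain mini-batch SGD with averaging for comparison.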