We present Point-TTA, a novel test-time adaptation framework for point cloud registration (PCR) that improves the generalization and the performance of registration models. While learning-based approaches have achieved impressive progress, generalization to unknown testing environments remains a major challenge due to the variations in 3D scans. Existing methods typically train a generic model and the same trained model is applied on each instance during testing. This could be sub-optimal since it is difficult for the same model to handle all the variations during testing. In this paper, we propose a test-time adaptation approach for PCR. Our model can adapt to unseen distributions at test-time without requiring any prior knowledge of the test data. Concretely, we design three self-supervised auxiliary tasks that are optimized jointly with the primary PCR task. Given a test instance, we adapt our model using these auxiliary tasks and the updated model is used to perform the inference. During training, our model is trained using a meta-auxiliary learning approach, such that the adapted model via auxiliary tasks improves the accuracy of the primary task. Experimental results demonstrate the effectiveness of our approach in improving generalization of point cloud registration and outperforming other state-of-the-art approaches.
With the increasing size of pre-trained language models (PLMs), fine-tuning all the parameters in the model is not efficient, especially when there are a large number of downstream tasks, which incur significant training and storage costs. Many parameter-efficient fine-tuning (PEFT) approaches have been proposed, among which, Low-Rank Adaptation (LoRA) is a representative approach that injects trainable rank decomposition matrices into every target module. Yet LoRA ignores the importance of parameters in different modules. To address this problem, many works have been proposed to prune the parameters of LoRA. However, under limited training conditions, the upper bound of the rank of the pruned parameter matrix is still affected by the preset values. We, therefore, propose IncreLoRA, an incremental parameter allocation method that adaptively adds trainable parameters during training based on the importance scores of each module. This approach is different from the pruning method as it is not limited by the initial number of training parameters, and each parameter matrix has a higher rank upper bound for the same training overhead. We conduct extensive experiments on GLUE to demonstrate the effectiveness of IncreLoRA. The results show that our method owns higher parameter efficiency, especially when under the low-resource settings where our method significantly outperforms the baselines. Our code is publicly available.
The dominant majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models with only their exteriors from multi-view and multi-articulation images. Given a set of RGB images that capture a target 3D object in different articulated poses, possibly from only few views, our method infers the interior planes that are observable in the input images. Our neural architecture is trained in a category-agnostic manner and it consists of a motion-aware multi-view analysis phase including pose, depth, and motion estimations, followed by interior plane detection in images and 3D space, and finally multi-view plane fusion. In addition, our method also predicts part articulations and is able to realize and even extrapolate the captured motions on the target 3D object. We evaluate our method by quantitative and qualitative comparisons to baselines and alternative solutions, as well as testing on untrained object categories and real image inputs to assess its generalization capabilities.
Failure to recognize samples from the classes unseen during training is a major limit of artificial intelligence (AI) in real-world implementation of retinal anomaly classification. To resolve this obstacle, we propose an uncertainty-inspired open-set (UIOS) model which was trained with fundus images of 9 common retinal conditions. Besides the probability of each category, UIOS also calculates an uncertainty score to express its confidence. Our UIOS model with thresholding strategy achieved an F1 score of 99.55%, 97.01% and 91.91% for the internal testing set, external testing set and non-typical testing set, respectively, compared to the F1 score of 92.20%, 80.69% and 64.74% by the standard AI model. Furthermore, UIOS correctly predicted high uncertainty scores, which prompted the need for a manual check, in the datasets of rare retinal diseases, low-quality fundus images, and non-fundus images. This work provides a robust method for real-world screening of retinal anomalies.
This paper tackles spectral reflectance recovery (SRR) from RGB images. Since capturing ground-truth spectral reflectance and camera spectral sensitivity are challenging and costly, most existing approaches are trained on synthetic images and utilize the same parameters for all unseen testing images, which are suboptimal especially when the trained models are tested on real images because they never exploit the internal information of the testing images. To address this issue, we adopt a self-supervised meta-auxiliary learning (MAXL) strategy that fine-tunes the well-trained network parameters with each testing image to combine external with internal information. To the best of our knowledge, this is the first work that successfully adapts the MAXL strategy to this problem. Instead of relying on naive end-to-end training, we also propose a novel architecture that integrates the physical relationship between the spectral reflectance and the corresponding RGB images into the network based on our mathematical analysis. Besides, since the spectral reflectance of a scene is independent to its illumination while the corresponding RGB images are not, we recover the spectral reflectance of a scene from its RGB images captured under multiple illuminations to further reduce the unknown. Qualitative and quantitative evaluations demonstrate the effectiveness of our proposed network and of the MAXL. Our code and data are available at https://github.com/Dong-Huo/SRR-MAXL.
Deep learning models have shown promising performance in the field of diabetic retinopathy (DR) staging. However, collaboratively training a DR staging model across multiple institutions remains a challenge due to non-iid data, client reliability, and confidence evaluation of the prediction. To address these issues, we propose a novel federated uncertainty-aware aggregation paradigm (FedUAA), which considers the reliability of each client and produces a confidence estimation for the DR staging. In our FedUAA, an aggregated encoder is shared by all clients for learning a global representation of fundus images, while a novel temperature-warmed uncertainty head (TWEU) is utilized for each client for local personalized staging criteria. Our TWEU employs an evidential deep layer to produce the uncertainty score with the DR staging results for client reliability evaluation. Furthermore, we developed a novel uncertainty-aware weighting module (UAW) to dynamically adjust the weights of model aggregation based on the uncertainty score distribution of each client. In our experiments, we collect five publicly available datasets from different institutions to conduct a dataset for federated DR staging to satisfy the real non-iid condition. The experimental results demonstrate that our FedUAA achieves better DR staging performance with higher reliability compared to other federated learning methods. Our proposed FedUAA paradigm effectively addresses the challenges of collaboratively training DR staging models across multiple institutions, and provides a robust and reliable solution for the deployment of DR diagnosis models in real-world clinical scenarios.
Vision Transformers (ViTs) are becoming a very popular paradigm for vision tasks as they achieve state-of-the-art performance on image classification. However, although early works implied that this network structure had increased robustness against adversarial attacks, some works argue ViTs are still vulnerable. This paper presents our first attempt toward detecting adversarial attacks during inference time using the network's input and outputs as well as latent features. We design four quantifications (or derivatives) of input, output, and latent vectors of ViT-based models that provide a signature of the inference, which could be beneficial for the attack detection, and empirically study their behavior over clean samples and adversarial samples. The results demonstrate that the quantifications from input (images) and output (posterior probabilities) are promising for distinguishing clean and adversarial samples, while latent vectors offer less discriminative power, though they give some insights on how adversarial perturbations work.
Federated learning (FL), as an effective decentralized distributed learning approach, enables multiple institutions to jointly train a model without sharing their local data. However, the domain feature shift caused by different acquisition devices/clients substantially degrades the performance of the FL model. Furthermore, most existing FL approaches aim to improve accuracy without considering reliability (e.g., confidence or uncertainty). The predictions are thus unreliable when deployed in safety-critical applications. Therefore, aiming at improving the performance of FL in non-Domain feature issues while enabling the model more reliable. In this paper, we propose a novel trusted federated disentangling network, termed TrFedDis, which utilizes feature disentangling to enable the ability to capture the global domain-invariant cross-client representation and preserve local client-specific feature learning. Meanwhile, to effectively integrate the decoupled features, an uncertainty-aware decision fusion is also introduced to guide the network for dynamically integrating the decoupled features at the evidence level, while producing a reliable prediction with an estimated uncertainty. To the best of our knowledge, our proposed TrFedDis is the first work to develop an FL approach based on evidential uncertainty combined with feature disentangling, which enhances the performance and reliability of FL in non-IID domain features. Extensive experimental results show that our proposed TrFedDis provides outstanding performance with a high degree of reliability as compared to other state-of-the-art FL approaches.
We present the first active learning tool for fine-grained 3D part labeling, a problem which challenges even the most advanced deep learning (DL) methods due to the significant structural variations among the small and intricate parts. For the same reason, the necessary data annotation effort is tremendous, motivating approaches to minimize human involvement. Our labeling tool iteratively verifies or modifies part labels predicted by a deep neural network, with human feedback continually improving the network prediction. To effectively reduce human efforts, we develop two novel features in our tool, hierarchical and symmetry-aware active labeling. Our human-in-the-loop approach, coined HAL3D, achieves 100% accuracy (barring human errors) on any test set with pre-defined hierarchical part labels, with 80% time-saving over manual effort.
Linear perspectivecues deriving from regularities of the built environment can be used to recalibrate both intrinsic and extrinsic camera parameters online, but these estimates can be unreliable due to irregularities in the scene, uncertainties in line segment estimation and background clutter. Here we address this challenge through four initiatives. First, we use the PanoContext panoramic image dataset [27] to curate a novel and realistic dataset of planar projections over a broad range of scenes, focal lengths and camera poses. Second, we use this novel dataset and the YorkUrbanDB [4] to systematically evaluate the linear perspective deviation measures frequently found in the literature and show that the choice of deviation measure and likelihood model has a huge impact on reliability. Third, we use these findings to create a novel system for online camera calibration we call fR, and show that it outperforms the prior state of the art, substantially reducing error in estimated camera rotation and focal length. Our fourth contribution is a novel and efficient approach to estimating uncertainty that can dramatically improve online reliability for performance-critical applications by strategically selecting which frames to use for recalibration.