Abstract:Conformal prediction (CP) provides a framework for constructing prediction sets with guaranteed coverage, assuming exchangeable data. However, real-world scenarios often involve distribution shifts that violate exchangeability, leading to unreliable coverage and inflated prediction sets. To address this challenge, we first introduce Reconstruction Loss-Scaled Conformal Prediction (RLSCP), which utilizes reconstruction losses derived from a Variational Autoencoder (VAE) as an uncertainty metric to scale score functions. While RLSCP demonstrates performance improvements, mainly resulting in better coverage, it quantifies quantiles based on a fixed calibration dataset without considering the discrepancies between test and train datasets in an unexchangeable setting. In the next step, we propose Weighted Quantile Loss-scaled Conformal Prediction (WQLCP), which refines RLSCP by incorporating a weighted notion of exchangeability, adjusting the calibration quantile threshold based on weights with respect to the ratio of calibration and test loss values. This approach improves the CP-generated prediction set outputs in the presence of distribution shifts. Experiments on large-scale datasets, including ImageNet variants, demonstrate that WQLCP outperforms existing baselines by consistently maintaining coverage while reducing prediction set sizes, providing a robust solution for CP under distribution shifts.
Abstract:High-fidelity 3D reconstruction is critical for aerial inspection tasks such as infrastructure monitoring, structural assessment, and environmental surveying. While traditional photogrammetry techniques enable geometric modeling, they lack semantic interpretability, limiting their effectiveness for automated inspection workflows. Recent advances in neural rendering and 3D Gaussian Splatting (3DGS) offer efficient, photorealistic reconstructions but similarly lack scene-level understanding. In this work, we present a UAV-based pipeline that extends Feature-3DGS for language-guided 3D segmentation. We leverage LSeg-based feature fields with CLIP embeddings to generate heatmaps in response to language prompts. These are thresholded to produce rough segmentations, and the highest-scoring point is then used as a prompt to SAM or SAM2 for refined 2D segmentation on novel view renderings. Our results highlight the strengths and limitations of various feature field backbones (CLIP-LSeg, SAM, SAM2) in capturing meaningful structure in large-scale outdoor environments. We demonstrate that this hybrid approach enables flexible, language-driven interaction with photorealistic 3D reconstructions, opening new possibilities for semantic aerial inspection and scene understanding.
Abstract:Exploring the trustworthiness of deep learning models is crucial, especially in critical domains such as medical imaging decision support systems. Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. However, conformal prediction results face challenges due to the backbone model's struggles in domain-shifted scenarios, such as variations in different sources. To aim this challenge, this paper proposes a novel framework termed Conformal Ensemble of Vision Transformers (CE-ViTs) designed to enhance image classification performance by prioritizing domain adaptation and model robustness, while accounting for uncertainty. The proposed method leverages an ensemble of vision transformer models in the backbone, trained on diverse datasets including HAM10000, Dermofit, and Skin Cancer ISIC datasets. This ensemble learning approach, calibrated through the combined mentioned datasets, aims to enhance domain adaptation through conformal learning. Experimental results underscore that the framework achieves a high coverage rate of 90.38\%, representing an improvement of 9.95\% compared to the HAM10000 model. This indicates a strong likelihood that the prediction set includes the true label compared to singular models. Ensemble learning in CE-ViTs significantly improves conformal prediction performance, increasing the average prediction set size for challenging misclassified samples from 1.86 to 3.075.
Abstract:Ensuring the reliable operation of power transformers is critical to grid stability. Dissolved Gas Analysis (DGA) is widely used for fault diagnosis, but traditional methods rely on heuristic rules, which may lead to inconsistent results. Machine learning (ML)-based approaches have improved diagnostic accuracy; however, power transformers operate under varying conditions, and differences in transformer type, environmental factors, and operational settings create distribution shifts in diagnostic data. Consequently, direct model transfer between transformers often fails, making techniques for domain adaptation a necessity. To tackle this issue, this work proposes a feature-weighted domain adaptation technique that combines Maximum Mean Discrepancy (MMD) and Correlation Alignment (CORAL) with feature-specific weighting (MCW). Kolmogorov-Smirnov (K-S) statistics are used to assign adaptable weights, prioritizing features with larger distributional discrepancies and thereby improving source and target domain alignment. Experimental evaluations on datasets for power transformers demonstrate the effectiveness of the proposed method, which achieves a 7.9% improvement over Fine-Tuning and a 2.2% improvement over MMD-CORAL (MC). Furthermore, it outperforms both techniques across various training sample sizes, confirming its robustness for domain adaptation.
Abstract:Precise estimation of the Remaining Useful Life (RUL) of rolling bearings is an important consideration to avoid unexpected failures, reduce downtime, and promote safety and efficiency in industrial systems. Complications in degradation trends, noise presence, and the necessity to detect faults in advance make estimation of RUL a challenging task. This paper introduces a novel framework that combines wavelet-based denoising method, Wavelet Packet Decomposition (WPD), and a customized multi-channel Swin Transformer model (MCSFormer) to address these problems. With attention mechanisms incorporated for feature fusion, the model is designed to learn global and local degradation patterns utilizing hierarchical representations for enhancing predictive performance. Additionally, a customized loss function is developed as a key distinction of this work to differentiate between early and late predictions, prioritizing accurate early detection and minimizing the high operation risks of late predictions. The proposed model was evaluated with the PRONOSTIA dataset using three experiments. Intra-condition experiments demonstrated that MCSFormer outperformed state-of-the-art models, including the Adaptive Transformer, MDAN, and CNN-SRU, achieving 41%, 64%, and 69% lower MAE on average across different operating conditions, respectively. In terms of cross-condition testing, it achieved superior generalization under varying operating conditions compared to the adapted ViT and Swin Transformer. Lastly, the custom loss function effectively reduced late predictions, as evidenced in a 6.3% improvement in the scoring metric while maintaining competitive overall performance. The model's robust noise resistance, generalization capability, and focus on safety make MCSFormer a trustworthy and effective predictive maintenance tool in industrial applications.
Abstract:We present a novel 5-step framework called Fine-Tuned Offline Reinforcement Learning Augmented Process Sequence Optimization (FORLAPS), which aims to identify optimal execution paths in business processes using reinforcement learning. We implemented this approach on real-life event logs from our case study an energy regulator in Canada and other real-life event logs, demonstrating the feasibility of the proposed method. Additionally, to compare FORLAPS with the existing models (Permutation Feature Importance and multi-task LSTM-Based model), we experimented to evaluate its effectiveness in terms of resource savings and process time span reduction. The experimental results on real-life event log validate that FORLAPS achieves 31% savings in resource time spent and a 23% reduction in process time span. Using this innovative data augmentation technique, we propose a fine-tuned reinforcement learning approach that aims to automatically fine-tune the model by selectively increasing the average estimated Q-value in the sampled batches. The results show that we obtained a 44% performance improvement compared to the pre-trained model. This study introduces an innovative evaluation model, benchmarking its performance against earlier works using nine publicly available datasets. Robustness is ensured through experiments utilizing the Damerau-Levenshtein distance as the primary metric. In addition, we discussed the suitability of datasets, taking into account their inherent properties, to evaluate the performance of different models. The proposed model, FORLAPS, demonstrated exceptional performance, outperforming existing state-of-the-art approaches in suggesting the most optimal policies or predicting the best next activities within a process trace.
Abstract:Class imbalance and label noise are pervasive in large-scale datasets, yet much of machine learning research assumes well-labeled, balanced data, which rarely reflects real world conditions. Existing approaches typically address either label noise or class imbalance in isolation, leading to suboptimal results when both issues coexist. In this work, we propose Conformal-in-the-Loop (CitL), a novel training framework that addresses both challenges with a conformal prediction-based approach. CitL evaluates sample uncertainty to adjust weights and prune unreliable examples, enhancing model resilience and accuracy with minimal computational cost. Our extensive experiments include a detailed analysis showing how CitL effectively emphasizes impactful data in noisy, imbalanced datasets. Our results show that CitL consistently boosts model performance, achieving up to a 6.1% increase in classification accuracy and a 5.0 mIoU improvement in segmentation. Our code is publicly available: CitL.
Abstract:Simplicity bias poses a significant challenge in neural networks, often leading models to favor simpler solutions and inadvertently learn decision rules influenced by spurious correlations. This results in biased models with diminished generalizability. While many current approaches depend on human supervision, obtaining annotations for various bias attributes is often impractical. To address this, we introduce Debiasify, a novel self-distillation approach that requires no prior knowledge about the nature of biases. Our method leverages a new distillation loss to transfer knowledge within the network, from deeper layers containing complex, highly-predictive features to shallower layers with simpler, attribute-conditioned features in an unsupervised manner. This enables Debiasify to learn robust, debiased representations that generalize effectively across diverse biases and datasets, improving both worst-group performance and overall accuracy. Extensive experiments on computer vision and medical imaging benchmarks demonstrate the effectiveness of our approach, significantly outperforming previous unsupervised debiasing methods (e.g., a 10.13% improvement in worst-group accuracy for Wavy Hair classification in CelebA) and achieving comparable or superior performance to supervised approaches. Our code is publicly available at the following link: Debiasify.
Abstract:Safe Reinforcement Learning (Safe RL) is one of the prevalently studied subcategories of trial-and-error-based methods with the intention to be deployed on real-world systems. In safe RL, the goal is to maximize reward performance while minimizing constraints, often achieved by setting bounds on constraint functions and utilizing the Lagrangian method. However, deploying Lagrangian-based safe RL in real-world scenarios is challenging due to the necessity of threshold fine-tuning, as imprecise adjustments may lead to suboptimal policy convergence. To mitigate this challenge, we propose a unified Lagrangian-based model-free architecture called Meta Soft Actor-Critic Lagrangian (Meta SAC-Lag). Meta SAC-Lag uses meta-gradient optimization to automatically update the safety-related hyperparameters. The proposed method is designed to address safe exploration and threshold adjustment with minimal hyperparameter tuning requirement. In our pipeline, the inner parameters are updated through the conventional formulation and the hyperparameters are adjusted using the meta-objectives which are defined based on the updated parameters. Our results show that the agent can reliably adjust the safety performance due to the relatively fast convergence rate of the safety threshold. We evaluate the performance of Meta SAC-Lag in five simulated environments against Lagrangian baselines, and the results demonstrate its capability to create synergy between parameters, yielding better or competitive results. Furthermore, we conduct a real-world experiment involving a robotic arm tasked with pouring coffee into a cup without spillage. Meta SAC-Lag is successfully trained to execute the task, while minimizing effort constraints.
Abstract:Deformation detection is vital for enabling accurate assessment and prediction of structural changes in materials, ensuring timely and effective interventions to maintain safety and integrity. Automating deformation detection through computer vision is crucial for efficient monitoring, but it faces significant challenges in creating a comprehensive dataset of both deformed and non-deformed objects, which can be difficult to obtain in many scenarios. In this paper, we introduce a novel framework for generating controlled synthetic data that simulates deformed objects. This approach allows for the realistic modeling of object deformations under various conditions. Our framework integrates an intelligent adapter network that facilitates sim-to-real domain adaptation, enhancing classification results without requiring real data from deformed objects. We conduct experiments on domain adaptation and classification tasks and demonstrate that our framework improves sim-to-real classification results compared to simulation baseline.