Abstract:Authenticity and condition assessment are central to conservation decision-making, yet interpretation and reporting of thermographic output remain largely bespoke and expert-dependent, complicating comparison across collections and limiting systematic integration into conservation documentation. Pulsed Active Infrared Thermography (AIRT) is sensitive to subsurface features such as material heterogeneity, voids, and past interventions; however, its broader adoption is constrained by artifact misinterpretation, inter-laboratory variability, and the absence of standardized, explainable reporting frameworks. Although multi-modal thermographic processing techniques are established, their integration with structured natural-language interpretation has not been explored in cultural heritage. A fully automated thermography-vision-language model (VLM) framework is presented. It combines multi-modal AIRT analysis with modality-aware textual reporting, without human intervention during inference. Thermal sequences are processed using Principal Component Thermography (PCT), Thermographic Signal Reconstruction (TSR), and Pulsed Phase Thermography (PPT), and the resulting anomaly masks are fused into a consensus segmentation that emphasizes regions supported by multiple thermal indicators while mitigating boundary artifacts. The fused evidence is provided to a VLM, which generates structured reports describing the location of the anomaly, thermal behavior, and plausible physical interpretations while explicitly acknowledging the uncertainty and diagnostic limitations. Evaluation on two marquetries demonstrates consistent anomaly detection and stable structured interpretations, indicating reproducibility and generalizability across samples.
Abstract:Active infrared thermography (AIRT) is currently witnessing a surge of artificial intelligence (AI) methodologies being deployed for automated subsurface defect analysis of high performance carbon fiber-reinforced polymers (CFRP). Deploying AI-based AIRT methodologies for inspecting CFRPs requires the creation of time consuming and expensive datasets of CFRP inspection sequences to train neural networks. To address this challenge, this work introduces a novel language-guided framework for cognitive defect analysis in CFRPs using AIRT and vision-language models (VLMs). Unlike conventional learning-based approaches, the proposed framework does not require developing training datasets for extensive training of defect detectors, instead it relies solely on pretrained multimodal VLM encoders coupled with a lightweight adapter to enable generative zero-shot understanding and localization of subsurface defects. By leveraging pretrained multimodal encoders, the proposed system enables generative zero-shot understanding of thermographic patterns and automatic detection of subsurface defects. Given the domain gap between thermographic data and natural images used to train VLMs, an AIRT-VLM Adapter is proposed to enhance the visibility of defects while aligning the thermographic domain with the learned representations of VLMs. The proposed framework is validated using three representative VLMs; specifically, GroundingDINO, Qwen-VL-Chat, and CogVLM. Validation is performed on 25 CFRP inspection sequences with impacts introduced at different energy levels, reflecting realistic defects encountered in industrial scenarios. Experimental results demonstrate that the AIRT-VLM adapter achieves signal-to-noise ratio (SNR) gains exceeding 10 dB compared with conventional thermographic dimensionality-reduction methods, while enabling zero-shot defect detection with intersection-over-union values reaching 70%.
Abstract:Active infrared thermography (AIRT) became a crucial tool in aerospace non-destructive testing (NDT), enabling the detection of hidden defects and anomalies in materials by capturing thermal responses over time. In AIRT, autoencoders are widely used to enhance defect detection by reducing the dimensionality of thermal data and improving the signal-to-noise ratio. However, traditional AIRT autoencoders often struggle to disentangle subtle defect features from dominant background responses, leading to suboptimal defect analysis under varying material and inspection conditions. To overcome this challenge, this work proposes a Masked CNN-Attention Autoencoder (AIRT-Masked-CAAE) that integrates convolutional feature extraction with attention mechanisms to capture both local thermal patterns and global contextual dependencies. The AIRT-Masked-CAAE introduces a masked sequence autoencoding strategy, where the network learns to infer missing thermal responses from surrounding contextual cues, while suppressing background redundancy. In addition, the proposed masked sequence autoencoding approach enables training on only a subset of the thermal sequence, while providing generalizable latent representations and reducing training time by a factor of 30. The AIRT-Masked-CAAE framework was evaluated using specimens made of PVC, CFRP, and PLA. The results demonstrate that the AIRT-Masked-CAAE surpasses state-of-the-art AIRT autoencoders in terms of contrast, signal-to-noise ratio (SNR), and metrics based on neural networks.
Abstract:Active Infrared thermography (AIRT) is a widely adopted non-destructive testing (NDT) technique for detecting subsurface anomalies in industrial components. Due to the high dimensionality of AIRT data, current approaches employ non-linear autoencoders (AEs) for dimensionality reduction. However, the latent space learned by AIRT AEs lacks structure, limiting their effectiveness in downstream defect characterization tasks. To address this limitation, this paper proposes a principal component analysis guided (PCA-guided) autoencoding framework for structured dimensionality reduction to capture intricate, non-linear features in thermographic signals while enforcing a structured latent space. A novel loss function, PCA distillation loss, is introduced to guide AIRT AEs to align the latent representation with structured PCA components while capturing the intricate, non-linear patterns in thermographic signals. To evaluate the utility of the learned, structured latent space, we propose a neural network-based evaluation metric that assesses its suitability for defect characterization. Experimental results show that the proposed PCA-guided AE outperforms state-of-the-art dimensionality reduction methods on PVC, CFRP, and PLA samples in terms of contrast, signal-to-noise ratio (SNR), and neural network-based metrics.




Abstract:AI-driven pulse thermography (PT) has become a crucial tool in non-destructive testing (NDT), enabling automatic detection of hidden anomalies in various industrial components. Current state-of-the-art techniques feed segmentation and depth estimation networks compressed PT sequences using either Principal Component Analysis (PCA) or Thermographic Signal Reconstruction (TSR). However, treating these two modalities independently constrains the performance of PT inspection models as these representations possess complementary semantic features. To address this limitation, this work proposes PT-Fusion, a multi-modal attention-based fusion network that fuses both PCA and TSR modalities for defect segmentation and depth estimation of subsurface defects in PT setups. PT-Fusion introduces novel feature fusion modules, Encoder Attention Fusion Gate (EAFG) and Attention Enhanced Decoding Block (AEDB), to fuse PCA and TSR features for enhanced segmentation and depth estimation of subsurface defects. In addition, a novel data augmentation technique is proposed based on random data sampling from thermographic sequences to alleviate the scarcity of PT datasets. The proposed method is benchmarked against state-of-the-art PT inspection models, including U-Net, attention U-Net, and 3D-CNN on the Universit\'e Laval IRT-PVC dataset. The results demonstrate that PT-Fusion outperforms the aforementioned models in defect segmentation and depth estimation accuracies with a margin of 10%.
Abstract:Event cameras triggered a paradigm shift in the computer vision community delineated by their asynchronous nature, low latency, and high dynamic range. Calibration of event cameras is always essential to account for the sensor intrinsic parameters and for 3D perception. However, conventional image-based calibration techniques are not applicable due to the asynchronous, binary output of the sensor. The current standard for calibrating event cameras relies on either blinking patterns or event-based image reconstruction algorithms. These approaches are difficult to deploy in factory settings and are affected by noise and artifacts degrading the calibration performance. To bridge these limitations, we present E-Calib, a novel, fast, robust, and accurate calibration toolbox for event cameras utilizing the asymmetric circle grid, for its robustness to out-of-focus scenes. The proposed method is tested in a variety of rigorous experiments for different event camera models, on circle grids with different geometric properties, and under challenging illumination conditions. The results show that our approach outperforms the state-of-the-art in detection success rate, reprojection error, and estimation accuracy of extrinsic parameters.
Abstract:Countersink inspection is crucial in various automated assembly lines, especially in the aerospace and automotive sectors. Advancements in machine vision introduced automated robotic inspection of countersinks using laser scanners and monocular cameras. Nevertheless, the aforementioned sensing pipelines require the robot to pause on each hole for inspection due to high latency and measurement uncertainties with motion, leading to prolonged execution times of the inspection task. The neuromorphic vision sensor, on the other hand, has the potential to expedite the countersink inspection process, but the unorthodox output of the neuromorphic technology prohibits utilizing traditional image processing techniques. Therefore, novel event-based perception algorithms need to be introduced. We propose a countersink detection approach on the basis of event-based motion compensation and the mean-shift clustering principle. In addition, our framework presents a robust event-based circle detection algorithm to precisely estimate the depth of the countersink specimens. The proposed approach expedites the inspection process by a factor of 10$\times$ compared to conventional countersink inspection methods. The work in this paper was validated for over 50 trials on three countersink workpiece variants. The experimental results show that our method provides a precision of 0.025 mm for countersink depth inspection despite the low resolution of commercially available neuromorphic cameras.




Abstract:Space exploration has witnessed revolutionary changes upon landing of the Perseverance Rover on the Martian surface and demonstrating the first flight beyond Earth by the Mars helicopter, Ingenuity. During their mission on Mars, Perseverance Rover and Ingenuity collaboratively explore the Martian surface, where Ingenuity scouts terrain information for rover's safe traversability. Hence, determining the relative poses between both the platforms is of paramount importance for the success of this mission. Driven by this necessity, this work proposes a robust relative localization system based on a fusion of neuromorphic vision-based measurements (NVBMs) and inertial measurements. The emergence of neuromorphic vision triggered a paradigm shift in the computer vision community, due to its unique working principle delineated with asynchronous events triggered by variations of light intensities occurring in the scene. This implies that observations cannot be acquired in static scenes due to illumination invariance. To circumvent this limitation, high frequency active landmarks are inserted in the scene to guarantee consistent event firing. These landmarks are adopted as salient features to facilitate relative localization. A novel event-based landmark identification algorithm using Gaussian Mixture Models (GMM) is developed for matching the landmarks correspondences formulating our NVBMs. The NVBMs are fused with inertial measurements in proposed state estimators, landmark tracking Kalman filter (LTKF) and translation decoupled Kalman filter (TDKF) for landmark tracking and relative localization, respectively. The proposed system was tested in a variety of experiments and has outperformed state-of-the-art approaches in accuracy and range.