for the JEM-EUSO collaboration
Abstract:Aims. To develop a deep-learning based system for recognition of subclinical atherosclerosis on a plain frontal chest x-ray. Methods and Results. A deep-learning algorithm to predict coronary artery calcium (CAC) score (the AI-CAC model) was developed on 460 chest x-ray (80% training cohort, 20% internal validation cohort) of primary prevention patients (58.4% male, median age 63 [51-74] years) with available paired chest x-ray and chest computed tomography (CT) indicated for any clinical reason and performed within 3 months. The CAC score calculated on chest CT was used as ground truth. The model was validated on an temporally-independent cohort of 90 patients from the same institution (external validation). The diagnostic accuracy of the AI-CAC model assessed by the area under the curve (AUC) was the primary outcome. Overall, median AI-CAC score was 35 (0-388) and 28.9% patients had no AI-CAC. AUC of the AI-CAC model to identify a CAC>0 was 0.90 in the internal validation cohort and 0.77 in the external validation cohort. Sensitivity was consistently above 92% in both cohorts. In the overall cohort (n=540), among patients with AI-CAC=0, a single ASCVD event occurred, after 4.3 years. Patients with AI-CAC>0 had significantly higher Kaplan Meier estimates for ASCVD events (13.5% vs. 3.4%, log-rank=0.013). Conclusion. The AI-CAC model seems to accurately detect subclinical atherosclerosis on chest x-ray with elevated sensitivity, and to predict ASCVD events with elevated negative predictive value. Adoption of the AI-CAC model to refine CV risk stratification or as an opportunistic screening tool requires prospective evaluation.
Abstract:Neural network quantization is an essential technique for deploying models on resource-constrained devices. However, its impact on model perceptual fields, particularly regarding class activation maps (CAMs), remains a significant area of investigation. In this study, we explore how quantization alters the spatial recognition ability of the perceptual field of vision models, shedding light on the alignment between CAMs and visual saliency maps across various architectures. Leveraging a dataset of 10,000 images from ImageNet, we rigorously evaluate six diverse foundational CNNs: VGG16, ResNet50, EfficientNet, MobileNet, SqueezeNet, and DenseNet. We uncover nuanced changes in CAMs and their alignment with human visual saliency maps through systematic quantization techniques applied to these models. Our findings reveal the varying sensitivities of different architectures to quantization and underscore its implications for real-world applications in terms of model performance and interpretability. The primary contribution of this work revolves around deepening our understanding of neural network quantization, providing insights crucial for deploying efficient and interpretable models in practical settings.
Abstract:Conventional imaging diagnostics frequently encounter bottlenecks due to manual inspection, which can lead to delays and inconsistencies. Although deep learning offers a pathway to automation and enhanced accuracy, foundational models in computer vision often emphasize global context at the expense of local details, which are vital for medical imaging diagnostics. To address this, we harness the Swin Transformer's capacity to discern extended spatial dependencies within images through the hierarchical framework. Our novel contribution lies in refining local feature representations, orienting them specifically toward the final distribution of the classifier. This method ensures that local features are not only preserved but are also enriched with task-specific information, enhancing their relevance and detail at every hierarchical level. By implementing this strategy, our model demonstrates significant robustness and precision, as evidenced by extensive validation of two established benchmarks for Knee OsteoArthritis (KOA) grade classification. These results highlight our approach's effectiveness and its promising implications for the future of medical imaging diagnostics. Our implementation is available on https://github.com/mtliba/KOA_NLCS2024
Abstract:The widespread adoption of large language models (LLMs) across diverse AI applications is proof of the outstanding achievements obtained in several tasks, such as text mining, text generation, and question answering. However, LLMs are not exempt from drawbacks. One of the most concerning aspects regards the emerging problematic phenomena known as "Hallucinations". They manifest in text generation systems, particularly in question-answering systems reliant on LLMs, potentially resulting in false or misleading information propagation. This paper delves into the underlying causes of AI hallucination and elucidates its significance in artificial intelligence. In particular, Hallucination classification is tackled over several tasks (Machine Translation, Question and Answer, Dialog Systems, Summarisation Systems, Knowledge Graph with LLMs, and Visual Question Answer). Additionally, we explore potential strategies to mitigate hallucinations, aiming to enhance the overall reliability of LLMs. Our research addresses this critical issue within the HeReFaNMi (Health-Related Fake News Mitigation) project, generously supported by NGI Search, dedicated to combating Health-Related Fake News dissemination on the Internet. This endeavour represents a concerted effort to safeguard the integrity of information dissemination in an age of evolving AI technologies.
Abstract:Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint. Early detection and diagnosis are crucial for successful clinical intervention and management to prevent severe complications, such as loss of mobility. In this paper, we propose an automated approach that employs the Swin Transformer to predict the severity of KOA. Our model uses publicly available radiographic datasets with Kellgren and Lawrence scores to enable early detection and severity assessment. To improve the accuracy of our model, we employ a multi-prediction head architecture that utilizes multi-layer perceptron classifiers. Additionally, we introduce a novel training approach that reduces the data drift between multiple datasets to ensure the generalization ability of the model. The results of our experiments demonstrate the effectiveness and feasibility of our approach in predicting KOA severity accurately.
Abstract:The visual scanpath is a sequence of points through which the human gaze moves while exploring a scene. It represents the fundamental concepts upon which visual attention research is based. As a result, the ability to predict them has emerged as an important task in recent years. In this paper, we propose an inter-observer consistent adversarial training approach for scanpath prediction through a lightweight deep neural network. The adversarial method employs a discriminative neural network as a dynamic loss that is better suited to model the natural stochastic phenomenon while maintaining consistency between the distributions related to the subjective nature of scanpaths traversed by different observers. Through extensive testing, we show the competitiveness of our approach in regard to state-of-the-art methods.
Abstract:Cultural heritage understanding and preservation is an important issue for society as it represents a fundamental aspect of its identity. Paintings represent a significant part of cultural heritage, and are the subject of study continuously. However, the way viewers perceive paintings is strictly related to the so-called HVS (Human Vision System) behaviour. This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings. In further details, we introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans, including the fundamental understanding of a scene, and then extend it to painting images. The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention. We use an FCNN (Fully Convolutional Neural Network), in which we exploit a differentiable channel-wise selection and Soft-Argmax modules. We also incorporate learnable Gaussian distributions onto the network bottleneck to simulate visual attention process bias in natural scene images. Furthermore, to reduce the effect of shifts between different domains (i.e. natural images, painting), we urge the model to learn unsupervised general features from other domains using a gradient reversal classifier. The results obtained by our model outperform existing state-of-the-art ones in terms of accuracy and efficiency.
Abstract:A growing number of commercially available mobile phones come with integrated high-resolution digital cameras. That enables a new class of dedicated applications to image analysis such as mobile visual search, image cropping, object detection, content-based image retrieval, image classification. In this paper, a new mobile application for image content retrieval and classification for mobile device display is proposed to enrich the visual experience of users. The mobile application can extract a certain number of images based on the content of an image with visual saliency methods aiming at detecting the most critical regions in a given image from a perceptual viewpoint. First, the most critical areas from a perceptual perspective are extracted using the local maxima of a 2D saliency function. Next, a salient region is cropped using the bounding box centred on the local maxima of the thresholded Saliency Map of the image. Then, each image crop feds into an Image Classification system based on SVM and SIFT descriptors to detect the class of object present in the image. ImageNet repository was used as the reference for semantic category classification. Android platform was used to implement the mobile application on a client-server architecture. A mobile client sends the photo taken by the camera to the server, which processes the image and returns the results (image contents such as image crops and related target classes) to the mobile client. The application was run on thousands of pictures and showed encouraging results towards a better user visual experience with mobile displays.
Abstract:EUSO-SPB1 was released on April 24th, 2017, from the NASA balloon launch site in Wanaka (New Zealand) and landed on the South Pacific Ocean on May 7th. The data collected by the instruments onboard the balloon were analyzed to search UV pulse signatures of UHECR (Ultra High Energy Cosmic Rays) air showers. Indirect measurements of UHECRs can be affected by cloud presence during nighttime, therefore it is crucial to know the meteorological conditions during the observation period of the detector. During the flight, the onboard EUSO-SPB1 UCIRC camera (University of Chicago Infra-Red Camera), acquired images in the field of view of the UV telescope. The available nighttime and daytime images include information on meteorological conditions of the atmosphere observed in two infra-red bands. The presence of clouds has been investigated employing a method developed to provide a dense cloudiness map for each available infra-red image. The final masks are intended to give pixel cloudiness information at the IR-camera pixel resolution that is nearly 4-times higher than the one of the UV-camera. In this work, cloudiness maps are obtained by using an expert system based on the analysis of different low-level image features. Furthermore, an image enhancement step was needed to be applied as a preprocessing step to deal with uncalibrated data.