Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Marco Grangetto

Robust brain age estimation from structural MRI with contrastive learning

Jul 02, 2025

Carlo Alberto Barbano, Benoit Dufumier, Edouard Duchesnay, Marco Grangetto, Pietro Gori

Abstract:Estimating brain age from structural MRI has emerged as a powerful tool for characterizing normative and pathological aging. In this work, we explore contrastive learning as a scalable and robust alternative to supervised approaches for brain age estimation. We introduce a novel contrastive loss function, $\mathcal{L}^{exp}$, and evaluate it across multiple public neuroimaging datasets comprising over 20,000 scans. Our experiments reveal four key findings. First, scaling pre-training on diverse, multi-site data consistently improves generalization performance, cutting external mean absolute error (MAE) nearly in half. Second, $\mathcal{L}^{exp}$ is robust to site-related confounds, maintaining low scanner-predictability as training size increases. Third, contrastive models reliably capture accelerated aging in patients with cognitive impairment and Alzheimer's disease, as shown through brain age gap analysis, ROC curves, and longitudinal trends. Lastly, unlike supervised baselines, $\mathcal{L}^{exp}$ maintains a strong correlation between brain age accuracy and downstream diagnostic performance, supporting its potential as a foundation model for neuroimaging. These results position contrastive learning as a promising direction for building generalizable and clinically meaningful brain representations.

* 11 pages

Via

Access Paper or Ask Questions

When Does Pruning Benefit Vision Representations?

Jul 02, 2025

Enrico Cassano, Riccardo Renzulli, Andrea Bragagnolo, Marco Grangetto

Abstract:Pruning is widely used to reduce the complexity of deep learning models, but its effects on interpretability and representation learning remain poorly understood. This paper investigates how pruning influences vision models across three key dimensions: (i) interpretability, (ii) unsupervised object discovery, and (iii) alignment with human perception. We first analyze different vision network architectures to examine how varying sparsity levels affect feature attribution interpretability methods. Additionally, we explore whether pruning promotes more succinct and structured representations, potentially improving unsupervised object discovery by discarding redundant information while preserving essential features. Finally, we assess whether pruning enhances the alignment between model representations and human perception, investigating whether sparser models focus on more discriminative features similarly to humans. Our findings also reveal the presence of sweet spots, where sparse models exhibit higher interpretability, downstream generalization and human alignment. However, these spots highly depend on the network architectures and their size in terms of trainable parameters. Our results suggest a complex interplay between these three dimensions, highlighting the importance of investigating when and how pruning benefits vision representations.

Via

Access Paper or Ask Questions

AA-SGAN: Adversarially Augmented Social GAN with Synthetic Data

Dec 23, 2024

Mirko Zaffaroni, Federico Signoretta, Marco Grangetto, Attilio Fiandrotti

Figure 1 for AA-SGAN: Adversarially Augmented Social GAN with Synthetic Data

Figure 2 for AA-SGAN: Adversarially Augmented Social GAN with Synthetic Data

Figure 3 for AA-SGAN: Adversarially Augmented Social GAN with Synthetic Data

Figure 4 for AA-SGAN: Adversarially Augmented Social GAN with Synthetic Data

Abstract:Accurately predicting pedestrian trajectories is crucial in applications such as autonomous driving or service robotics, to name a few. Deep generative models achieve top performance in this task, assuming enough labelled trajectories are available for training. To this end, large amounts of synthetically generated, labelled trajectories exist (e.g., generated by video games). However, such trajectories are not meant to represent pedestrian motion realistically and are ineffective at training a predictive model. We propose a method and an architecture to augment synthetic trajectories at training time and with an adversarial approach. We show that trajectory augmentation at training time unleashes significant gains when a state-of-the-art generative model is evaluated over real-world trajectories.

Via

Access Paper or Ask Questions

Knowledge Transfer Across Modalities with Natural Language Supervision

Nov 23, 2024

Carlo Alberto Barbano, Luca Molinaro, Emanuele Aiello, Marco Grangetto

Figure 1 for Knowledge Transfer Across Modalities with Natural Language Supervision

Figure 2 for Knowledge Transfer Across Modalities with Natural Language Supervision

Figure 3 for Knowledge Transfer Across Modalities with Natural Language Supervision

Figure 4 for Knowledge Transfer Across Modalities with Natural Language Supervision

Abstract:We present a way to learn novel concepts by only using their textual description. We call this method Knowledge Transfer. Similarly to human perception, we leverage cross-modal interaction to introduce new concepts. We hypothesize that in a pre-trained visual encoder there are enough low-level features already learned (e.g. shape, appearance, color) that can be used to describe previously unknown high-level concepts. Provided with a textual description of the novel concept, our method works by aligning the known low-level features of the visual encoder to its high-level textual description. We show that Knowledge Transfer can successfully introduce novel concepts in multimodal models, in a very efficient manner, by only requiring a single description of the target concept. Our approach is compatible with both separate textual and visual encoders (e.g. CLIP) and shared parameters across modalities. We also show that, following the same principle, Knowledge Transfer can improve concepts already known by the model. Leveraging Knowledge Transfer we improve zero-shot performance across different tasks such as classification, segmentation, image-text retrieval, and captioning.

* 21 pages, 7 figures, 17 tables

Via

Access Paper or Ask Questions

Efficient Progressive Image Compression with Variance-aware Masking

Nov 15, 2024

Alberto Presta, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto, Pamela Cosman

Abstract:Learned progressive image compression is gaining momentum as it allows improved image reconstruction as more bits are decoded at the receiver. We propose a progressive image compression method in which an image is first represented as a pair of base-quality and top-quality latent representations. Next, a residual latent representation is encoded as the element-wise difference between the top and base representations. Our scheme enables progressive image compression with element-wise granularity by introducing a masking system that ranks each element of the residual latent representation from most to least important, dividing it into complementary components, which can be transmitted separately to the decoder in order to obtain different reconstruction quality. The masking system does not add further parameters nor complexity. At the receiver, any elements of the top latent representation excluded from the transmitted components can be independently replaced with the mean predicted by the hyperprior architecture, ensuring reliable reconstructions at any intermediate quality level. We also introduced Rate Enhancement Modules (REMs), which refine the estimation of entropy parameters using already decoded components. We obtain results competitive with state-of-the-art competitors, while significantly reducing computational complexity, decoding time, and number of parameters.

* 10 pages. Accepted at WACV 2025

Via

Access Paper or Ask Questions

GABIC: Graph-based Attention Block for Image Compression

Oct 03, 2024

Gabriele Spadaro, Alberto Presta, Enzo Tartaglione, Jhony H. Giraldo, Marco Grangetto, Attilio Fiandrotti

Abstract:While standardized codecs like JPEG and HEVC-intra represent the industry standard in image compression, neural Learned Image Compression (LIC) codecs represent a promising alternative. In detail, integrating attention mechanisms from Vision Transformers into LIC models has shown improved compression efficiency. However, extra efficiency often comes at the cost of aggregating redundant features. This work proposes a Graph-based Attention Block for Image Compression (GABIC), a method to reduce feature redundancy based on a k-Nearest Neighbors enhanced attention mechanism. Our experiments show that GABIC outperforms comparable methods, particularly at high bit rates, enhancing compression performance.

* 10 pages, 5 figures, accepted at ICIP 2024

Via

Access Paper or Ask Questions

STanH : Parametric Quantization for Variable Rate Learned Image Compression

Oct 01, 2024

Alberto Presta, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto

Figure 1 for STanH : Parametric Quantization for Variable Rate Learned Image Compression

Figure 2 for STanH : Parametric Quantization for Variable Rate Learned Image Compression

Figure 3 for STanH : Parametric Quantization for Variable Rate Learned Image Compression

Figure 4 for STanH : Parametric Quantization for Variable Rate Learned Image Compression

Abstract:In end-to-end learned image compression, encoder and decoder are jointly trained to minimize a $R + {\lambda}D$ cost function, where ${\lambda}$ controls the trade-off between rate of the quantized latent representation and image quality. Unfortunately, a distinct encoder-decoder pair with millions of parameters must be trained for each ${\lambda}$, hence the need to switch encoders and to store multiple encoders and decoders on the user device for every target rate. This paper proposes to exploit a differentiable quantizer designed around a parametric sum of hyperbolic tangents, called STanH , that relaxes the step-wise quantization function. STanH is implemented as a differentiable activation layer with learnable quantization parameters that can be plugged into a pre-trained fixed rate model and refined to achieve different target bitrates. Experimental results show that our method enables variable rate coding with comparable efficiency to the state-of-the-art, yet with significant savings in terms of ease of deployment, training time, and storage costs

* Submitted to IEEE Transactions on Image Processing

Via

Access Paper or Ask Questions

WiGNet: Windowed Vision Graph Neural Network

Oct 01, 2024

Gabriele Spadaro, Marco Grangetto, Attilio Fiandrotti, Enzo Tartaglione, Jhony H. Giraldo

Figure 1 for WiGNet: Windowed Vision Graph Neural Network

Figure 2 for WiGNet: Windowed Vision Graph Neural Network

Figure 3 for WiGNet: Windowed Vision Graph Neural Network

Figure 4 for WiGNet: Windowed Vision Graph Neural Network

Abstract:In recent years, Graph Neural Networks (GNNs) have demonstrated strong adaptability to various real-world challenges, with architectures such as Vision GNN (ViG) achieving state-of-the-art performance in several computer vision tasks. However, their practical applicability is hindered by the computational complexity of constructing the graph, which scales quadratically with the image size. In this paper, we introduce a novel Windowed vision Graph neural Network (WiGNet) model for efficient image processing. WiGNet explores a different strategy from previous works by partitioning the image into windows and constructing a graph within each window. Therefore, our model uses graph convolutions instead of the typical 2D convolution or self-attention mechanism. WiGNet effectively manages computational and memory complexity for large image sizes. We evaluate our method in the ImageNet-1k benchmark dataset and test the adaptability of WiGNet using the CelebA-HQ dataset as a downstream task with higher-resolution images. In both of these scenarios, our method achieves competitive results compared to previous vision GNNs while keeping memory and computational complexity at bay. WiGNet offers a promising solution toward the deployment of vision GNNs in real-world applications. We publicly released the code at https://github.com/EIDOSLAB/WiGNet.

Via

Access Paper or Ask Questions

Anatomical Foundation Models for Brain MRIs

Aug 07, 2024

Carlo Alberto Barbano, Matteo Brunello, Benoit Dufumier, Marco Grangetto

Abstract:Deep Learning (DL) in neuroimaging has become increasingly relevant for detecting neurological conditions and neurodegenerative disorders. One of the most predominant biomarkers in neuroimaging is represented by brain age, which has been shown to be a good indicator for different conditions, such as Alzheimer's Disease. Using brain age for pretraining DL models in transfer learning settings has also recently shown promising results, especially when dealing with data scarcity of different conditions. On the other hand, anatomical information of brain MRIs (e.g. cortical thickness) can provide important information for learning good representations that can be transferred to many downstream tasks. In this work, we propose AnatCL, an anatomical foundation model for brain MRIs that i.) leverages anatomical information with a weakly contrastive learning approach and ii.) achieves state-of-the-art performances in many different downstream tasks. To validate our approach we consider 12 different downstream tasks for diagnosis classification, and prediction of 10 different clinical assessment scores.

* 12 pages

Via

Access Paper or Ask Questions

Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering

Jul 15, 2024

Francesco Di Sario, Riccardo Renzulli, Enzo Tartaglione, Marco Grangetto

Figure 1 for Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering

Figure 2 for Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering

Figure 3 for Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering

Figure 4 for Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering

Abstract:Since the introduction of NeRFs, considerable attention has been focused on improving their training and inference times, leading to the development of Fast-NeRFs models. Despite demonstrating impressive rendering speed and quality, the rapid convergence of such models poses challenges for further improving reconstruction quality. Common strategies to improve rendering quality involves augmenting model parameters or increasing the number of sampled points. However, these computationally intensive approaches encounter limitations in achieving significant quality enhancements. This study introduces a model-agnostic framework inspired by Sparsely-Gated Mixture of Experts to enhance rendering quality without escalating computational complexity. Our approach enables specialization in rendering different scene components by employing a mixture of experts with varying resolutions. We present a novel gate formulation designed to maximize expert capabilities and propose a resolution-based routing technique to effectively induce sparsity and decompose scenes. Our work significantly improves reconstruction quality while maintaining competitive performance.

Via

Access Paper or Ask Questions