Alberto Presta

bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition

May 11, 2026

Michal Byra, Pawel Olszowiec, Grzegorz Stefanski, Grzegorz Gruszczynski, Alberto Presta

Abstract: Vision Transformers (ViTs) are built by stacking independently parameterized blocks, but it remains unclear how much of this depth requires layer-specific transformations and how much can be realized through recurrent computation. We study this question with bViT, a single-block recurrent ViT in which one transformer block is applied repeatedly to process an image. This architecture preserves the iterative structure of a deep ViT while removing layer-specific block parameterization, providing a controlled setting for studying recurrence in vision. On ImageNet-1K, a 12-step bViT-B achieves accuracy comparable to standard ViT-B under the same training recipe and computational budget, while using an order of magnitude fewer parameters. We observe that recurrent performance improves with representation width: wider bViTs recover much more of the performance of standard ViTs than narrow variants do. We interpret this behavior as implicit depth multiplexing, where a shared block expresses multiple step-dependent computations through the evolving hidden state. Beyond ImageNet classification, bViT transfers competitively to downstream tasks and enables parameter-efficient fine-tuning. Mechanistic analyses of activations, attention, and step-specific pruning show that the shared block changes its effective behavior across recurrent steps rather than simply repeating the same computation. Our results suggest that a large fraction of ViT depth can be implemented through recurrent reuse, provided that the representation space is sufficiently wide.

* 31 pages, 16 figures
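To make the "order of magnitude fewer parameters" claim concrete, here is a rough count under the standard ViT-B/16 configuration (width 768, MLP ratio 4, 16x16 patches, 224x224 inputs, 1000 classes). Whether bViT-B uses exactly this configuration, or adds step-specific parameters such as step embeddings, is an assumption not stated above. The helper `recurrent_forward` is hypothetical and only illustrates reusing one block for every step.

```python
def block_params(d: int, mlp_ratio: int = 4) -> int:
    """Parameters in one pre-norm transformer block (biases and LayerNorms included)."""
    h = mlp_ratio * d
    norms = 2 * (2 * d)                         # two LayerNorms, each with scale and shift
    attn = (d * 3 * d + 3 * d) + (d * d + d)    # fused QKV projection + output projection
    mlp = (d * h + h) + (h * d + d)             # two linear layers of the MLP
    return norms + attn + mlp


def shared_params(d: int, patch: int = 16, img: int = 224, classes: int = 1000) -> int:
    """Patch embedding, class token, position embedding, final LayerNorm and head."""
    tokens = (img // patch) ** 2 + 1            # patches plus the class token
    patch_embed = 3 * patch * patch * d + d
    return patch_embed + d + tokens * d + 2 * d + (d * classes + classes)


def recurrent_forward(x, block, steps: int):
    """Apply the same block `steps` times (hypothetical helper, no step embeddings)."""
    for _ in range(steps):
        x = block(x)
    return x


d, depth = 768, 12
vit_b = depth * block_params(d) + shared_params(d)   # 12 independent blocks
bvit_b = 1 * block_params(d) + shared_params(d)      # one block reused 12 times
print(vit_b, bvit_b, round(vit_b / bvit_b, 1))
```

Under these assumptions the count is 86,567,656 parameters for ViT-B/16 versus 8,601,064 for the single-block variant, roughly a tenfold reduction, while the number of block applications (and hence compute) stays the same.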

Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data

Jan 29, 2026

Grzegorz Stefanski, Alberto Presta, Michal Byra

Abstract: In pruning, the Lottery Ticket Hypothesis posits that large networks contain sparse subnetworks, or winning tickets, that can be trained in isolation to match the performance of their dense counterparts. However, most existing approaches assume a single universal winning ticket shared across all inputs, ignoring the inherent heterogeneity of real-world data. In this work, we propose Routing the Lottery (RTL), an adaptive pruning framework that discovers multiple specialized subnetworks, called adaptive tickets, each tailored to a class, semantic cluster, or environmental condition. Across diverse datasets and tasks, RTL consistently outperforms single- and multi-model baselines in balanced accuracy and recall, while using up to 10 times fewer parameters than independent models and yielding semantically aligned subnetworks. Furthermore, we identify subnetwork collapse, a performance drop under aggressive pruning, and introduce a subnetwork similarity score that enables label-free diagnosis of oversparsification. Overall, our results recast pruning as a mechanism for aligning model structure with data heterogeneity, paving the way toward more modular and context-aware deep learning.
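The idea of several tickets sharing one dense network can be sketched with binary masks: one mask per class or cluster, a router that picks which mask to apply, and a mask-overlap measure. This is a minimal illustration under stated assumptions, not RTL itself: masks come from simple magnitude pruning, routing uses a hypothetical rule (`route_by_sign`) in place of a known cluster id, and Jaccard overlap (`mask_jaccard`) is a stand-in for whatever similarity score the paper defines.

```python
def magnitude_mask(weights, sparsity):
    """Binary mask keeping the (1 - sparsity) fraction of largest-magnitude weights."""
    keep = round(len(weights) * (1 - sparsity))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(len(weights))]


def mask_jaccard(m1, m2):
    """Overlap between two binary masks (a stand-in similarity, not necessarily RTL's score)."""
    inter = sum(a & b for a, b in zip(m1, m2))
    union = sum(a | b for a, b in zip(m1, m2))
    return inter / union if union else 1.0


def routed_linear(x, weights, masks, route):
    """Evaluate a single linear unit using the ticket (mask) selected by route(x)."""
    mask = masks[route(x)]
    return sum(w * m * xi for w, m, xi in zip(weights, mask, x))


def route_by_sign(x):
    """Hypothetical router: in practice this would be a class or cluster assignment."""
    return "a" if x[0] > 0 else "b"


weights = [1.0, 2.0, 3.0, 4.0]
masks = {"a": [1, 0, 1, 0], "b": [0, 1, 0, 1]}
print(routed_linear([1, 1, 1, 1], weights, masks, route_by_sign))  # 4.0, uses ticket "a"
```

Because every ticket is a mask over the same dense weights, adding a cluster costs only a mask rather than a full independent model, which is the intuition behind the parameter savings reported above.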

Efficient Progressive Image Compression with Variance-aware Masking

Nov 15, 2024

Alberto Presta, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto, Pamela Cosman

Abstract: Learned progressive image compression is gaining momentum because it lets the reconstruction improve as more bits are decoded at the receiver. We propose a progressive image compression method in which an image is first represented as a pair of base-quality and top-quality latent representations. Next, a residual latent representation is encoded as the element-wise difference between the top and base representations. Our scheme enables progressive image compression with element-wise granularity by introducing a masking system that ranks each element of the residual latent representation from most to least important and divides it into complementary components, which can be transmitted separately to the decoder to obtain different reconstruction qualities. The masking system adds no parameters or complexity. At the receiver, any elements of the top latent representation excluded from the transmitted components can be independently replaced with the mean predicted by the hyperprior architecture, ensuring reliable reconstructions at any intermediate quality level. We also introduce Rate Enhancement Modules (REMs), which refine the estimation of entropy parameters using already decoded components. We obtain results competitive with state-of-the-art methods while significantly reducing computational complexity, decoding time, and number of parameters.

* 10 pages. Accepted at WACV 2025
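The masking and reconstruction steps described above can be sketched on flat lists. This is a minimal sketch under assumptions: importance is taken to be the hyperprior's predicted standard deviation (larger first), component sizes are given as fractions, and entropy coding is omitted entirely. The function names are hypothetical.

```python
def rank_elements(sigmas):
    """Order latent indices from most to least important (assumed: larger predicted std first)."""
    return sorted(range(len(sigmas)), key=lambda i: sigmas[i], reverse=True)


def split_components(order, fractions):
    """Cut the ranked indices into complementary, non-overlapping components."""
    comps, start, acc = [], 0, 0.0
    for n, f in enumerate(fractions):
        acc += f
        # the last component always takes whatever remains, avoiding rounding gaps
        end = len(order) if n == len(fractions) - 1 else round(acc * len(order))
        comps.append(order[start:end])
        start = end
    return comps


def reconstruct_top(base, residual, mu, received):
    """Rebuild the top latent: base + residual where received, hyperprior mean elsewhere."""
    got = {i for comp in received for i in comp}
    return [base[i] + residual[i] if i in got else mu[i] for i in range(len(base))]


order = rank_elements([0.1, 2.0, 0.5, 1.0])
comps = split_components(order, [0.25, 0.25, 0.5])
print(comps)                                                           # [[1], [3], [2, 0]]
print(reconstruct_top([0, 0, 0, 0], [1, 2, 3, 4], [9, 9, 9, 9], comps[:2]))  # [9, 2, 9, 4]
```

Decoding more components only fills in more residual elements, so each additional component refines the same latent rather than requiring a separate model per quality level.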

GABIC: Graph-based Attention Block for Image Compression

Oct 03, 2024

Gabriele Spadaro, Alberto Presta, Enzo Tartaglione, Jhony H. Giraldo, Marco Grangetto, Attilio Fiandrotti

Abstract: While standardized codecs like JPEG and HEVC-intra are the industry standard in image compression, neural Learned Image Compression (LIC) codecs are a promising alternative. In particular, integrating attention mechanisms from Vision Transformers into LIC models has been shown to improve compression efficiency. However, this extra efficiency often comes at the cost of aggregating redundant features. This work proposes a Graph-based Attention Block for Image Compression (GABIC), a method that reduces feature redundancy through a k-Nearest Neighbors enhanced attention mechanism. Our experiments show that GABIC outperforms comparable methods, particularly at high bit rates.

* 10 pages, 5 figures, accepted at ICIP 2024
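A k-nearest-neighbour restriction on attention can be illustrated as follows: each query computes scaled dot-product scores against all keys, keeps only its k highest-scoring keys (its neighbours in a kNN graph), and applies softmax over that subset. This is a plain-Python sketch of the general technique; using dot-product similarity to define neighbours is an assumption, and GABIC's actual block operates on convolutional feature maps inside a codec.

```python
import math


def knn_attention(queries, keys, values, k):
    """Scaled dot-product attention where each query attends only to its k best-matching keys."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in keys]
        # graph step: keep only the k neighbours with the highest similarity
        nbrs = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        top = max(scores[j] for j in nbrs)
        weights = {j: math.exp(scores[j] - top) for j in nbrs}  # numerically stable softmax
        total = sum(weights.values())
        out.append([sum(weights[j] / total * values[j][c] for j in nbrs)
                    for c in range(len(values[0]))])
    return out


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10, 0], [0, 10]]
print(knn_attention(queries, keys, values, k=1))  # [[10.0, 0.0]]: only the closest key contributes
```

With `k` equal to the number of keys this reduces to ordinary softmax attention; smaller `k` discards weakly related tokens, which is one way to limit the aggregation of redundant features.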

STanH: Parametric Quantization for Variable Rate Learned Image Compression

Oct 01, 2024

Alberto Presta, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto

Abstract: In end-to-end learned image compression, the encoder and decoder are jointly trained to minimize an $R + \lambda D$ cost function, where $\lambda$ controls the trade-off between the rate $R$ of the quantized latent representation and the distortion $D$ (image quality). Unfortunately, a distinct encoder-decoder pair with millions of parameters must be trained for each $\lambda$, so multiple encoders and decoders must be stored on the user device and switched for every target rate. This paper proposes a differentiable quantizer designed around a parametric sum of hyperbolic tangents, called STanH, that relaxes the step-wise quantization function. STanH is implemented as a differentiable activation layer with learnable quantization parameters that can be plugged into a pre-trained fixed-rate model and refined to achieve different target bitrates. Experimental results show that our method enables variable-rate coding with efficiency comparable to the state of the art, with significant savings in ease of deployment, training time, and storage costs.

* Submitted to IEEE Transactions on Image Processing
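One plausible way to build a "sum of hyperbolic tangents" relaxation, assumed here rather than taken from the paper, places one tanh step halfway between each pair of consecutive quantization levels, scaled by the gap between them. As the sharpness `beta` grows, the function approaches the hard nearest-level staircase while staying differentiable for finite `beta`. In a trainable layer, `levels` and `beta` would be the learnable parameters; here they are plain floats.

```python
import math


def stanh(x, levels, beta):
    """Soft staircase: a sum of shifted tanh steps, one per gap between consecutive levels."""
    y = levels[0]
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2  # each step sits halfway between two levels
        # (tanh + 1) / 2 goes from 0 to 1, so the step adds the gap (hi - lo)
        y += (hi - lo) / 2 * (math.tanh(beta * (x - threshold)) + 1)
    return y


def hard_quantize(x, levels):
    """Nearest-level quantizer that stanh approaches as beta grows."""
    return min(levels, key=lambda level: abs(x - level))


levels = [-1.0, 0.0, 1.0]
for x in (-0.7, 0.1, 0.9):
    print(x, round(stanh(x, levels, beta=100.0), 4), hard_quantize(x, levels))
```

Moving the levels changes the effective quantization step, which is the property that lets a single pre-trained model be refined toward different target rates without retraining the whole encoder-decoder pair.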

Can We Remove the Ground? Obstacle-aware Point Cloud Compression for Remote Object Detection

Oct 01, 2024

Pengxi Zeng, Alberto Presta, Jonah Reinis, Dinesh Bharadia, Hang Qiu, Pamela Cosman

Abstract: Efficient point cloud (PC) compression is crucial for streaming applications such as augmented reality and cooperative perception. Classic PC compression techniques encode all the points in a frame. Tailoring compression towards perception tasks at the receiver side, we ask: "Can we remove the ground points during transmission without sacrificing detection performance?" Our study reveals that state-of-the-art (SOTA) 3D object detection models depend strongly on ground points, especially those below and around objects. In this work, we propose a lightweight obstacle-aware Pillar-based Ground Removal (PGR) algorithm. PGR filters out ground points that do not provide context for object recognition, significantly improving the compression ratio without sacrificing perception performance at the receiver. Because it does not rely on heavy object detection or semantic segmentation models, PGR is lightweight, highly parallelizable, and effective. Our evaluations on KITTI and the Waymo Open Dataset show that SOTA detection models work equally well when PGR removes 20-30% of the points, while PGR runs at 86 FPS.

* 7 pages; submitted to ICRA 2025
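The abstract does not spell out PGR's rules, so the following is a hypothetical pillar-based sketch of obstacle-aware ground removal: points are binned into x-y pillars, the lowest point in each pillar is taken as the local ground height, pillars whose height range exceeds a threshold are treated as obstacles, and ground points are dropped only when they lie outside obstacle pillars and their eight neighbours, so that the ground context below and around objects is kept. The parameter names and default values (`pillar`, `ground_tol`, `obstacle_h`, in metres) are assumptions.

```python
import math
from collections import defaultdict


def pillar_ground_removal(points, pillar=0.5, ground_tol=0.2, obstacle_h=0.5):
    """Drop ground points far from obstacles; points are (x, y, z) tuples in metres."""
    pillars = defaultdict(list)
    for p in points:
        pillars[(math.floor(p[0] / pillar), math.floor(p[1] / pillar))].append(p)
    # local ground height: lowest point in each pillar (assumes some ground is visible)
    ground_z = {key: min(p[2] for p in pts) for key, pts in pillars.items()}
    obstacle = {key for key, pts in pillars.items()
                if max(p[2] for p in pts) - ground_z[key] > obstacle_h}
    # keep ground context in obstacle pillars and their 8 neighbours
    context = {(i + di, j + dj) for (i, j) in obstacle for di in (-1, 0, 1) for dj in (-1, 0, 1)}
    kept = []
    for key, pts in pillars.items():
        for p in pts:
            is_ground = p[2] - ground_z[key] <= ground_tol
            if not is_ground or key in context:
                kept.append(p)
    return kept


cloud = [(0.1, 0.1, 0.0), (0.2, 0.2, 1.5), (0.6, 0.1, 0.0),   # object and nearby ground
         (10.0, 10.0, 0.0), (10.1, 10.1, 0.05)]               # isolated flat ground
print(pillar_ground_removal(cloud))  # the two isolated ground points are removed
```

Each pillar is processed independently apart from the neighbour lookup, which is what makes this style of filter easy to parallelize.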

Domain Adaptation for Learned Image Compression with Supervised Adapters

Apr 24, 2024

Alberto Presta, Gabriele Spadaro, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto

Abstract: In Learned Image Compression (LIC), a model is trained to encode and decode images sampled from a source domain, often outperforming traditional codecs on natural images; yet its performance may be far from optimal on images sampled from other domains. In this work, we tackle the problem of adapting a pre-trained model to multiple target domains by plugging into the decoder an adapter module for each of them, including the source one. Each adapter improves the decoder performance on a specific domain without the model forgetting the images seen at training time. A gate network computes the weights that optimally blend the contributions of the adapters when the bitstream is decoded. We experimentally validate our method on two state-of-the-art pre-trained models, observing improved rate-distortion efficiency on the target domains without penalties on the source domain. Furthermore, the gate's ability to find similarities with the learned target domains also improves coding efficiency for images outside them.

* 10 pages, published at the Data Compression Conference 2024 (DCC 2024)
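The gated blending step can be sketched as a softmax over per-domain gate scores that weights each adapter's output before it is added back to the decoder feature. This is a minimal sketch under assumptions: the residual connection, the softmax gate, and the function names are illustrative choices, and the gate logits are passed in directly instead of being predicted by a gate network as in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_adapters(feature, adapters, gate_logits):
    """Blend per-domain adapter outputs with gate weights and add them to the decoder feature."""
    weights = softmax(gate_logits)
    blended = [0.0] * len(feature)
    for w, adapter in zip(weights, adapters):
        blended = [b + w * o for b, o in zip(blended, adapter(feature))]
    return [f + b for f, b in zip(feature, blended)]


def source_adapter(f):
    return [0.0 for _ in f]   # hypothetical: source domain needs no correction


def target_adapter(f):
    return [2 * v for v in f]  # hypothetical: a domain-specific transformation


print(gated_adapters([1.0, 2.0], [source_adapter, target_adapter], [0.0, 0.0]))  # [2.0, 4.0]
```

When the gate strongly favours one domain, its adapter dominates the blend; intermediate weights let an image from an unseen domain borrow from the most similar learned domains.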

Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray

Mar 27, 2024

Guglielmo Gallone, Francesco Iodice, Alberto Presta, Davide Tore, Ovidio de Filippo, Michele Visciano, Carlo Alberto Barbano, Alessandro Serafini, Paola Gorrini, Alessandro Bruno(+9 more)

Abstract: Aims. To develop a deep-learning-based system for recognizing subclinical atherosclerosis on a plain frontal chest x-ray. Methods and Results. A deep-learning algorithm to predict the coronary artery calcium (CAC) score (the AI-CAC model) was developed on 460 chest x-rays (80% training cohort, 20% internal validation cohort) of primary prevention patients (58.4% male, median age 63 [51-74] years) who had a paired chest x-ray and chest computed tomography (CT) scan, performed within 3 months of each other for any clinical indication. The CAC score calculated on chest CT was used as ground truth. The model was validated on a temporally independent cohort of 90 patients from the same institution (external validation). The primary outcome was the diagnostic accuracy of the AI-CAC model, assessed by the area under the curve (AUC). Overall, the median AI-CAC score was 35 (0-388), and 28.9% of patients had an AI-CAC score of 0. The AUC of the AI-CAC model for identifying CAC>0 was 0.90 in the internal validation cohort and 0.77 in the external validation cohort. Sensitivity was consistently above 92% in both cohorts. In the overall cohort (n=540), a single ASCVD event occurred among patients with AI-CAC=0, after 4.3 years. Patients with AI-CAC>0 had significantly higher Kaplan-Meier estimates for ASCVD events (13.5% vs. 3.4%, log-rank=0.013). Conclusion. The AI-CAC model appears to accurately detect subclinical atherosclerosis on chest x-ray with high sensitivity, and to predict ASCVD events with a high negative predictive value. Adopting the AI-CAC model to refine CV risk stratification or as an opportunistic screening tool requires prospective evaluation.

* Submitted to European Heart Journal - Cardiovascular Imaging. Includes the additional material: 44 pages (30 main paper, 14 additional material), 14 figures (5 main manuscript, 9 additional material)
