Florian Scheidegger

MAEDAY: MAE for few and zero shot AnomalY-Detection

Nov 25, 2022
Eli Schwartz, Assaf Arbelle, Leonid Karlinsky, Sivan Harary, Florian Scheidegger, Sivan Doveh, Raja Giryes

The goal of Anomaly-Detection (AD) is to identify outliers, or outlying regions, from some unknown distribution given only a set of positive (good) examples. Few-Shot AD (FSAD) aims to solve the same task with a minimal number of normal examples. Recent embedding-based methods, which compare the embedding vectors of queries to a set of reference embeddings, have demonstrated impressive FSAD results with as few as one good example. A different, historically used approach to AD is image reconstruction: train a model to recover normal images from corrupted observations, assuming that the model will fail to recover regions of an out-of-distribution image. However, image-reconstruction-based methods have not yet been used in the low-shot regime, as they need to be trained on a diverse set of normal images in order to perform properly. We suggest using Masked Auto-Encoder (MAE), a self-supervised transformer model trained to recover missing image regions from their surroundings, for FSAD. We show that MAE performs well when pre-trained on an arbitrary set of natural images (ImageNet) and fine-tuned only on a small set of normal images. We name this method MAEDAY. We further find that MAEDAY provides a signal orthogonal to the embedding-based methods, and that an ensemble of the two approaches achieves very strong SOTA results. We also present the novel task of Zero-Shot AD (ZSAD), where no normal samples are available at training time, and show that MAEDAY performs surprisingly well at it. Finally, we provide a new dataset for detecting foreign objects on the ground and demonstrate superior results on this task as well. Code is available at https://github.com/EliSchwartz/MAEDAY .
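The reconstruction-based scoring described in the abstract can be sketched in a few lines: mask random patches, inpaint them, and score each pixel by its reconstruction error averaged over maskings. The `reconstruct` callback stands in for the fine-tuned MAE; the trivial mean-filling reconstructor below is only a placeholder so the sketch runs end-to-end, not the paper's model.

```python
import numpy as np

def anomaly_map(image, reconstruct, mask_ratio=0.75, n_rounds=4, patch=4, seed=0):
    """Per-pixel anomaly score: reconstruction error averaged over random
    patch maskings. `reconstruct(masked, mask)` stands in for a fine-tuned
    MAE that inpaints masked patches from their surroundings."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    gh, gw = h // patch, w // patch
    err = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for _ in range(n_rounds):
        # randomly mask a fraction of non-overlapping patches, MAE-style
        mask = np.zeros(gh * gw, dtype=bool)
        idx = rng.choice(gh * gw, size=int(mask_ratio * gh * gw), replace=False)
        mask[idx] = True
        pix = np.kron(mask.reshape(gh, gw), np.ones((patch, patch))).astype(bool)
        recon = reconstruct(np.where(pix, 0.0, image), pix)
        err += pix * (image - recon) ** 2
        cnt += pix
    return err / np.maximum(cnt, 1)

def mean_inpaint(masked, mask):
    # trivial stand-in reconstructor: fill masked pixels with the mean of
    # the visible pixels (a real MAE would inpaint from local context)
    fill = masked[~mask].mean()
    return np.where(mask, fill, masked)
```

On a flat "normal" image with a bright defect, the defect region receives a far larger score than the background, because the stand-in reconstructor (like an MAE fine-tuned on normal images) cannot reproduce the anomaly.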

Generating Efficient DNN-Ensembles with Evolutionary Computation

Sep 18, 2020
Marc Ortiz, Florian Scheidegger, Marc Casas, Cristiano Malossi, Eduard Ayguadé

In this work, we leverage ensemble learning as a tool for creating faster, smaller, and more accurate deep learning models. We demonstrate that we can jointly optimize for accuracy, inference time, and number of parameters by combining DNN classifiers. To achieve this, we combine multiple ensemble strategies: bagging, boosting, and an ordered chain of classifiers. To reduce the number of DNN ensemble evaluations during the search, we propose EARN, an evolutionary approach that optimizes the ensemble over these three objectives subject to constraints specified by the user. We run EARN on 10 image classification datasets with an initial pool of 32 state-of-the-art DCNNs on both CPU and GPU platforms, generating models with speedups of up to $7.60\times$, parameter reductions of $10\times$, or accuracy gains of up to $6.01\%$ relative to the best DNN in the pool. In addition, our method generates models that are $5.6\times$ faster than state-of-the-art methods for automatic model generation.
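The three-objective selection at the heart of such a search can be illustrated with a plain non-dominated (Pareto) filter over candidate models or ensembles. The candidate tuples are illustrative; EARN's actual evolutionary operators are not shown.

```python
def pareto_front(candidates):
    """Keep candidates not dominated on (accuracy up, latency down, params down).

    Each candidate is (name, accuracy, latency_ms, n_params). A multi-objective
    search such as EARN would evolve ensembles and retain this non-dominated set.
    """
    def dominates(a, b):
        # a dominates b: no worse on every objective, strictly better on one
        no_worse = a[1] >= b[1] and a[2] <= b[2] and a[3] <= b[3]
        strictly = a[1] > b[1] or a[2] < b[2] or a[3] < b[3]
        return no_worse and strictly

    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]
```

A model that is slower, larger, and less accurate than another drops out of the front; models that trade accuracy against cost both survive.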

* 8 pages 
Constrained deep neural network architecture search for IoT devices accounting hardware calibration

Sep 24, 2019
Florian Scheidegger, Luca Benini, Costas Bekas, Cristiano Malossi

Deep neural networks achieve outstanding results on challenging image classification tasks. However, designing network topologies is a complex task, and the research community makes a constant effort to discover top-accuracy topologies, either manually or through expensive architecture searches. In this work, we propose a narrow-space architecture search that focuses on delivering low-cost, fast-executing networks that respect the strict memory and time requirements typical of Internet-of-Things (IoT) near-sensor computing platforms. Our approach provides solutions with classification latencies below 10 ms on a $35 device with 1 GB of RAM and 5.6 GFLOPS peak performance. The narrow-space search over floating-point models improves the CIFAR10 accuracy of an established IoT model from 70.64% to 74.87% under the same memory constraints. We further improve accuracy to 82.07% by including 16-bit half-precision types, and obtain the best accuracy of 83.45% by extending the search with model-optimized IEEE 754 reduced-precision types. To the best of our knowledge, we are the first to empirically demonstrate, on over 3000 trained models, that running with reduced precision pushes the Pareto-optimal front by a wide margin. Under a given memory constraint, accuracy improves by over 7 percentage points for half precision, and by over 1 percentage point more when running with the best per-model format.
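The effect of reduced precision on a memory-constrained selection can be illustrated with a simple budget check: halving the bits per weight lets a larger, more accurate model fit the same budget. The candidate pool and numbers below are made up for illustration and are not taken from the paper.

```python
def model_size_mb(n_params, bits_per_weight):
    # weight storage only; activations and runtime buffers are ignored here
    return n_params * bits_per_weight / 8 / 1e6

def best_under_budget(candidates, budget_mb, bits_per_weight):
    """Most accurate candidate whose weights fit the memory budget at the
    given numeric precision (hypothetical candidate pool, not the paper's)."""
    feasible = [c for c in candidates
                if model_size_mb(c["params"], bits_per_weight) <= budget_mb]
    return max(feasible, key=lambda c: c["acc"]) if feasible else None
```

Under a 4 MB budget, a 1.8M-parameter model is infeasible at 32 bits (7.2 MB) but fits at 16 bits (3.6 MB), so dropping precision moves the selection to a more accurate model, which is the Pareto-shift the paper quantifies.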

NeuNetS: An Automated Synthesis Engine for Neural Network Design

Jan 17, 2019
Atin Sood, Benjamin Elder, Benjamin Herta, Chao Xue, Costas Bekas, A. Cristiano I. Malossi, Debashish Saha, Florian Scheidegger, Ganesh Venkataraman, Gegi Thomas, Giovanni Mariani, Hendrik Strobelt, Horst Samulowitz, Martin Wistuba, Matteo Manica, Mihir Choudhury, Rong Yan, Roxana Istrate, Ruchir Puri, Tejaswini Pedapati

The application of neural networks to a vast variety of practical problems is transforming the way AI is applied in practice. Pre-trained neural network models available through APIs, and the ability to custom-train pre-built architectures on customer data, have made AI much simpler for developers to consume and have led to broad adoption of these complex models. While pre-built network models exist for certain scenarios, meeting the constraints unique to each application requires AI teams to develop custom neural network architectures that balance accuracy against memory footprint under the tight constraints of their use cases. However, only a small proportion of data science teams have the skills and experience needed to create a neural network from scratch, and demand far exceeds supply. In this paper, we present NeuNetS: an automated neural network synthesis engine for custom neural network design, available as part of IBM's AI OpenScale product. NeuNetS covers both text and image domains and can build neural networks for specific tasks in a fraction of the time human effort takes today, with accuracy similar to that of human-designed AI models.

* 14 pages, 12 figures. arXiv admin note: text overlap with arXiv:1806.00250 
BAGAN: Data Augmentation with Balancing GAN

Jun 05, 2018
Giovanni Mariani, Florian Scheidegger, Roxana Istrate, Costas Bekas, Cristiano Malossi

Image classification datasets are often imbalanced, a characteristic that negatively affects the accuracy of deep-learning classifiers. In this work we propose Balancing GAN (BAGAN) as an augmentation tool to restore balance in imbalanced datasets. This is challenging because the few minority-class images may not be enough to train a GAN. We overcome this issue by including all available images of the majority and minority classes during adversarial training. The generative model learns useful features from the majority classes and uses them to generate images for the minority classes. We apply class conditioning in the latent space to drive the generation process towards a target class. The generator in the GAN is initialized with the encoder module of an autoencoder, which enables us to learn accurate class conditioning in the latent space. We compare the proposed methodology with state-of-the-art GANs and demonstrate that BAGAN generates images of superior quality when trained with an imbalanced dataset.

Efficient Image Dataset Classification Difficulty Estimation for Predicting Deep-Learning Accuracy

Mar 26, 2018
Florian Scheidegger, Roxana Istrate, Giovanni Mariani, Luca Benini, Costas Bekas, Cristiano Malossi

In the deep-learning community, new algorithms are published at an incredible pace. Solving an image classification problem on a new dataset therefore becomes a challenging task, as it requires re-evaluating published algorithms and their different configurations in order to find a close-to-optimal classifier. To facilitate this process, before biasing our decision towards a class of neural networks or running an expensive search over the network space, we propose to estimate the classification difficulty of the dataset. Our method computes a single number that characterizes the dataset's difficulty 27x faster than training state-of-the-art networks. The proposed method can be used in combination with network-topology and hyper-parameter search optimizers to efficiently drive the search towards promising neural-network configurations.
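As a toy illustration of what a single-number difficulty estimate looks like (the paper's actual characterization is not given in the abstract, so this nearest-centroid proxy is purely hypothetical):

```python
import numpy as np

def difficulty_score(x, y):
    """Hypothetical single-number difficulty proxy: the fraction of samples
    whose nearest class centroid belongs to another class. 0 means the
    classes are trivially separable; higher means harder."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return float((classes[d2.argmin(axis=1)] != y).mean())
```

Such a score is cheap (one pass over the data, no training), which is the property the paper exploits: a search optimizer can consult it before committing to an expensive network-space exploration.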
