Slava Voloshynovskiy

Authentication of Copy Detection Patterns via Cross-Camera Dual-Synthetic Referencing

May 29, 2026

Ivan Oleksiyuk, Roman Chaban, Slava Voloshynovskiy

Abstract:Copy Detection Patterns (CDPs) are structures printed on physical objects to enable cost-effective authentication. Verification is achieved by comparing a captured image with the digital template from which the CDP was printed. In practice, printer stochasticity and camera distortions hinder this comparison, limiting robustness against counterfeiting. Prior work addressed camera effects by synthesising reference images in the verification camera domain, but it ignored printing variability. We introduce an enrolment-based cross-camera dual-synthetic referencing framework. Each printed CDP is first captured by a controlled enrolment camera, and a deep-learning-based translator jointly exploits the digital template and the enrolled capture to generate a high-quality reference for the verification image. We provide an information-theoretic justification showing that the dual reference is more informative than template-based references. Experiments on heterogeneous mobile cameras demonstrate improved authentication performance, robustness to machine-learning-based copy attacks, and reliable verification from small CDP regions and on low-end devices.

* To appear in Proc. ICIP2026, September 13-17, 2026, Tampere, Finland

Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering

May 21, 2026

Yury Belousov, Brian Pulfer, Vitaliy Kinakh, Slava Voloshynovskiy

Abstract:Vision foundation models are widely used as frozen backbones across many downstream tasks, making them a single point of failure under adversarial attack. We study multi-level Floyd-Steinberg error-diffusion dithering as a lightweight, model-agnostic input transformation that disrupts adversarial perturbations while preserving semantic content. Unlike prior work, which was limited to binary dithering, grayscale CIFAR-10, and a single small model trained from scratch, we evaluate across six tasks (classification, segmentation, depth estimation, retrieval, captioning, visual question answering), two model families (DINOv2, PaliGemma), and three attacks of increasing strength (PGD, MI-FGSM, SIA), as well as an adaptive attacker using a straight-through estimator. Our results show that Floyd-Steinberg dithering at intermediate quantization levels, especially when combined with post-processing blur, exceeds or matches all tested baselines, including diffusion-based denoising, with substantially less degradation on clean inputs.

* Paper accepted at the IEEE International Conference on Image Processing (ICIP 2026)
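For readers unfamiliar with the input transformation named in this abstract, here is a minimal sketch of multi-level Floyd-Steinberg error-diffusion dithering on a grayscale image. It is an illustration under stated assumptions, not the authors' implementation: the function name `floyd_steinberg`, the list-of-rows image format with values in [0, 1], and the uniform spacing of quantization levels are all choices made here. The post-processing blur mentioned in the abstract is not shown.

```python
def floyd_steinberg(image, levels=4):
    """Dither a grayscale image to `levels` evenly spaced gray values.

    `image` is a list of rows of floats in [0, 1] (an assumed format).
    Returns a new image; the input is not modified.
    """
    if levels < 2:
        raise ValueError("levels must be at least 2")
    height, width = len(image), len(image[0])
    work = [list(row) for row in image]        # working copy that absorbs diffused error
    out = [[0.0] * width for _ in range(height)]
    step = 1.0 / (levels - 1)                  # spacing between quantization levels
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            # Snap to the nearest level, clamped to the valid index range.
            index = min(levels - 1, max(0, round(old / step)))
            new = index * step
            out[y][x] = new
            err = old - new
            # Classic Floyd-Steinberg weights: 7/16 right, 3/16 below-left,
            # 5/16 below, 1/16 below-right.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

With `levels=2` this reduces to binary dithering; intermediate values of `levels` keep more of the original intensity information while still quantizing away small perturbations. Because the quantization error is pushed to neighbouring pixels, the average brightness of a region is approximately preserved.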

Human-Inspired Context-Selective Multimodal Memory for Social Robots

Apr 13, 2026

Hangyeol Kang, Slava Voloshynovskiy, Nadia Magnenat Thalmann

Abstract:Memory is fundamental to social interaction, enabling humans to recall meaningful past experiences and adapt their behavior to the context. However, most current social robots and embodied agents rely on non-selective, text-based memory, limiting their ability to support personalized, context-aware interactions. Drawing inspiration from cognitive neuroscience, we propose a context-selective, multimodal memory architecture for social robots that captures and retrieves both textual and visual episodic traces, prioritizing moments characterized by high emotional salience or scene novelty. By associating these memories with individual users, our system enables socially personalized recall and more natural, grounded dialogue. We evaluate the selective storage mechanism using a curated dataset of social scenarios, achieving a Spearman correlation of 0.506, surpassing human consistency (ρ = 0.415) and outperforming existing image memorability models. In multimodal retrieval experiments, our fusion approach improves Recall@1 by up to 13% over unimodal text or image retrieval. Runtime evaluations confirm that the system maintains real-time performance. Qualitative analyses further demonstrate that the proposed framework produces richer and more socially relevant responses than baseline models. This work advances memory design for social robots by bridging human-inspired selectivity and multimodal retrieval to enhance long-term, personalized human-robot interaction.

* Proc. of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)
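The two metrics quoted in this abstract, Spearman rank correlation and Recall@1, can be computed as in the sketch below. This is a generic illustration of the standard definitions, not the authors' evaluation code; the function names and the input formats (plain lists of scores, and a query-by-candidate similarity matrix with one relevant index per query) are assumptions made here.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def recall_at_1(similarity, relevant):
    """Fraction of queries whose top-scoring candidate is the relevant one.

    similarity[q][c] is the score of candidate c for query q;
    relevant[q] is the index of the correct candidate for query q.
    """
    hits = 0
    for row, target in zip(similarity, relevant):
        best = max(range(len(row)), key=row.__getitem__)
        hits += best == target
    return hits / len(similarity)
```

In this setting, `spearman` would compare predicted storage priorities against human ratings, and `recall_at_1` would score whether the top retrieved memory is the correct one.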

Beyond Classification: Evaluating Diffusion Denoised Smoothing for Security-Utility Trade off

May 21, 2025

Yury Belousov, Brian Pulfer, Vitaliy Kinakh, Slava Voloshynovskiy

Abstract:While foundation models demonstrate impressive performance across various tasks, they remain vulnerable to adversarial inputs. Current research explores various approaches to enhance model robustness, with Diffusion Denoised Smoothing emerging as a particularly promising technique. This method employs a pretrained diffusion model to preprocess inputs before model inference. Yet, its effectiveness remains largely unexplored beyond classification. We aim to address this gap by analyzing three datasets with four distinct downstream tasks under three different adversarial attack algorithms. Our findings reveal that while foundation models maintain resilience against conventional transformations, applying high-noise diffusion denoising to clean images without any distortions significantly degrades performance by as much as 57%. Low-noise diffusion settings preserve performance but fail to provide adequate protection across all attack types. Moreover, we introduce a novel attack strategy specifically targeting the diffusion process itself, capable of circumventing defenses in the low-noise regime. Our results suggest that the trade-off between adversarial robustness and performance remains a challenge to be addressed.

* Paper accepted at the 33rd European Signal Processing Conference (EUSIPCO 2025)

Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach

Mar 31, 2025

Francesco Pio Ramunno, Paolo Massa, Vitaliy Kinakh, Brandon Panos, André Csillaghy, Slava Voloshynovskiy

Abstract:The spatial properties of the solar magnetic field are crucial to decoding the physical processes in the solar interior and their interplanetary effects. However, observations from older instruments, such as the Michelson Doppler Imager (MDI), have limited spatial or temporal resolution, which hinders the ability to study small-scale solar features in detail. Super-resolving these older datasets is essential for uniform analysis across different solar cycles, enabling better characterization of solar flares, active regions, and magnetic network dynamics. In this work, we introduce a novel diffusion model approach for Super-Resolution and we apply it to MDI magnetograms to match the higher-resolution capabilities of the Helioseismic and Magnetic Imager (HMI). By training a Latent Diffusion Model (LDM) with residuals on downscaled HMI data and fine-tuning it with paired MDI/HMI data, we can enhance the resolution of MDI observations from 2"/pixel to 0.5"/pixel. We evaluate the quality of the reconstructed images by means of classical metrics (e.g., PSNR, SSIM, FID and LPIPS) and we check if physical properties, such as the unsigned magnetic flux or the size of an active region, are preserved. We compare our model with different variations of LDM and Denoising Diffusion Probabilistic Models (DDPMs), but also with two deterministic architectures already used in the past for performing the Super-Resolution task. Furthermore, we show with an analysis in the Fourier domain that the LDM with residuals can resolve features smaller than 2", and due to the probabilistic nature of the LDM, we can assess their reliability, in contrast with the deterministic models. Future studies aim to super-resolve the temporal scale of the solar MDI instrument so that we can also have a better overview of the dynamics of the old events.

* Accepted for publication in A&A

Robustness Tokens: Towards Adversarial Robustness of Transformers

Mar 13, 2025

Brian Pulfer, Yury Belousov, Slava Voloshynovskiy

Abstract:Recently, large pre-trained foundation models have become widely adopted by machine learning practitioners for a multitude of tasks. Given that such models are publicly available, relying on their use as backbone models for downstream tasks might result in high vulnerability to adversarial attacks crafted with the same public model. In this work, we propose Robustness Tokens, a novel approach specific to the transformer architecture that fine-tunes a few additional private tokens with low computational requirements instead of tuning model parameters as done in traditional adversarial training. We show that Robustness Tokens make Vision Transformer models significantly more robust to white-box adversarial attacks while also retaining the original downstream performance.

* Computer Vision, ECCV 2024, pp. 110-127, Springer Nature Switzerland
* This paper has been accepted for publication at the European Conference on Computer Vision (ECCV), 2024

TwinTURBO: Semi-Supervised Fine-Tuning of Foundation Models via Mutual Information Decompositions for Downstream Task and Latent Spaces

Mar 10, 2025

Guillaume Quétant, Pavlo Molchanov, Slava Voloshynovskiy

Abstract:We present a semi-supervised fine-tuning framework for foundation models that utilises mutual information decomposition to address the challenges of training with a limited amount of labelled data. Our approach derives two distinct lower bounds: i) for the downstream task space, such as classification, optimised using conditional and marginal cross-entropy alongside Kullback-Leibler divergence, and ii) for the latent space representation, regularised and aligned using a contrastive-like decomposition. This fine-tuning strategy retains the pre-trained structure of the foundation model, modifying only a specialised projector module comprising a small transformer and a token aggregation technique. Experiments on several datasets demonstrate significant improvements in classification tasks under extremely low-labelled conditions by effectively leveraging unlabelled data.

Task-Agnostic Attacks Against Vision Foundation Models

Mar 05, 2025

Brian Pulfer, Yury Belousov, Vitaliy Kinakh, Teddy Furon, Slava Voloshynovskiy

Abstract:The study of security in machine learning mainly focuses on downstream task-specific attacks, where the adversarial example is obtained by optimizing a loss function specific to the downstream task. At the same time, it has become standard practice for machine learning practitioners to adopt publicly available pre-trained vision foundation models, effectively sharing a common backbone architecture across a multitude of applications such as classification, segmentation, depth estimation, retrieval, question-answering and more. The study of attacks on such foundation models and their impact on multiple downstream tasks remains largely unexplored. This work proposes a general framework that forges task-agnostic adversarial examples by maximally disrupting the feature representation obtained with foundation models. We extensively evaluate the security of the feature representations obtained by popular vision foundation models by measuring the impact of this attack on multiple downstream tasks and its transferability between models.
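The idea of attacking features rather than a task loss can be illustrated with a toy example. The sketch below is not the paper's framework: it assumes a linear feature extractor (a plain weight matrix standing in for a foundation model), a squared L2 distance between clean and perturbed features as the objective, and an L-infinity budget `epsilon` enforced by clipping. The names `features` and `feature_attack` and all default values are hypothetical.

```python
import random


def features(weights, x):
    """Toy linear feature extractor: returns W @ x for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def feature_attack(weights, x, epsilon=0.1, step=0.02, iterations=20, seed=0):
    """Push the features of x away from their clean values.

    Signed-gradient ascent on ||W(x + delta) - Wx||^2 with delta clipped to
    [-epsilon, epsilon] per coordinate. No task label or task head is used.
    """
    rng = random.Random(seed)
    clean = features(weights, x)
    # Random start: the gradient is exactly zero at delta = 0.
    delta = [rng.uniform(-epsilon, epsilon) for _ in x]
    for _ in range(iterations):
        perturbed = [v + d for v, d in zip(x, delta)]
        diff = [a - c for a, c in zip(features(weights, perturbed), clean)]
        # Analytic gradient for the linear case: 2 * W^T (W(x + delta) - Wx).
        grad = [
            2 * sum(weights[i][j] * diff[i] for i in range(len(weights)))
            for j in range(len(x))
        ]
        signs = [1 if g > 0 else -1 if g < 0 else 0 for g in grad]
        delta = [
            max(-epsilon, min(epsilon, d + step * s))
            for d, s in zip(delta, signs)
        ]
    return [v + d for v, d in zip(x, delta)]
```

With a real backbone the gradient would come from automatic differentiation instead of the closed form used here. The point of the sketch is only that the objective depends on the feature representation alone, which is what makes the resulting perturbation independent of any particular downstream task.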

Binary Diffusion Probabilistic Model

Jan 23, 2025

Vitaliy Kinakh, Slava Voloshynovskiy

Abstract:We introduce the Binary Diffusion Probabilistic Model (BDPM), a novel generative model optimized for binary data representations. While denoising diffusion probabilistic models (DDPMs) have demonstrated notable success in tasks like image synthesis and restoration, traditional DDPMs rely on continuous data representations and mean squared error (MSE) loss for training, applying Gaussian noise models that may not be optimal for discrete or binary data structures. BDPM addresses this by decomposing images into bitplanes and employing XOR-based noise transformations, with a denoising model trained using binary cross-entropy loss. This approach enables precise noise control and computationally efficient inference, significantly lowering computational costs and improving model convergence. When evaluated on image restoration tasks such as image super-resolution, inpainting, and blind image restoration, BDPM outperforms state-of-the-art methods on the FFHQ, CelebA, and CelebA-HQ datasets. Notably, BDPM requires fewer inference steps than traditional DDPM models to reach optimal results, showcasing enhanced inference efficiency.
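The building blocks named in this abstract (bitplane decomposition, XOR-based noise, and binary cross-entropy) can be sketched in a few lines. This is an illustration only, assuming 8-bit integer pixels in a flat list and independent bit flips with a single probability `flip_prob`; the actual noise schedule and the quantity the denoising model predicts are not stated in the abstract and are not modelled here. All function names are hypothetical.

```python
import math
import random


def to_bitplanes(pixels, bits=8):
    """Split integer pixels into `bits` binary planes, most significant first."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits - 1, -1, -1)]


def from_bitplanes(planes):
    """Inverse of to_bitplanes: reassemble integer pixels from binary planes."""
    bits = len(planes)
    pixels = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = bits - 1 - i
        for j, bit in enumerate(plane):
            pixels[j] |= bit << shift
    return pixels


def xor_noise(planes, flip_prob, rng):
    """Flip each bit independently with probability flip_prob.

    Returns (noisy_planes, mask), where noisy = clean XOR mask.
    """
    mask = [[1 if rng.random() < flip_prob else 0 for _ in plane] for plane in planes]
    noisy = [[b ^ m for b, m in zip(plane, m_plane)]
             for plane, m_plane in zip(planes, mask)]
    return noisy, mask


def binary_cross_entropy(probabilities, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probabilities, targets):
        p = min(1.0 - eps, max(eps, p))  # keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)
```

Since XOR is its own inverse, applying the same mask to the noisy planes recovers the clean planes exactly, which is one reason a binary formulation gives precise control over the injected noise.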

Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition

Oct 02, 2024

Mariia Drozdova, Vitaliy Kinakh, Yury Belousov, Erica Lastufka, Slava Voloshynovskiy

Abstract:In this paper, we present a semi-supervised fine-tuning approach designed to improve the performance of foundation models on downstream tasks with limited labeled data. By leveraging content-style decomposition within an information-theoretic framework, our method enhances the latent representations of pre-trained vision foundation models, aligning them more effectively with specific task objectives and addressing the problem of distribution shift. We evaluate our approach on multiple datasets, including MNIST, its augmented variations (with yellow and white stripes), CIFAR-10, SVHN, and GalaxyMNIST. The experiments show improvements over purely supervised baselines, particularly in low-labeled data regimes, across both frozen and trainable backbones for the majority of the tested datasets.
