Sven Lončarić

BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks

Jun 01, 2026

Ivan Sabolić, Marin Oršić, Josip Šarić, Sven Lončarić

Abstract:Supervised fine-tuning is the predominant approach for adapting autoregressive vision-language models to downstream tasks. Recent work has shown that this paradigm is highly vulnerable to backdoor attacks, and that existing defenses are ineffective in open-ended generation settings. In response, we propose BYORn, a backdoor-robust fine-tuning framework motivated by the observation that poisoned target responses are often semantically implausible given the corresponding image-text inputs and a pretrained model. BYORn identifies such misaligned responses and dynamically replaces them with alternative responses generated by the model, thereby breaking the correlation between triggers and target outputs. The resulting objective gradient corresponds to the gradient of the empirical estimate of the population risk upper bound over the clean data distribution. Empirically, BYORn consistently improves robustness to backdoor attacks while preserving clean-task performance, establishing a new trade-off frontier between generalization and attack success rate. Finally, we demonstrate that BYORn remains effective against adaptive attacks specifically designed to circumvent the proposed defense.

* Accepted to ICML 2026

Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting

May 13, 2026

Lovre Antonio Budimir, Yushi Guan, Steve Ryhner, Sven Lončarić, Nandita Vijaykumar

Abstract:3D Language Gaussian Splatting (3DLGS) augments 3D Gaussian Splatting with language-aligned visual features for open-vocabulary 3D scene understanding. A core challenge is efficiently associating high-dimensional vision-language embeddings with millions of 3D Gaussians while preserving efficient feature rendering for text-based querying. Existing methods either store dense features directly on Gaussians, causing high storage costs and slow rendering, or learn compact representations through expensive per-scene optimization with repeated feature rasterization. No existing method simultaneously achieves fast 3D semantic reconstruction, efficient storage, and fast rendering. We propose SCOUP (Sparse COde UPlifting), which addresses all three by decoupling language representation learning from 3D Gaussian optimization. Rather than working directly in 3D, we learn sparse codebook-based representations entirely using features associated with 2D image regions, associating each region with a sparse set of codebook coefficients. We then uplift these coefficients to 3D Gaussians with our weighted sparse aggregation using Gaussian-to-pixel associations, where each Gaussian accumulates coefficients over codebook atoms across views. Top-$K$ filtering then extracts the most dominant multi-view coefficients per Gaussian, enabling efficient storage and fast rendering. Our method achieves up to $400\times$ training speedup while being $3\times$ more memory efficient during training compared to the state-of-the-art in rendering speed. Across multiple benchmarks, SCOUP matches or outperforms existing methods in open-vocabulary querying accuracy.

* 18 pages (9 pages main paper), 10 figures, preprint
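
The uplifting and Top-K filtering steps described in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `uplift_top_k`, the dense `weights` matrix of Gaussian-to-pixel associations, and the choice to rank atoms by largest accumulated value are all assumptions made for illustration.

```python
import numpy as np

def uplift_top_k(pixel_codes, weights, k):
    """Hypothetical sketch of weighted aggregation followed by top-K filtering.

    pixel_codes: (P, A) array, codebook coefficients for P pixels/regions over A atoms.
    weights:     (G, P) array, assumed association weight of each Gaussian to each pixel.
    k:           number of atoms to keep per Gaussian (1 <= k <= A).
    Returns (indices, values), each of shape (G, k), sorted by descending value.
    """
    pixel_codes = np.asarray(pixel_codes, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Each Gaussian accumulates weighted coefficients over all pixels it is associated with.
    accumulated = weights @ pixel_codes  # shape (G, A)

    # Select the k largest accumulated coefficients per Gaussian (unordered at first).
    top_idx = np.argpartition(-accumulated, k - 1, axis=1)[:, :k]
    top_val = np.take_along_axis(accumulated, top_idx, axis=1)

    # Order the kept atoms by descending coefficient value.
    order = np.argsort(-top_val, axis=1)
    top_idx = np.take_along_axis(top_idx, order, axis=1)
    top_val = np.take_along_axis(top_val, order, axis=1)
    return top_idx, top_val
```

Storing only `k` index/value pairs per Gaussian instead of a dense feature vector is what would make such a representation compact; how the actual method weights and normalizes the contributions is not specified in the abstract.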

Improving Heart Rejection Detection in XPCI Images Using Synthetic Data Augmentation

May 26, 2025

Jakov Samardžija, Donik Vršnak, Sven Lončarić

Abstract:Accurate identification of acute cellular rejection (ACR) in endomyocardial biopsies is essential for effective management of heart transplant patients. However, the rarity of high-grade rejection cases (3R) presents a significant challenge for training robust deep learning models. This work addresses the class imbalance problem by leveraging synthetic data generation using StyleGAN to augment the limited number of real 3R images. Prior to GAN training, histogram equalization was applied to standardize image appearance and improve the consistency of tissue representation. StyleGAN was trained on available 3R biopsy patches and subsequently used to generate 10,000 realistic synthetic images. These were combined with real 0R samples, that is, samples without rejection, in various configurations to train ResNet-18 classifiers for binary rejection classification. Three classifier variants were evaluated: one trained on real 0R and synthetic 3R images, another using both synthetic and additional real samples, and a third trained solely on real data. All models were tested on an independent set of real biopsy images. Results demonstrate that synthetic data improves classification performance, particularly when used in combination with real samples. The highest-performing model, which used both real and synthetic images, achieved strong precision and recall for both classes. These findings underscore the value of hybrid training strategies and highlight the potential of GAN-based data augmentation in biomedical image analysis, especially in domains constrained by limited annotated datasets.

DHECA-SuperGaze: Dual Head-Eye Cross-Attention and Super-Resolution for Unconstrained Gaze Estimation

May 13, 2025

Franko Šikić, Donik Vršnak, Sven Lončarić

Abstract:Unconstrained gaze estimation is the process of determining where a subject is directing their visual attention in uncontrolled environments. Gaze estimation systems are important for a myriad of tasks such as driver distraction monitoring, exam proctoring, accessibility features in modern software, etc. However, these systems face challenges in real-world scenarios, partially due to the low resolution of in-the-wild images and partially due to insufficient modeling of head-eye interactions in current state-of-the-art (SOTA) methods. This paper introduces DHECA-SuperGaze, a deep learning-based method that advances gaze prediction through super-resolution (SR) and a dual head-eye cross-attention (DHECA) module. Our dual-branch convolutional backbone processes eye and multiscale SR head images, while the proposed DHECA module enables bidirectional feature refinement between the extracted visual features through cross-attention mechanisms. Furthermore, we identified critical annotation errors in one of the most diverse and widely used gaze estimation datasets, Gaze360, and rectified the mislabeled data. Performance evaluation on Gaze360 and GFIE datasets demonstrates superior within-dataset performance of the proposed method, reducing angular error (AE) by 0.48° (Gaze360) and 2.95° (GFIE) in static configurations, and 0.59° (Gaze360) and 3.00° (GFIE) in temporal settings compared to prior SOTA methods. Cross-dataset testing shows improvements in AE of more than 1.53° (Gaze360) and 3.99° (GFIE) in both static and temporal settings, validating the robust generalization properties of our approach.
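
The angular error (AE) reported in the abstract above is not defined there. A common convention, assumed here, is the angle in degrees between the predicted and ground-truth 3D gaze direction vectors. The sketch below follows that convention; the function name `angular_error_deg` is hypothetical and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def angular_error_deg(pred, target):
    """Angle in degrees between gaze vectors (assumed AE definition).

    pred, target: arrays of shape (3,) or (N, 3); vectors need not be unit length.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Normalize so that the dot product equals the cosine of the angle.
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    target = target / np.linalg.norm(target, axis=-1, keepdims=True)

    # Clip to guard against rounding slightly outside [-1, 1] before arccos.
    cos = np.clip(np.sum(pred * target, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))
```

Under this definition, a reduction of 0.48° means the predicted direction is on average that much closer to the annotated direction.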

A Survey on Deep Learning-based Gaze Direction Regression: Searching for the State-of-the-art

Oct 22, 2024

Franko Šikić, Donik Vršnak, Sven Lončarić

Abstract:In this paper, we present a survey of deep learning-based methods for the regression of the gaze direction vector from head and eye images. We describe in detail numerous published methods with a focus on the input data, architecture of the model, and loss function used to supervise the model. Additionally, we present a list of datasets that can be used to train and evaluate gaze direction regression methods. Furthermore, we noticed that the results reported in the literature are often not comparable with one another due to differences in the validation or even test subsets used. To address this problem, we re-evaluated several methods on the commonly used in-the-wild Gaze360 dataset using the same validation setup. The experimental results show that the latest methods, although claiming state-of-the-art results, significantly underperform compared with some older methods. Finally, we show that the temporal models outperform the static models under static test conditions.

* Accepted at SPRA 2024 (Istanbul, Turkey)

Illumination Estimation Challenge: experience of past two years

Dec 31, 2020

Egor Ershov, Alex Savchik, Ilya Semenkov, Nikola Banić, Karlo Koscević, Marko Subašić, Alexander Belokopytov, Zhihao Li, Arseniy Terekhin, Daria Senshina (+8 more)

Abstract:Illumination estimation is the essential step of computational color constancy, one of the core parts of various image processing pipelines of modern digital cameras. Having an accurate and reliable illumination estimation is important for reducing the illumination influence on the image colors. To motivate the generation of new ideas and the development of new algorithms in this field, the 2nd Illumination Estimation Challenge (IEC#2) was conducted. The main advantage of testing a method in a challenge over testing it on some of the known datasets is that the ground-truth illuminations for the challenge test images are unknown until the results have been submitted, which prevents any potentially biased hyperparameter tuning. The challenge had several tracks: general, indoor, and two-illuminant, each focusing on different parameters of the scenes. Its other main features are a new large dataset of images (about 5000) taken with the same camera sensor model, a manual markup accompanying each image, diverse content with scenes taken in numerous countries under a huge variety of illuminations extracted by using the SpyderCube calibration object, and a contest-like markup for the images from the Cube+ dataset that was used in IEC#1. This paper focuses on the description of the past two challenges, the algorithms that won in each track, and the conclusions drawn from the results obtained during the 1st and 2nd challenges that can be useful for similar future developments.

The Cube++ Illumination Estimation Dataset

Nov 19, 2020

Egor Ershov, Alex Savchik, Illya Semenkov, Nikola Banić, Alexander Belokopytov, Daria Senshina, Karlo Koscević, Marko Subašić, Sven Lončarić

Abstract:Computational color constancy has the important task of reducing the influence of the scene illumination on the object colors. As such, it is an essential part of the image processing pipelines of most digital cameras. One of the important parts of computational color constancy is illumination estimation, i.e., estimating the illumination color. When an illumination estimation method is proposed, its accuracy is usually reported by providing the values of error metrics obtained on the images of publicly available datasets. However, over time it has been shown that many of these datasets have problems such as too few images, inappropriate image quality, lack of scene diversity, absence of version tracking, violation of various assumptions, GDPR regulation violation, lack of additional shooting procedure info, etc. In this paper, a new illumination estimation dataset is proposed that aims to alleviate many of the mentioned problems and to help the illumination estimation research. It consists of 4890 images with known illumination colors as well as with additional semantic data that can further make the learning process more accurate. Due to the usage of the SpyderCube color target, for every image there are two ground-truth illumination records covering different directions. Because of that, the dataset can be used for training and testing of methods that perform single or two-illuminant estimation. This makes it superior to many similar existing datasets. The dataset, its smaller version SimpleCube++, and the accompanying code are available at https://github.com/Visillect/CubePlusPlus/.

Computational analysis of laminar structure of the human cortex based on local neuron features

May 03, 2019

Andrija Štajduhar, Tomislav Lipić, Goran Sedmak, Sven Lončarić, Miloš Judaš

Abstract:In this paper, we present a novel method for analysis and segmentation of the laminar structure of the cortex based on tissue characteristics whose change across the gray matter facilitates distinction between cortical layers. We develop and analyze features of individual neurons to investigate changes in architectonic differentiation and present a novel high-performance, automated tree-ensemble method trained on data manually labeled by three human investigators. From the location and basic measures of neurons, more complex features are developed and used in machine learning models for automatic segmentation of cortical layers. Tree ensembles are used on data manually labeled by three human experts. The most accurate classification results were obtained by training three models separately and creating another ensemble by combining probability outputs for final neuron layer classification. The importance of the developed neuron features is measured at both the global model level and the individual prediction level.
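
The abstract above says the final classification combines the probability outputs of three separately trained models, but it does not state the combination rule. As one plausible reading, the sketch below averages the per-class probabilities and takes the most probable class; the function name `combine_probabilities` and the use of a plain mean are assumptions, not the authors' confirmed procedure.

```python
import numpy as np

def combine_probabilities(prob_list):
    """Average class probabilities from several models (assumed rule: unweighted mean).

    prob_list: sequence of M arrays, each of shape (N, C), holding per-class
               probabilities for N neurons and C cortical layers.
    Returns (labels, mean_probs): labels of shape (N,), mean_probs of shape (N, C).
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (M, N, C)
    mean_probs = stacked.mean(axis=0)   # average over the M models
    labels = mean_probs.argmax(axis=1)  # most probable layer for each neuron
    return labels, mean_probs
```

Averaging probabilities rather than voting on hard labels lets a confident model outweigh two uncertain ones, which is one reason such soft combinations are often preferred when annotators disagree.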

CroP: Color Constancy Benchmark Dataset Generator

Mar 29, 2019

Nikola Banić, Karlo Koščević, Marko Subašić, Sven Lončarić

Abstract:Implementing color constancy as a pre-processing step in contemporary digital cameras is of significant importance as it removes the influence of scene illumination on object colors. Several benchmark color constancy datasets have been created for the purpose of developing and testing new color constancy methods. However, they all have numerous drawbacks including a small number of images, erroneously extracted ground-truth illuminations, long histories of misuses, violations of their stated assumptions, etc. To overcome such and similar problems, in this paper a color constancy benchmark dataset generator is proposed. For a given camera sensor it enables generation of any number of realistic raw images taken in a subset of the real world, namely images of printed photographs. Datasets with such images share many positive features with other existing real-world datasets, while some of the negative features are completely eliminated. The generated images can be successfully used to train methods that afterward achieve high accuracy on real-world datasets. This opens the way for creating large enough datasets for advanced deep learning techniques. Experimental results are presented and discussed. The source code is available at http://www.fer.unizg.hr/ipg/resources/color_constancy/.

* 10 pages, 8 figures

Automatic Detection of Neurons in NeuN-stained Histological Images of Human Brain

Jun 01, 2018

Andrija Štajduhar, Domagoj Džaja, Miloš Judaš, Sven Lončarić

Abstract:In this paper, we present a novel use of an anisotropic diffusion model for automatic detection of neurons in histological sections of the adult human brain cortex. We use a partial differential equation model to process high-resolution images to acquire locations of neuronal bodies. We also present a novel approach to model training and evaluation that considers variability among the human experts, addressing the issue of the existence and correctness of the gold standard for neuron and cell counting used in most of the relevant papers. Our method, trained on a dataset manually labeled by three experts, has correctly distinguished over 95% of neuron bodies in test data, doing so in much less time than other comparable methods.
