Gabriel Y. Arteaga

Suppressing Non-Semantic Noise in Masked Image Modeling Representations

Mar 31, 2026

Martine Hjelkrem-Tan, Marius Aasan, Rwiddhi Chakraborty, Gabriel Y. Arteaga, Changkyu Choi, Adín Ramírez Rivera

Abstract: Masked Image Modeling (MIM) has become a ubiquitous self-supervised vision paradigm. In this work, we show that MIM objectives cause the learned representations to retain non-semantic information, which ultimately hurts performance during inference. We introduce a model-agnostic score for semantic invariance using Principal Component Analysis (PCA) on real and synthetic non-semantic images. Based on this score, we propose a simple method, Semantically Orthogonal Artifact Projection (SOAP), to directly suppress non-semantic information in patch representations, leading to consistent improvements in zero-shot performance across various MIM-based models. SOAP is a post-hoc suppression method, requires zero training, and can be attached to any model as a single linear head.

* Published in CVPR 2026
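
The abstract above describes suppressing non-semantic directions with a single linear head derived from PCA. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes patch features are rows of a matrix, that the unwanted directions are the top `k` principal components of features taken from non-semantic images, and that suppression means projecting onto their orthogonal complement. The function name `fit_artifact_projection`, the parameter `k`, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_artifact_projection(nonsemantic_feats, k):
    """Build a (d, d) matrix that removes the top-k principal directions
    of features extracted from non-semantic images (assumed setup)."""
    centered = nonsemantic_feats - nonsemantic_feats.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    v = vt[:k].T  # (d, k) orthonormal basis of the directions to suppress
    return np.eye(v.shape[0]) - v @ v.T

# Synthetic stand-in data: the non-semantic variance lies almost entirely
# along the first feature axis.
rng = np.random.default_rng(0)
d = 8
noise_dir = np.zeros(d)
noise_dir[0] = 1.0
nonsemantic = 5.0 * rng.normal(size=(200, 1)) * noise_dir \
    + 0.01 * rng.normal(size=(200, d))

P = fit_artifact_projection(nonsemantic, k=1)

# Applying the projection is one matrix multiply, i.e. a linear head.
patches = rng.normal(size=(16, d))
cleaned = patches @ P
```

Because `P` is fixed once it has been computed, it can be applied to the patch features of a frozen model without any training, which is consistent with the post-hoc framing in the abstract.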

SPoT: Subpixel Placement of Tokens in Vision Transformers

Jul 02, 2025

Martine Hjelkrem-Tan, Marius Aasan, Gabriel Y. Arteaga, Adín Ramírez Rivera

Abstract: Vision Transformers naturally accommodate sparsity, yet standard tokenization methods confine features to discrete patch grids. This constraint prevents models from fully exploiting sparse regimes, forcing awkward compromises. We propose Subpixel Placement of Tokens (SPoT), a novel tokenization strategy that positions tokens continuously within images, effectively sidestepping grid-based limitations. With our proposed oracle-guided search, we uncover substantial performance gains achievable with ideal subpixel token positioning, drastically reducing the number of tokens necessary for accurate predictions during inference. SPoT provides a new direction for flexible, efficient, and interpretable ViT architectures, redefining sparsity as a strategic advantage rather than an imposed limitation.

* To appear in Workshop on Efficient Computing under Limited Resources: Visual Computing (ICCV 2025). Code available at https://github.com/dsb-ifi/SPoT
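
The abstract above says tokens are positioned continuously within the image but does not state how values are read at non-integer coordinates. As an illustration only, the sketch below assumes bilinear interpolation, a common way to sample a grid at subpixel positions; it is not taken from the SPoT code. The function name `sample_bilinear` and the tiny example image are hypothetical.

```python
import math

def sample_bilinear(image, x, y):
    """Read a 2D grid (list of rows) at a continuous (x, y) position
    using bilinear interpolation (an assumed stand-in technique)."""
    h, w = len(image), len(image[0])
    # Clamp the query so it stays inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom

image = [[0.0, 1.0],
         [2.0, 3.0]]
center = sample_bilinear(image, 0.5, 0.5)   # midpoint of the four pixels
corner = sample_bilinear(image, 1.0, 0.0)   # exactly on a pixel
```

A sampler of this kind is differentiable with respect to `x` and `y` almost everywhere, which is one reason interpolation is a natural fit for continuous token positions.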

Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models

Sep 04, 2024

Gabriel Y. Arteaga, Thomas B. Schön, Nicolas Pielawski

Abstract: Uncertainty estimation is a necessary component when implementing AI in high-risk settings, such as autonomous cars, medicine, or insurance. Large Language Models (LLMs) have seen a surge in popularity in recent years, but they are subject to hallucinations, which may cause serious harm in high-risk settings. Despite their success, LLMs are expensive to train and run: they need a large amount of computation and memory, preventing the use of ensembling methods in practice. In this work, we present a novel method that allows for fast and memory-friendly training of LLM ensembles. We show that the resulting ensembles can detect hallucinations and are a viable approach in practice as only one GPU is needed for training and inference.

* 5 pages, 3 figures
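
The abstract above states that ensembles can be used to detect hallucinations but does not say which uncertainty measure is computed. As a hedged illustration, the sketch below uses a standard decomposition for ensembles: the entropy of the averaged prediction (total uncertainty) minus the average entropy of each member, which gives the mutual information, a measure of disagreement between members. The function names and the two-class example are hypothetical and not drawn from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def ensemble_uncertainty(member_probs):
    """Return (total, disagreement) for a list of per-member distributions.

    total        = entropy of the mean prediction
    disagreement = total minus the mean member entropy (mutual information)
    """
    n = len(member_probs)
    mean = [sum(col) / n for col in zip(*member_probs)]
    total = entropy(mean)
    expected = sum(entropy(p) for p in member_probs) / n
    return total, total - expected

# Members that agree yield no disagreement; members that contradict
# each other yield a clearly positive value.
_, mi_agree = ensemble_uncertainty([[0.9, 0.1], [0.9, 0.1]])
_, mi_disagree = ensemble_uncertainty([[0.9, 0.1], [0.1, 0.9]])
```

In a detection setting, a score such as the disagreement term could be compared against a threshold or passed to a downstream classifier; how the paper does this is not specified in the abstract.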
