Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Asja Fischer

Towards an Optimal Control Perspective of ResNet Training

Jun 26, 2025

Jens Püttschneider, Simon Heilig, Asja Fischer, Timm Faulwasser

Abstract:We propose a training formulation for ResNets reflecting an optimal control problem that is applicable for standard architectures and general loss functions. We suggest bridging both worlds via penalizing intermediate outputs of hidden states corresponding to stage cost terms in optimal control. For standard ResNets, we obtain intermediate outputs by propagating the state through the subsequent skip connections and the output layer. We demonstrate that our training dynamic biases the weights of the unnecessary deeper residual layers to vanish. This indicates the potential for a theory-grounded layer pruning strategy.

* Accepted for presentation at the High-dimensional Learning Dynamics (HiLD) workshop at ICML 2025

Via

Access Paper or Ask Questions

RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors

Jun 09, 2025

Hicham Eddoubi, Jonas Ricker, Federico Cocchi, Lorenzo Baraldi, Angelo Sotgiu, Maura Pintor, Marcella Cornia, Asja Fischer, Rita Cucchiara, Battista Biggio

Abstract:AI-generated images have reached a quality level at which humans are incapable of reliably distinguishing them from real images. To counteract the inherent risk of fraud and disinformation, the detection of AI-generated images is a pressing challenge and an active research topic. While many of the presented methods claim to achieve high detection accuracy, they are usually evaluated under idealized conditions. In particular, the adversarial robustness is often neglected, potentially due to a lack of awareness or the substantial effort required to conduct a comprehensive robustness analysis. In this work, we tackle this problem by providing a simpler means to assess the robustness of AI-generated image detectors. We present RAID (Robust evaluation of AI-generated image Detectors), a dataset of 72k diverse and highly transferable adversarial examples. The dataset is created by running attacks against an ensemble of seven state-of-the-art detectors and images generated by four different text-to-image models. Extensive experiments show that our methodology generates adversarial images that transfer with a high success rate to unseen detectors, which can be used to quickly provide an approximate yet still reliable estimate of a detector's adversarial robustness. Our findings indicate that current state-of-the-art AI-generated image detectors can be easily deceived by adversarial examples, highlighting the critical need for the development of more robust methods. We release our dataset at https://huggingface.co/datasets/aimagelab/RAID and evaluation code at https://github.com/pralab/RAID.

Via

Access Paper or Ask Questions

Security Benefits and Side Effects of Labeling AI-Generated Images

May 28, 2025

Sandra Höltervennhoff, Jonas Ricker, Maike M. Raphael, Charlotte Schwedes, Rebecca Weil, Asja Fischer, Thorsten Holz, Lea Schönherr, Sascha Fahl

Abstract:Generative artificial intelligence is developing rapidly, impacting humans' interaction with information and digital media. It is increasingly used to create deceptively realistic misinformation, so lawmakers have imposed regulations requiring the disclosure of AI-generated content. However, only little is known about whether these labels reduce the risks of AI-generated misinformation. Our work addresses this research gap. Focusing on AI-generated images, we study the implications of labels, including the possibility of mislabeling. Assuming that simplicity, transparency, and trust are likely to impact the successful adoption of such labels, we first qualitatively explore users' opinions and expectations of AI labeling using five focus groups. Second, we conduct a pre-registered online survey with over 1300 U.S. and EU participants to quantitatively assess the effect of AI labels on users' ability to recognize misinformation containing either human-made or AI-generated images. Our focus groups illustrate that, while participants have concerns about the practical implementation of labeling, they consider it helpful in identifying AI-generated images and avoiding deception. However, considering security benefits, our survey revealed an ambiguous picture, suggesting that users might over-rely on labels. While inaccurate claims supported by labeled AI-generated images were rated less credible than those with unlabeled AI-images, the belief in accurate claims also decreased when accompanied by a labeled AI-generated image. Moreover, we find the undesired side effect that human-made images conveying inaccurate claims were perceived as more credible in the presence of labels.

Via

Access Paper or Ask Questions

Towards A Correct Usage of Cryptography in Semantic Watermarks for Diffusion Models

Mar 14, 2025

Jonas Thietke, Andreas Müller, Denis Lukovnikov, Asja Fischer, Erwin Quiring

Abstract:Semantic watermarking methods enable the direct integration of watermarks into the generation process of latent diffusion models by only modifying the initial latent noise. One line of approaches building on Gaussian Shading relies on cryptographic primitives to steer the sampling process of the latent noise. However, we identify several issues in the usage of cryptographic techniques in Gaussian Shading, particularly in its proof of lossless performance and key management, causing ambiguity in follow-up works, too. In this work, we therefore revisit the cryptographic primitives for semantic watermarking. We introduce a novel, general proof of lossless performance based on IND\$-CPA security for semantic watermarks. We then discuss the configuration of the cryptographic primitives in semantic watermarks with respect to security, efficiency, and generation quality.

* 8 pages, 3 figures, WMark@ICLR

Via

Access Paper or Ask Questions

Can LLMs Explain Themselves Counterfactually?

Feb 25, 2025

Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar

Abstract:Explanations are an important tool for gaining insights into the behavior of ML models, calibrating user trust and ensuring regulatory compliance. Past few years have seen a flurry of post-hoc methods for generating model explanations, many of which involve computing model gradients or solving specially designed optimization problems. However, owing to the remarkable reasoning abilities of Large Language Model (LLMs), self-explanation, that is, prompting the model to explain its outputs has recently emerged as a new paradigm. In this work, we study a specific type of self-explanations, self-generated counterfactual explanations (SCEs). We design tests for measuring the efficacy of LLMs in generating SCEs. Analysis over various LLM families, model sizes, temperature settings, and datasets reveals that LLMs sometimes struggle to generate SCEs. Even when they do, their prediction often does not agree with their own counterfactual reasoning.

Via

Access Paper or Ask Questions

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Dec 04, 2024

Andreas Müller, Denis Lukovnikov, Jonas Thietke, Asja Fischer, Erwin Quiring

Figure 1 for Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Figure 2 for Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Figure 3 for Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Figure 4 for Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Abstract:Integrating watermarking into the generation process of latent diffusion models (LDMs) simplifies detection and attribution of generated content. Semantic watermarks, such as Tree-Rings and Gaussian Shading, represent a novel class of watermarking techniques that are easy to implement and highly robust against various perturbations. However, our work demonstrates a fundamental security vulnerability of semantic watermarks. We show that attackers can leverage unrelated models, even with different latent spaces and architectures (UNet vs DiT), to perform powerful and realistic forgery attacks. Specifically, we design two watermark forgery attacks. The first imprints a targeted watermark into real images by manipulating the latent representation of an arbitrary image in an unrelated LDM to get closer to the latent representation of a watermarked image. We also show that this technique can be used for watermark removal. The second attack generates new images with the target watermark by inverting a watermarked image and re-generating it with an arbitrary prompt. Both attacks just need a single reference image with the target watermark. Overall, our findings question the applicability of semantic watermarks by revealing that attackers can easily forge or remove these watermarks under realistic conditions.

* 23 pages, 21 figures, 6 tables

Via

Access Paper or Ask Questions

Single-Model Attribution for Spoofed Speech via Vocoder Fingerprints in an Open-World Setting

Nov 21, 2024

Matías Pizarro, Mike Laszkiewicz, Dorothea Kolossa, Asja Fischer

Figure 1 for Single-Model Attribution for Spoofed Speech via Vocoder Fingerprints in an Open-World Setting

Figure 2 for Single-Model Attribution for Spoofed Speech via Vocoder Fingerprints in an Open-World Setting

Figure 3 for Single-Model Attribution for Spoofed Speech via Vocoder Fingerprints in an Open-World Setting

Figure 4 for Single-Model Attribution for Spoofed Speech via Vocoder Fingerprints in an Open-World Setting

Abstract:As speech generation technology advances, so do the potential threats of misusing spoofed speech signals. One way to address these threats is by attributing the signals to their source generative model. In this work, we are the first to tackle the single-model attribution task in an open-world setting, that is, we aim at identifying whether spoofed speech signals from unknown sources originate from a specific vocoder. We show that the standardized average residual between audio signals and their low-pass filtered or EnCodec filtered versions can serve as powerful vocoder fingerprints. The approach only requires data from the target vocoder and allows for simple but highly accurate distance-based model attribution. We demonstrate its effectiveness on LJSpeech and JSUT, achieving an average AUROC of over 99% in most settings. The accompanying robustness study shows that it is also resilient to noise levels up to a certain degree.

Via

Access Paper or Ask Questions

Integrating uncertainty quantification into randomized smoothing based robustness guarantees

Oct 27, 2024

Sina Däubener, Kira Maag, David Krueger, Asja Fischer

Figure 1 for Integrating uncertainty quantification into randomized smoothing based robustness guarantees

Figure 2 for Integrating uncertainty quantification into randomized smoothing based robustness guarantees

Figure 3 for Integrating uncertainty quantification into randomized smoothing based robustness guarantees

Figure 4 for Integrating uncertainty quantification into randomized smoothing based robustness guarantees

Abstract:Deep neural networks have proven to be extremely powerful, however, they are also vulnerable to adversarial attacks which can cause hazardous incorrect predictions in safety-critical applications. Certified robustness via randomized smoothing gives a probabilistic guarantee that the smoothed classifier's predictions will not change within an $\ell_2$-ball around a given input. On the other hand (uncertainty) score-based rejection is a technique often applied in practice to defend models against adversarial attacks. In this work, we fuse these two approaches by integrating a classifier that abstains from predicting when uncertainty is high into the certified robustness framework. This allows us to derive two novel robustness guarantees for uncertainty aware classifiers, namely (i) the radius of an $\ell_2$-ball around the input in which the same label is predicted and uncertainty remains low and (ii) the $\ell_2$-radius of a ball in which the predictions will either not change or be uncertain. While the former provides robustness guarantees with respect to attacks aiming at increased uncertainty, the latter informs about the amount of input perturbation necessary to lead the uncertainty aware model into a wrong prediction. Notably, this is on CIFAR10 up to 20.93% larger than for models not allowing for uncertainty based rejection. We demonstrate, that the novel framework allows for a systematic robustness evaluation of different network architectures and uncertainty measures and to identify desired properties of uncertainty quantification techniques. Moreover, we show that leveraging uncertainty in a smoothed classifier helps out-of-distribution detection.

Via

Access Paper or Ask Questions

Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors

Oct 02, 2024

Sina Mavali, Jonas Ricker, David Pape, Yash Sharma, Asja Fischer, Lea Schoenherr

Figure 1 for Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors

Figure 2 for Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors

Figure 3 for Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors

Figure 4 for Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors

Abstract:While generative AI (GenAI) offers countless possibilities for creative and productive tasks, artificially generated media can be misused for fraud, manipulation, scams, misinformation campaigns, and more. To mitigate the risks associated with maliciously generated media, forensic classifiers are employed to identify AI-generated content. However, current forensic classifiers are often not evaluated in practically relevant scenarios, such as the presence of an attacker or when real-world artifacts like social media degradations affect images. In this paper, we evaluate state-of-the-art AI-generated image (AIGI) detectors under different attack scenarios. We demonstrate that forensic classifiers can be effectively attacked in realistic settings, even when the attacker does not have access to the target model and post-processing occurs after the adversarial examples are created, which is standard on social media platforms. These attacks can significantly reduce detection accuracy to the extent that the risks of relying on detectors outweigh their benefits. Finally, we propose a simple defense mechanism to make CLIP-based detectors, which are currently the best-performing detectors, robust against these attacks.

Via

Access Paper or Ask Questions

Reassessing Noise Augmentation Methods in the Context of Adversarial Speech

Sep 03, 2024

Karla Pizzi, Matías P. Pizarro B, Asja Fischer

Abstract:In this study, we investigate if noise-augmented training can concurrently improve adversarial robustness in automatic speech recognition (ASR) systems. We conduct a comparative analysis of the adversarial robustness of four different state-of-the-art ASR architectures, where each of the ASR architectures is trained under three different augmentation conditions: one subject to background noise, speed variations, and reverberations, another subject to speed variations only, and a third without any form of data augmentation. The results demonstrate that noise augmentation not only improves model performance on noisy speech but also the model's robustness to adversarial attacks.

Via

Access Paper or Ask Questions