Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mirko Casu

Deepfake Forensic Analysis: Source Dataset Attribution and Legal Implications of Synthetic Media Manipulation

May 16, 2025

Massimiliano Cassia, Luca Guarnera, Mirko Casu, Ignazio Zangara, Sebastiano Battiato

Abstract:Synthetic media generated by Generative Adversarial Networks (GANs) pose significant challenges in verifying authenticity and tracing dataset origins, raising critical concerns in copyright enforcement, privacy protection, and legal compliance. This paper introduces a novel forensic framework for identifying the training dataset (e.g., CelebA or FFHQ) of GAN-generated images through interpretable feature analysis. By integrating spectral transforms (Fourier/DCT), color distribution metrics, and local feature descriptors (SIFT), our pipeline extracts discriminative statistical signatures embedded in synthetic outputs. Supervised classifiers (Random Forest, SVM, XGBoost) achieve 98-99% accuracy in binary classification (real vs. synthetic) and multi-class dataset attribution across diverse GAN architectures (StyleGAN, AttGAN, GDWCT, StarGAN, and StyleGAN2). Experimental results highlight the dominance of frequency-domain features (DCT/FFT) in capturing dataset-specific artifacts, such as upsampling patterns and spectral irregularities, while color histograms reveal implicit regularization strategies in GAN training. We further examine legal and ethical implications, showing how dataset attribution can address copyright infringement, unauthorized use of personal data, and regulatory compliance under frameworks like GDPR and California's AB 602. Our framework advances accountability and governance in generative modeling, with applications in digital forensics, content moderation, and intellectual property litigation.

Via

Access Paper or Ask Questions

WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution

Apr 29, 2025

Pietro Bongini, Sara Mandelli, Andrea Montibeller, Mirko Casu, Orazio Pontorno, Claudio Vittorio Ragaglia, Luca Zanchetta, Mattia Aquilina, Taiba Majid Wani, Luca Guarnera(+7 more)

Abstract:Synthetic image source attribution is an open challenge, with an increasing number of image generators being released yearly. The complexity and the sheer number of available generative techniques, as well as the scarcity of high-quality open source datasets of diverse nature for this task, make training and benchmarking synthetic image source attribution models very challenging. WILD is a new in-the-Wild Image Linkage Dataset designed to provide a powerful training and benchmarking tool for synthetic image attribution models. The dataset is built out of a closed set of 10 popular commercial generators, which constitutes the training base of attribution models, and an open set of 10 additional generators, simulating a real-world in-the-wild scenario. Each generator is represented by 1,000 images, for a total of 10,000 images in the closed set and 10,000 images in the open set. Half of the images are post-processed with a wide range of operators. WILD allows benchmarking attribution models in a wide range of tasks, including closed and open set identification and verification, and robust attribution with respect to post-processing and adversarial attacks. Models trained on WILD are expected to benefit from the challenging scenario represented by the dataset itself. Moreover, an assessment of seven baseline methodologies on closed and open set attribution is presented, including robustness tests with respect to post-processing.

Via

Access Paper or Ask Questions

AI Mirage: The Impostor Bias and the Deepfake Detection Challenge in the Era of Artificial Illusions

Dec 24, 2023

Mirko Casu, Luca Guarnera, Pasquale Caponnetto, Sebastiano Battiato

Abstract:This paper provides a comprehensive analysis of cognitive biases in forensics and digital forensics, examining their implications for decision-making processes in these fields. It explores the various types of cognitive biases that may arise during forensic investigations and digital forensic analyses, such as confirmation bias, expectation bias, overconfidence in errors, contextual bias, and attributional biases. It also evaluates existing methods and techniques used to mitigate cognitive biases in these contexts, assessing the effectiveness of interventions aimed at reducing biases and improving decision-making outcomes. Additionally, this paper introduces a new cognitive bias, called "impostor bias", that may affect the use of generative Artificial Intelligence (AI) tools in forensics and digital forensics. The impostor bias is the tendency to doubt the authenticity or validity of the output generated by AI tools, such as deepfakes, in the form of audio, images, and videos. This bias may lead to erroneous judgments or false accusations, undermining the reliability and credibility of forensic evidence. The paper discusses the potential causes and consequences of the impostor bias, and suggests some strategies to prevent or counteract it. By addressing these topics, this paper seeks to offer valuable insights into understanding cognitive biases in forensic practices and provide recommendations for future research and practical applications to enhance the objectivity and validity of forensic investigations.

Via

Access Paper or Ask Questions