Deepfakes are synthetic media in which a person's likeness is replaced with someone else's likeness using deep-learning techniques.
The rapid proliferation of realistic deepfakes has raised urgent concerns over their misuse, motivating the use of defensive watermarks in synthetic images for reliable detection and provenance tracking. However, this defense paradigm assumes such watermarks are inherently resistant to removal. We challenge this assumption with DeMark, a query-free black-box attack framework that targets defensive image watermarking schemes for deepfakes. DeMark exploits latent-space vulnerabilities in encoder-decoder watermarking models through a compressive-sensing-based sparsification process, suppressing watermark signals while preserving perceptual and structural realism appropriate for deepfakes. Across eight state-of-the-art watermarking schemes, DeMark reduces watermark detection accuracy from 100% to 32.9% on average while maintaining natural visual quality, outperforming existing attacks. We further evaluate three defense strategies (image super-resolution, sparse watermarking, and adversarial training) and find them largely ineffective. These results demonstrate that current encoder-decoder watermarking schemes remain vulnerable to latent-space manipulations, underscoring the need for more robust watermarking methods to safeguard against deepfakes.
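The abstract above does not spell out DeMark's sparsification procedure. As a rough illustration of what sparsifying a latent code can mean, the sketch below applies soft-thresholding, the shrinkage step used by many compressive-sensing solvers. The function name `soft_threshold`, the toy latent vector, and the threshold `lam` are all hypothetical and are not taken from the paper.

```python
def soft_threshold(latent, lam):
    """Shrink every coefficient toward zero by lam; zero those with |x| <= lam."""
    sparse = []
    for x in latent:
        if x > lam:
            sparse.append(x - lam)
        elif x < -lam:
            sparse.append(x + lam)
        else:
            sparse.append(0.0)  # small coefficients are removed entirely
    return sparse


# Hypothetical latent code: two strong coefficients and three weak ones.
latent = [0.9, -0.05, 0.02, -1.3, 0.1]
sparse = soft_threshold(latent, lam=0.2)
num_zeros = sum(1 for x in sparse if x == 0.0)
```

In this toy case the three weak coefficients are zeroed while the two strong ones survive with reduced magnitude. Whether a watermark signal actually lives in the weak coefficients depends on the watermarking model and is not something this sketch demonstrates.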
Most prior deepfake detection methods lack explainable outputs. With the growing interest in multimodal large language models (MLLMs), researchers have started exploring their use in interpretable deepfake detection. However, a major obstacle in applying MLLMs to this task is the scarcity of high-quality datasets with detailed forgery attribution annotations, as textual annotation is both costly and challenging, particularly for high-fidelity forged images or videos. Moreover, multiple studies have shown that reinforcement learning (RL) can substantially enhance performance in visual tasks, especially in improving cross-domain generalization. To facilitate the adoption of mainstream MLLM frameworks in deepfake detection with reduced annotation cost, and to investigate the potential of RL in this context, we propose an automated Chain-of-Thought (CoT) data generation framework based on Self-Blended Images, along with an RL-enhanced deepfake detection framework. Extensive experiments validate the effectiveness of our CoT data construction pipeline, tailored reward mechanism, and feedback-driven synthetic data generation approach. Our method achieves performance competitive with state-of-the-art (SOTA) approaches across multiple cross-dataset benchmarks. Implementation details are available at https://github.com/deon1219/rlsbi.
Generative models now produce imperceptible, fine-grained manipulated faces, posing significant privacy risks. However, existing AI-generated face datasets generally lack focus on samples with fine-grained regional manipulations. Furthermore, no researchers have yet studied the real impact on detectors of splice attacks, which occur between real and manipulated samples. We refer to the resulting spliced samples as detector-evasive samples. Based on this, we introduce the DiffFace-Edit dataset, which has the following advantages: 1) It contains over two million AI-generated fake images. 2) It features edits across eight facial regions (e.g., eyes, nose) and includes a richer variety of editing combinations, such as single-region and multi-region edits. Additionally, we specifically analyze the impact of detector-evasive samples on detection models. We conduct a comprehensive analysis of the dataset and propose a cross-domain evaluation that combines IMDL methods. The dataset will be available at https://github.com/ywh1093/DiffFace-Edit.
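One plausible reading of a splice between a real and a manipulated sample is a region-wise composite: pixels inside a mask come from the manipulated image and the rest from the real one. The sketch below assumes that reading. The function `splice`, the binary mask, and the toy 2x3 "images" are illustrative only and do not describe how DiffFace-Edit was actually built.

```python
def splice(real, fake, mask):
    """Take the fake pixel where mask is 1, otherwise keep the real pixel."""
    return [
        [f if m else r for r, f, m in zip(real_row, fake_row, mask_row)]
        for real_row, fake_row, mask_row in zip(real, fake, mask)
    ]


# Toy grayscale images (hypothetical values) and a mask covering one region.
real = [[10, 10, 10], [10, 10, 10]]
fake = [[99, 99, 99], [99, 99, 99]]
mask = [[0, 1, 1], [0, 0, 1]]
spliced = splice(real, fake, mask)
```

Here `spliced` keeps the real left column and takes the masked region from the manipulated image, so only part of the result carries generator artifacts.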
Humans use context to assess the veracity of information. However, current audio deepfake detectors only analyze the audio file without considering either context or transcripts. We create and analyze a Journalist-provided Deepfake Dataset (JDD) of 255 public deepfakes which were primarily contributed by over 70 journalists since early 2024. We also generate a synthetic audio dataset (SYN) of dead public figures and propose a novel Context-based Audio Deepfake Detector (CADD) architecture. In addition, we evaluate performance on two large-scale datasets: ITW and P$^2$V. We show that sufficient context and/or the transcript can significantly improve the efficacy of audio deepfake detectors. Performance (measured via F1 score, AUC, and EER) of multiple baseline audio deepfake detectors and traditional classifiers can be improved by 5%-37.58% in F1-score, 3.77%-42.79% in AUC, and 6.17%-47.83% in EER. We additionally show that CADD, via its use of context and/or transcripts, is more robust to 5 adversarial evasion strategies, limiting performance degradation to an average of just -0.71% across all experiments. Code, models, and datasets are available at our project page: https://sites.northwestern.edu/nsail/cadd-context-based-audio-deepfake-detection (access restricted during review).
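Of the three metrics named above, the equal error rate (EER) is the least familiar: it is the operating point at which the false-positive rate equals the false-negative rate, so lower is better. The sketch below is a simple threshold sweep, not the evaluation code of the paper. It assumes label 1 means "fake" and that higher scores mean "more likely fake"; both conventions, and the function name, are assumptions.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # fake samples
    neg = [s for s, y in zip(scores, labels) if y == 0]  # real samples
    # One extra threshold above every score covers the "reject all" case.
    thresholds = sorted(set(scores)) + [max(scores) + 1.0]
    best_gap, eer = float("inf"), None
    for t in thresholds:
        fpr = sum(s >= t for s in neg) / len(neg)  # real flagged as fake
        fnr = sum(s < t for s in pos) / len(pos)   # fake passed as real
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Toy scores (hypothetical): one fake sample scores below one real sample.
eer_overlap = equal_error_rate([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
eer_clean = equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

With only a handful of scores the sweep is coarse; evaluation toolkits typically interpolate along the ROC curve instead.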
Generative AI systems increasingly enable the production of highly realistic synthetic media. Civitai, a popular community-driven platform for AI-generated content, operates a monetized feature called Bounties, which allows users to commission the generation of content in exchange for payment. To examine how this mechanism is used and what content it incentivizes, we conduct a longitudinal analysis of all publicly available bounty requests collected over a 14-month period following the platform's launch. We find that the bounty marketplace is dominated by tools that let users steer AI models toward content they were not trained to generate. At the same time, requests for content that is "Not Safe For Work" are widespread and have increased steadily over time, now comprising a majority of all bounties. Participation in bounty creation is uneven, with 20% of requesters accounting for roughly half of requests. Requests for "deepfakes" (media depicting identifiable real individuals) exhibit a higher concentration than other types of bounties. A nontrivial subset of these requests involves explicit deepfakes despite platform policies prohibiting such content. These bounties disproportionately target female celebrities, revealing a pronounced gender asymmetry in social harm. Together, these findings show how monetized, community-driven generative AI platforms can produce gendered harms, raising questions about consent, governance, and enforcement.
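A concentration figure such as "20% of requesters account for roughly half of requests" can be computed as the share of requests coming from the most active fraction of users. The sketch below shows that calculation on made-up data. The function `top_share` and the requester IDs are hypothetical, and the study may have measured concentration differently.

```python
import math
from collections import Counter


def top_share(requester_ids, fraction=0.2):
    """Share of all requests made by the most active `fraction` of requesters."""
    counts = sorted(Counter(requester_ids).values(), reverse=True)
    k = max(1, math.ceil(fraction * len(counts)))  # size of the top group
    return sum(counts[:k]) / sum(counts)


# Made-up log: requester "a" posts 4 bounties, four others post 1 each.
requests = ["a", "a", "a", "a", "b", "c", "d", "e"]
share = top_share(requests, fraction=0.2)
```

In this toy log the top 20% is a single requester who posted 4 of the 8 bounties.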
Deepfakes are synthetic media generated by artificial intelligence, with positive applications in education and creativity, but also serious negative impacts such as fraud, misinformation, and privacy violations. Although detection techniques have advanced, comprehensive evaluation methods that go beyond classification performance remain lacking. This paper proposes a reliability assessment framework based on four pillars: transferability, robustness, interpretability, and computational efficiency. An analysis of five state-of-the-art methods revealed significant progress as well as critical limitations.
Audio deepfake detection has become increasingly challenging due to rapid advances in speech synthesis and voice conversion technologies, particularly under channel distortions, replay attacks, and real-world recording conditions. This paper proposes a resolution-aware audio deepfake detection framework that explicitly models and aligns multi-resolution spectral representations through cross-scale attention and consistency learning. Unlike conventional single-resolution or implicit feature-fusion approaches, the proposed method enforces agreement across complementary time-frequency scales. The proposed framework is evaluated on three representative benchmarks: ASVspoof 2019 (LA and PA), the Fake-or-Real (FoR) dataset, and the In-the-Wild Audio Deepfake dataset under a speaker-disjoint protocol. The method achieves near-perfect performance on ASVspoof LA (EER 0.16%), strong robustness on ASVspoof PA (EER 5.09%), FoR rerecorded audio (EER 4.54%), and in-the-wild deepfakes (AUC 0.98, EER 4.81%), significantly outperforming single-resolution and non-attention baselines under challenging conditions. The proposed model remains lightweight and efficient, requiring only 159k parameters and less than 1 GFLOP per inference, making it suitable for practical deployment. Comprehensive ablation studies confirm the critical contributions of cross-scale attention and consistency learning, while gradient-based interpretability analysis reveals that the model learns resolution-consistent and semantically meaningful spectral cues across diverse spoofing conditions. These results demonstrate that explicit cross-resolution modeling provides a principled, robust, and scalable foundation for next-generation audio deepfake detection systems.
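To make "multi-resolution spectral representations" concrete, the sketch below computes magnitude spectrograms of one signal at three window lengths using a naive DFT. Short windows give many frames with few frequency bins, and long windows give the reverse. The window lengths, the 50% hop, the Hann window, and the test tone are all assumptions, and the cross-scale attention and consistency learning described above are not reproduced here.

```python
import cmath
import math


def stft_magnitude(signal, win, hop):
    """Magnitude spectrogram as a list of frames, each with win // 2 + 1 bins."""
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        # Apply a Hann window to reduce spectral leakage.
        frame = [
            x * 0.5 * (1 - math.cos(2 * math.pi * n / win))
            for n, x in enumerate(signal[start:start + win])
        ]
        spectrum = []
        for k in range(win // 2 + 1):  # non-negative frequencies only
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / win)
                for n, x in enumerate(frame)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical input: a pure tone at 0.125 cycles per sample, 64 samples long.
signal = [math.sin(2 * math.pi * 0.125 * n) for n in range(64)]
multi_res = {win: stft_magnitude(signal, win, hop=win // 2) for win in (8, 16, 32)}
shapes = {win: (len(spec), len(spec[0])) for win, spec in multi_res.items()}
```

The `shapes` dictionary makes the trade-off visible: each doubling of the window roughly halves the number of frames and doubles the number of frequency bins.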
Recent deepfake detection methods have increasingly explored frequency-domain representations to reveal manipulation artifacts that are difficult to detect in the spatial domain. However, most existing approaches rely primarily on spectral magnitude, implicitly underexploring the role of phase information. In this work, we propose Phase4DFD, a phase-aware frequency-domain deepfake detection framework that explicitly models phase-magnitude interactions via a learnable attention mechanism. Our approach augments standard RGB input with Fast Fourier Transform (FFT) magnitude and local binary pattern (LBP) representations to expose subtle synthesis artifacts that remain indistinguishable under spatial analysis alone. Crucially, we introduce an input-level phase-aware attention module that uses phase discontinuities commonly introduced by synthetic generation to guide the model toward frequency patterns that are most indicative of manipulation before backbone feature extraction. The attended multi-domain representation is processed by an efficient BNext-M backbone, with optional channel-spatial attention applied for semantic feature refinement. Extensive experiments on the CIFAKE and DFFD datasets demonstrate that our proposed model Phase4DFD outperforms state-of-the-art spatial and frequency-based detectors while maintaining low computational overhead. Comprehensive ablation studies further confirm that explicit phase modeling provides complementary and non-redundant information beyond magnitude-only frequency representations.
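The distinction between spectral magnitude and phase can be made concrete with a small example. The sketch below computes a naive 2D discrete Fourier transform of a tiny grayscale patch and splits it into a magnitude map and a phase map. It only shows where the two quantities come from and does not reproduce the Phase4DFD attention module; the function names and the 4x4 patch are hypothetical.

```python
import cmath
import math


def dft2(image):
    """Naive 2D DFT of a small grayscale image given as a list of rows."""
    h, w = len(image), len(image[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * math.pi * (u * y / h + v * x / w)
                    acc += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = acc
    return out


def magnitude_and_phase(image):
    """Split the spectrum into |F| (magnitude) and arg F (phase, in radians)."""
    spectrum = dft2(image)
    magnitude = [[abs(c) for c in row] for row in spectrum]
    phase = [[cmath.phase(c) for c in row] for row in spectrum]
    return magnitude, phase


# Hypothetical 4x4 patch with vertical stripes.
patch = [[0, 1, 0, 1] for _ in range(4)]
magnitude, phase = magnitude_and_phase(patch)
```

A practical pipeline would use an FFT routine rather than this quadruple loop, which is only workable for very small patches. Phase values at coefficients whose magnitude is close to zero are numerically unstable and are usually masked or down-weighted.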
The rapid proliferation of synthetic media, presentation attacks, and document forgeries has created significant vulnerabilities in Know Your Customer (KYC) workflows across financial services, telecommunications, and digital-identity ecosystems. Traditional monolithic KYC systems lack the scalability and agility required to counter adaptive fraud. This paper proposes an Agentic AI Microservice Framework that integrates modular vision models, liveness assessment, deepfake detection, OCR-based document forensics, multimodal identity linking, and a policy-driven risk engine. The system leverages autonomous micro-agents for task decomposition, pipeline orchestration, dynamic retries, and human-in-the-loop escalation. Experimental evaluations demonstrate improved detection accuracy, reduced latency, and enhanced resilience against adversarial inputs. The framework offers a scalable blueprint for regulated industries seeking robust, real-time, and privacy-preserving KYC verification.
Deepfake detection systems deployed in real-world environments are subject to adversaries capable of crafting imperceptible perturbations that degrade model performance. While adversarial training is a widely adopted defense, its effectiveness under realistic conditions, where attackers operate with limited knowledge and mismatched data distributions, remains underexplored. In this work, we extend the DUMB (Dataset soUrces, Model architecture and Balance) and DUMBer methodology to deepfake detection. We evaluate detectors' robustness against adversarial attacks under transferability constraints and cross-dataset configurations to extract real-world insights. Our study spans five state-of-the-art detectors (RECCE, SRM, XCeption, UCF, SPSL), three attacks (PGD, FGSM, FPBA), and two datasets (FaceForensics++ and Celeb-DF-V2). We analyze both attacker and defender perspectives, mapping results to mismatch scenarios. Experiments show that adversarial training strategies reinforce robustness in in-distribution cases but can also degrade it under cross-dataset configurations, depending on the strategy adopted. These findings highlight the need for case-aware defense strategies in real-world applications exposed to adversarial attacks.
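Of the attacks listed above, FGSM is the simplest: it moves every input coordinate by a fixed step `eps` in the direction that increases the loss. The sketch below applies it to a toy logistic-regression "detector" so that the gradient can be written by hand. The weights, the input, and the label convention (1 = fake) are hypothetical, and the detectors in the study are deep networks rather than linear models.

```python
import math


def predict(w, b, x):
    """Probability that x is fake under a logistic-regression detector."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(w, b, x, y, eps):
    """One FGSM step: x + eps * sign(gradient of the loss with respect to x)."""
    p = predict(w, b, x)
    # For binary cross-entropy on a logistic model, dL/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


# Hypothetical detector and a fake input (label 1) it classifies correctly.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = fgsm(w, b, x, y, eps=0.3)
p_clean, p_adv = predict(w, b, x), predict(w, b, x_adv)
```

After the step the detector is less confident that the input is fake, although every coordinate moved by at most `eps`. Real attacks on images also clip the result to the valid pixel range, which this sketch omits.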