Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Taiping Yao

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

Oct 06, 2024

Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

Figure 1 for DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

Figure 2 for DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

Figure 3 for DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

Figure 4 for DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

Abstract:The rapid progress of Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content. Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial manipulations. In this paper, we revisit the generation process and identify a universal principle: Deepfake images inherently contain information from both source and target identities, while genuine faces maintain a consistent identity. Building upon this insight, we introduce DiffusionFake, a novel plug-and-play framework that reverses the generative process of face forgeries to enhance the generalization of detection models. DiffusionFake achieves this by injecting the features extracted by the detection model into a frozen pre-trained Stable Diffusion model, compelling it to reconstruct the corresponding target and source images. This guided reconstruction process constrains the detection network to capture the source and target related features to facilitate the reconstruction, thereby learning rich and disentangled representations that are more resilient to unseen forgeries. Extensive experiments demonstrate that DiffusionFake significantly improves cross-domain generalization of various detector architectures without introducing additional parameters during inference. Our Codes are available in https://github.com/skJack/DiffusionFake.git.

* Accepted by NeurIPS 2024

Via

Access Paper or Ask Questions

Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

Sep 04, 2024

Kaiqing Lin, Yuzhen Lin, Weixiang Li, Taiping Yao, Bin Li

Figure 1 for Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

Figure 2 for Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

Figure 3 for Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

Figure 4 for Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

Abstract:The proliferation of deepfake faces poses huge potential negative impacts on our daily lives. Despite substantial advancements in deepfake detection over these years, the generalizability of existing methods against forgeries from unseen datasets or created by emerging generative models remains constrained. In this paper, inspired by the zero-shot advantages of Vision-Language Models (VLMs), we propose a novel approach that repurposes a well-trained VLM for general deepfake detection. Motivated by the model reprogramming paradigm that manipulates the model prediction via data perturbations, our method can reprogram a pretrained VLM model (e.g., CLIP) solely based on manipulating its input without tuning the inner parameters. Furthermore, we insert a pseudo-word guided by facial identity into the text prompt. Extensive experiments on several popular benchmarks demonstrate that (1) the cross-dataset and cross-manipulation performances of deepfake detection can be significantly and consistently improved (e.g., over 88% AUC in cross-dataset setting from FF++ to WildDeepfake) using a pre-trained CLIP model with our proposed reprogramming method; (2) our superior performances are at less cost of trainable parameters, making it a promising approach for real-world applications.

Via

Access Paper or Ask Questions

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Aug 30, 2024

Zhiyuan Yan, Yandan Zhao, Shen Chen, Xinghe Fu, Taiping Yao, Shouhong Ding, Li Yuan

Figure 1 for Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Figure 2 for Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Figure 3 for Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Figure 4 for Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Abstract:Three key challenges hinder the development of current deepfake video detection: (1) Temporal features can be complex and diverse: how can we identify general temporal artifacts to enhance model generalization? (2) Spatiotemporal models often lean heavily on one type of artifact and ignore the other: how can we ensure balanced learning from both? (3) Videos are naturally resource-intensive: how can we tackle efficiency without compromising accuracy? This paper attempts to tackle the three challenges jointly. First, inspired by the notable generality of using image-level blending data for image forgery detection, we investigate whether and how video-level blending can be effective in video. We then perform a thorough analysis and identify a previously underexplored temporal forgery artifact: Facial Feature Drift (FFD), which commonly exists across different forgeries. To reproduce FFD, we then propose a novel Video-level Blending data (VB), where VB is implemented by blending the original image and its warped version frame-by-frame, serving as a hard negative sample to mine more general artifacts. Second, we carefully design a lightweight Spatiotemporal Adapter (StA) to equip a pretrained image model (both ViTs and CNNs) with the ability to capture both spatial and temporal features jointly and efficiently. StA is designed with two-stream 3D-Conv with varying kernel sizes, allowing it to process spatial and temporal features separately. Extensive experiments validate the effectiveness of the proposed methods; and show our approach can generalize well to previously unseen forgery videos, even the just-released (in 2024) SoTAs. We release our code and pretrained weights at \url{https://github.com/YZY-stack/StA4Deepfake}.

Via

Access Paper or Ask Questions

DF40: Toward Next-Generation Deepfake Detection

Jun 19, 2024

Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Li Yuan, Chengjie Wang, Shouhong Ding(+1 more)

Figure 1 for DF40: Toward Next-Generation Deepfake Detection

Figure 2 for DF40: Toward Next-Generation Deepfake Detection

Figure 3 for DF40: Toward Next-Generation Deepfake Detection

Figure 4 for DF40: Toward Next-Generation Deepfake Detection

Abstract:We propose a new comprehensive benchmark to revolutionize the current deepfake detection field to the next generation. Predominantly, existing works identify top-notch detection algorithms and models by adhering to the common practice: training detectors on one specific dataset (e.g., FF++) and testing them on other prevalent deepfake datasets. This protocol is often regarded as a "golden compass" for navigating SoTA detectors. But can these stand-out "winners" be truly applied to tackle the myriad of realistic and diverse deepfakes lurking in the real world? If not, what underlying factors contribute to this gap? In this work, we found the dataset (both train and test) can be the "primary culprit" due to: (1) forgery diversity: Deepfake techniques are commonly referred to as both face forgery (face-swapping and face-reenactment) and entire image synthesis (AIGC). Most existing datasets only contain partial types, with limited forgery methods implemented; (2) forgery realism: The dominant training dataset, FF++, contains old forgery techniques from the past five years. "Honing skills" on these forgeries makes it difficult to guarantee effective detection of nowadays' SoTA deepfakes; (3) evaluation protocol: Most detection works perform evaluations on one type, e.g., train and test on face-swapping only, which hinders the development of universal deepfake detectors. To address this dilemma, we construct a highly diverse and large-scale deepfake dataset called DF40, which comprises 40 distinct deepfake techniques. We then conduct comprehensive evaluations using 4 standard evaluation protocols and 7 representative detectors, resulting in over 2,000 evaluations. Through these evaluations, we analyze from various perspectives, leading to 12 new insightful findings contributing to the field. We also open up 5 valuable yet previously underexplored research questions to inspire future works.

Via

Access Paper or Ask Questions

Rank-based No-reference Quality Assessment for Face Swapping

Jun 04, 2024

Xinghui Zhou, Wenbo Zhou, Tianyi Wei, Shen Chen, Taiping Yao, Shouhong Ding, Weiming Zhang, Nenghai Yu

Figure 1 for Rank-based No-reference Quality Assessment for Face Swapping

Figure 2 for Rank-based No-reference Quality Assessment for Face Swapping

Figure 3 for Rank-based No-reference Quality Assessment for Face Swapping

Figure 4 for Rank-based No-reference Quality Assessment for Face Swapping

Abstract:Face swapping has become a prominent research area in computer vision and image processing due to rapid technological advancements. The metric of measuring the quality in most face swapping methods relies on several distances between the manipulated images and the source image, or the target image, i.e., there are suitable known reference face images. Therefore, there is still a gap in accurately assessing the quality of face interchange in reference-free scenarios. In this study, we present a novel no-reference image quality assessment (NR-IQA) method specifically designed for face swapping, addressing this issue by constructing a comprehensive large-scale dataset, implementing a method for ranking image quality based on multiple facial attributes, and incorporating a Siamese network based on interpretable qualitative comparisons. Our model demonstrates the state-of-the-art performance in the quality assessment of swapped faces, providing coarse- and fine-grained. Enhanced by this metric, an improved face-swapping model achieved a more advanced level with respect to expressions and poses. Extensive experiments confirm the superiority of our method over existing general no-reference image quality assessment metrics and the latest metric of facial image quality assessment, making it well suited for evaluating face swapping images in real-world scenarios.

* 8 pages, 5 figures

Via

Access Paper or Ask Questions

Test-Time Domain Generalization for Face Anti-Spoofing

Mar 28, 2024

Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Shouhong Ding, Lizhuang Ma

Figure 1 for Test-Time Domain Generalization for Face Anti-Spoofing

Figure 2 for Test-Time Domain Generalization for Face Anti-Spoofing

Figure 3 for Test-Time Domain Generalization for Face Anti-Spoofing

Figure 4 for Test-Time Domain Generalization for Face Anti-Spoofing

Abstract:Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks. While domain generalization (DG) methods have been developed to enhance FAS performance, they predominantly focus on learning domain-invariant features during training, which may not guarantee generalizability to unseen data that differs largely from the source distributions. Our insight is that testing data can serve as a valuable resource to enhance the generalizability beyond mere evaluation for DG FAS. In this paper, we introduce a novel Test-Time Domain Generalization (TTDG) framework for FAS, which leverages the testing data to boost the model's generalizability. Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space. In particular, we first introduce the innovative TTSP to project the styles of the arbitrarily unseen samples of the testing distribution to the known source space of the training distributions. We then design the efficient DSSS to synthesize diverse style shifts via learnable style bases with two specifically designed losses in a hyperspherical feature space. Our method eliminates the need for model updates at the test time and can be seamlessly integrated into not only the CNN but also ViT backbones. Comprehensive experiments on widely used cross-domain FAS benchmarks demonstrate our method's state-of-the-art performance and effectiveness.

* Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Via

Access Paper or Ask Questions

Contrastive Pseudo Learning for Open-World DeepFake Attribution

Sep 20, 2023

Zhimin Sun, Shen Chen, Taiping Yao, Bangjie Yin, Ran Yi, Shouhong Ding, Lizhuang Ma

Abstract:The challenge in sourcing attribution for forgery faces has gained widespread attention due to the rapid development of generative techniques. While many recent works have taken essential steps on GAN-generated faces, more threatening attacks related to identity swapping or expression transferring are still overlooked. And the forgery traces hidden in unknown attacks from the open-world unlabeled faces still remain under-explored. To push the related frontier research, we introduce a new benchmark called Open-World DeepFake Attribution (OW-DFA), which aims to evaluate attribution performance against various types of fake faces under open-world scenarios. Meanwhile, we propose a novel framework named Contrastive Pseudo Learning (CPL) for the OW-DFA task through 1) introducing a Global-Local Voting module to guide the feature alignment of forged faces with different manipulated regions, 2) designing a Confidence-based Soft Pseudo-label strategy to mitigate the pseudo-noise caused by similar methods in unlabeled set. In addition, we extend the CPL framework with a multi-stage paradigm that leverages pre-train technique and iterative learning to further enhance traceability performance. Extensive experiments verify the superiority of our proposed method on the OW-DFA and also demonstrate the interpretability of deepfake attribution task and its impact on improving the security of deepfake detection area.

* 16 pages, 7 figures, ICCV 2023

Via

Access Paper or Ask Questions

Continual Face Forgery Detection via Historical Distribution Preserving

Aug 11, 2023

Ke Sun, Shen Chen, Taiping Yao, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

Figure 1 for Continual Face Forgery Detection via Historical Distribution Preserving

Figure 2 for Continual Face Forgery Detection via Historical Distribution Preserving

Figure 3 for Continual Face Forgery Detection via Historical Distribution Preserving

Figure 4 for Continual Face Forgery Detection via Historical Distribution Preserving

Abstract:Face forgery techniques have advanced rapidly and pose serious security threats. Existing face forgery detection methods try to learn generalizable features, but they still fall short of practical application. Additionally, finetuning these methods on historical training data is resource-intensive in terms of time and storage. In this paper, we focus on a novel and challenging problem: Continual Face Forgery Detection (CFFD), which aims to efficiently learn from new forgery attacks without forgetting previous ones. Specifically, we propose a Historical Distribution Preserving (HDP) framework that reserves and preserves the distributions of historical faces. To achieve this, we use universal adversarial perturbation (UAP) to simulate historical forgery distribution, and knowledge distillation to maintain the distribution variation of real faces across different models. We also construct a new benchmark for CFFD with three evaluation protocols. Our extensive experiments on the benchmarks show that our method outperforms the state-of-the-art competitors.

Via

Access Paper or Ask Questions

Towards General Visual-Linguistic Face Forgery Detection

Jul 31, 2023

Ke Sun, Shen Chen, Taiping Yao, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

Figure 1 for Towards General Visual-Linguistic Face Forgery Detection

Figure 2 for Towards General Visual-Linguistic Face Forgery Detection

Figure 3 for Towards General Visual-Linguistic Face Forgery Detection

Figure 4 for Towards General Visual-Linguistic Face Forgery Detection

Abstract:Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We argue that such supervisions lack semantic information and interpretability. To address this issues, in this paper, we propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation. Since text annotations are not available in current deepfakes datasets, VLFFD first generates the mixed forgery image with corresponding fine-grained prompts via Prompt Forgery Image Generator (PFIG). Then, the fine-grained mixed data and coarse-grained original data and is jointly trained with the Coarse-and-Fine Co-training framework (C2F), enabling the model to gain more generalization and interpretability. The experiments show the proposed method improves the existing detection models on several challenging benchmarks.

Via

Access Paper or Ask Questions

Instance-Aware Domain Generalization for Face Anti-Spoofing

Apr 12, 2023

Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Ran Yi, Shouhong Ding, Lizhuang Ma

Abstract:Face anti-spoofing (FAS) based on domain generalization (DG) has been recently studied to improve the generalization on unseen scenarios. Previous methods typically rely on domain labels to align the distribution of each domain for learning domain-invariant representations. However, artificial domain labels are coarse-grained and subjective, which cannot reflect real domain distributions accurately. Besides, such domain-aware methods focus on domain-level alignment, which is not fine-grained enough to ensure that learned representations are insensitive to domain styles. To address these issues, we propose a novel perspective for DG FAS that aligns features on the instance level without the need for domain labels. Specifically, Instance-Aware Domain Generalization framework is proposed to learn the generalizable feature by weakening the features' sensitivity to instance-specific styles. Concretely, we propose Asymmetric Instance Adaptive Whitening to adaptively eliminate the style-sensitive feature correlation, boosting the generalization. Moreover, Dynamic Kernel Generator and Categorical Style Assembly are proposed to first extract the instance-specific features and then generate the style-diversified features with large style shifts, respectively, further facilitating the learning of style-insensitive features. Extensive experiments and analysis demonstrate the superiority of our method over state-of-the-art competitors. Code will be publicly available at https://github.com/qianyuzqy/IADG.

* Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Via

Access Paper or Ask Questions