Abstract: Reliable face forgery detection algorithms are crucial for countering the growing threat of deepfake-driven disinformation. Previous research has demonstrated the potential of Multimodal Large Language Models (MLLMs) in identifying manipulated faces. However, existing methods typically depend on either the Large Language Model (LLM) alone or an external detector to generate classification results, which often leads to sub-optimal integration of visual and textual modalities. In this paper, we propose VLF-FFD, a novel Vision-Language Fusion solution for MLLM-enhanced Face Forgery Detection. Our key contributions are twofold. First, we present EFF++, a frame-level, explainability-driven extension of the widely used FaceForensics++ (FF++) dataset. In EFF++, each manipulated video frame is paired with a textual annotation that describes both the forgery artifacts and the specific manipulation technique applied, enabling more effective and informative MLLM training. Second, we design a Vision-Language Fusion Network (VLF-Net) that promotes bidirectional interaction between visual and textual features, supported by a three-stage training pipeline to fully leverage its potential. VLF-FFD achieves state-of-the-art (SOTA) performance in both cross-dataset and intra-dataset evaluations, underscoring its exceptional effectiveness in face forgery detection.
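The abstract gives no implementation details, but the bidirectional interaction between visual and textual features that VLF-Net promotes can be sketched as cross-attention running in both directions. The PyTorch snippet below is a minimal illustration under assumed token shapes and embedding dimensions; the module name, layer choices, and sizes are hypothetical and should not be read as the authors' actual VLF-Net design.

```python
# Hypothetical sketch of a bidirectional vision-language fusion block.
# All names and dimensions are illustrative assumptions, not the actual
# VLF-Net implementation described in the paper.
import torch
import torch.nn as nn

class BidirectionalFusionBlock(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Text queries attend over visual tokens, and vice versa.
        self.txt_to_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.vis_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor):
        # vis: (B, N_v, dim) visual tokens; txt: (B, N_t, dim) text tokens.
        vis_upd, _ = self.vis_to_txt(query=vis, key=txt, value=txt)
        txt_upd, _ = self.txt_to_vis(query=txt, key=vis, value=vis)
        # Residual connections keep each stream's original information.
        return self.norm_v(vis + vis_upd), self.norm_t(txt + txt_upd)

# Usage: fused features of both modalities could feed a real/fake head.
block = BidirectionalFusionBlock()
vis = torch.randn(2, 196, 768)   # e.g. patch tokens from a vision encoder
txt = torch.randn(2, 32, 768)    # e.g. embeddings of the textual annotation
fused_vis, fused_txt = block(vis, txt)
```

The point of the two attention directions is that neither modality merely conditions the other: each stream is updated with information from both, which is one plausible reading of the "bidirectional interaction" the abstract claims.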
Abstract: Image deconvolution is the process of recovering an image degraded by convolution, a notoriously hard inverse problem owing to its mathematical ill-posedness. Building on the success of the recently proposed deep image prior (DIP), we construct an image deconvolution model with deep image and kernel priors (DIKP). DIP is a learning-free representation that uses neural network structures to express image prior information, and it has shown great success in many energy-based models, e.g., denoising, super-resolution, and inpainting. Our DIKP model instead applies such priors to image deconvolution, modeling not only the image but also the kernel, and thereby combining the ideas of traditional learning-free deconvolution methods with neural networks. In this paper, we show that DIKP improves the performance of learning-free image deconvolution, and we demonstrate this experimentally on a benchmark of six standard test images in terms of PSNR and visual quality.
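As a rough illustration of the learning-free idea described above, the PyTorch sketch below optimizes two small, randomly initialized networks so that the image one of them generates, re-blurred by the kernel the other generates, matches the observed degraded image. The architectures, sizes, and training loop are illustrative assumptions; the paper's actual DIKP networks and energy terms are not specified in the abstract.

```python
# Minimal sketch of deconvolution with deep image and kernel priors,
# assuming PyTorch. The tiny networks below are placeholders, not the
# actual DIKP architectures.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# "Deep priors": fixed random inputs mapped through untrained networks
# whose weights are fitted to a single degraded observation.
image_net = nn.Sequential(
    nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
)
kernel_net = nn.Sequential(nn.Linear(16, 15 * 15))

z_img = torch.randn(1, 8, 64, 64)   # fixed noise input for the image prior
z_ker = torch.randn(1, 16)          # fixed noise input for the kernel prior
y = torch.rand(1, 1, 64, 64)        # observed blurry image (stand-in data)

opt = torch.optim.Adam(
    list(image_net.parameters()) + list(kernel_net.parameters()), lr=1e-3)

for step in range(500):
    x = image_net(z_img)                      # latent sharp image estimate
    k = F.softmax(kernel_net(z_ker), dim=-1)  # nonnegative kernel, sums to 1
    k = k.view(1, 1, 15, 15)
    y_hat = F.conv2d(x, k, padding=7)         # re-blur with estimated kernel
    loss = F.mse_loss(y_hat, y)               # data-fit term of the energy
    opt.zero_grad()
    loss.backward()
    opt.step()
```

No external training data is used: as in DIP, the network structures themselves act as the prior, and the softmax constraint on the kernel is one simple way to keep it nonnegative and normalized.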