Siwei Lyu

AutoSplice: A Text-prompt Manipulated Image Dataset for Media Forensics

Apr 14, 2023
Shan Jia, Mingzhen Huang, Zhou Zhou, Yan Ju, Jialing Cai, Siwei Lyu

Recent advancements in language-image models have led to the development of highly realistic images that can be generated from textual descriptions. However, the increased visual quality of these generated images poses a potential threat to the field of media forensics. This paper aims to investigate the level of challenge that language-image generation models pose to media forensics. To achieve this, we propose a new approach that leverages the DALL-E2 language-image model to automatically generate and splice masked regions guided by a text prompt. To ensure the creation of realistic manipulations, we have designed an annotation platform with human checking to verify reasonable text prompts. This approach has resulted in the creation of a new image dataset called AutoSplice, containing 5,894 manipulated and authentic images. Specifically, we have generated a total of 3,621 images by locally or globally manipulating real-world image-caption pairs, which we believe will provide a valuable resource for developing generalized detection methods in this area. The dataset is evaluated under two media forensic tasks: forgery detection and localization. Our extensive experiments show that most media forensic models struggle to detect the AutoSplice dataset as an unseen manipulation. However, when fine-tuned models are used, they exhibit improved performance in both tasks.

Adversarial Machine Learning: A Systematic Survey of Backdoor Attack, Weight Attack and Adversarial Example

Feb 19, 2023
Baoyuan Wu, Li Liu, Zihao Zhu, Qingshan Liu, Zhaofeng He, Siwei Lyu

Adversarial machine learning (AML) studies the adversarial phenomenon of machine learning, in which a model may make predictions that are inconsistent with human judgment or otherwise unexpected. Several paradigms have recently been developed to explore this phenomenon at different stages of a machine learning system, such as training-time adversarial attack (i.e., backdoor attack), deployment-time adversarial attack (i.e., weight attack), and inference-time adversarial attack (i.e., adversarial example). However, although these paradigms share a common goal, they have developed almost independently, and there is still no big picture of AML. In this work, we aim to provide a unified perspective to the AML community and systematically review the overall progress of this field. We first provide a general definition of AML and then propose a unified mathematical framework that covers existing attack paradigms. Based on this framework, we can not only clearly identify the connections and differences among these paradigms, but also systematically categorize and review existing works in each paradigm.

* 31 pages, 4 figures, 8 tables, 249 reference papers

Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts

Feb 18, 2023
Chengzhe Sun, Shan Jia, Shuwei Hou, Ehab AlBadawy, Siwei Lyu

The advancements of AI-synthesized human voices have introduced a growing threat of impersonation and disinformation. It is therefore of practical importance to develop detection methods for synthetic human voices. This work proposes a new approach to detect synthetic human voices based on identifying artifacts of neural vocoders in audio signals. A neural vocoder is a specially designed neural network that synthesizes waveforms from temporal-frequency representations, e.g., mel-spectrograms. The neural vocoder is a core component in most DeepFake audio synthesis models. Hence the identification of neural vocoder processing implies that an audio sample may have been synthesized. To take advantage of the vocoder artifacts for synthetic human voice detection, we introduce a multi-task learning framework for a binary-class RawNet2 model that shares the front-end feature extractor with a vocoder identification module. We treat the vocoder identification as a pretext task to constrain the front-end feature extractor to focus on vocoder artifacts and provide discriminative features for the final binary classifier. Our experiments show that the improved RawNet2 model based on vocoder identification achieves an overall high classification performance on the binary task.
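
The multi-task setup described above can be illustrated with a minimal sketch of a joint objective: a real/fake loss plus a weighted vocoder-identification loss. This is not the authors' implementation. The weighted-sum form, the `weight` parameter, and the function names are assumptions made for illustration, and the abstract does not state how the two losses are combined.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample, computed from raw logits (log-sum-exp form)."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def multitask_loss(binary_logits, is_fake, vocoder_logits, vocoder_id, weight=0.5):
    """Hypothetical joint objective: binary loss plus weighted vocoder-ID loss.

    `weight` is an assumed hyperparameter, not a value from the paper.
    """
    binary_term = cross_entropy(binary_logits, is_fake)
    vocoder_term = cross_entropy(vocoder_logits, vocoder_id)
    return binary_term + weight * vocoder_term


# Both heads would consume features from the same front-end extractor,
# so minimizing the sum pushes that extractor toward vocoder artifacts.
loss = multitask_loss([0.3, 1.2], 1, [0.1, 2.0, -0.5, 0.4], 1)
print(round(loss, 4))
```

In a real model the logits would come from two heads attached to one shared feature extractor, so the gradient of the vocoder term also updates the shared front end.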

Attacking Important Pixels for Anchor-free Detectors

Jan 26, 2023
Yunxu Xie, Shu Hu, Xin Wang, Quanyu Liao, Bin Zhu, Xi Wu, Siwei Lyu

Deep neural networks have been demonstrated to be vulnerable to adversarial attacks: subtle perturbation can completely change the prediction result. Existing adversarial attacks on object detection focus on attacking anchor-based detectors, which may not work well for anchor-free detectors. In this paper, we propose the first adversarial attack dedicated to anchor-free detectors. It is a category-wise attack that attacks important pixels of all instances of a category simultaneously. Our attack manifests in two forms, sparse category-wise attack (SCA) and dense category-wise attack (DCA), that minimize the $L_0$ and $L_\infty$ norm-based perturbations, respectively. For DCA, we present three variants, DCA-G, DCA-L, and DCA-S, that select a global region, a local region, and a semantic region, respectively, to attack. Our experiments on large-scale benchmark datasets including PascalVOC, MS-COCO, and MS-COCO Keypoints indicate that our proposed methods achieve state-of-the-art attack performance and transferability on both object detection and human pose estimation tasks.

* Yunxu Xie and Shu Hu contributed equally
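
To make the distinction between the two perturbation norms concrete, the sketch below computes the $L_0$ norm (number of changed pixels) and the $L_\infty$ norm (largest single change) for two toy perturbations. It only illustrates the norms themselves, not the SCA or DCA attacks. The flattened pixel lists and their values are made up for the example.

```python
def l0_norm(delta):
    """Number of non-zero entries: how many pixels were changed."""
    return sum(1 for value in delta if value != 0)


def linf_norm(delta):
    """Largest absolute entry: the biggest change to any single pixel."""
    return max((abs(value) for value in delta), default=0.0)


# A sparse perturbation changes few pixels, possibly by a lot (small L0).
sparse = [0.0, 0.0, 0.9, 0.0, -0.7, 0.0]
# A dense perturbation changes every pixel, each by a little (small L_inf).
dense = [0.02, -0.03, 0.01, 0.03, -0.02, 0.01]

print(l0_norm(sparse), linf_norm(sparse))  # 2 0.9
print(l0_norm(dense), linf_norm(dense))    # 6 0.03
```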

GLFF: Global and Local Feature Fusion for Face Forgery Detection

Nov 26, 2022
Yan Ju, Shan Jia, Jialing Cai, Haiying Guan, Siwei Lyu

With the rapid development of deep generative models (such as Generative Adversarial Networks and Auto-encoders), AI-synthesized images of the human face are now of such high quality that humans can hardly distinguish them from pristine ones. Although existing detection methods have shown high performance in specific evaluation settings, e.g., on images from seen models or on images without real-world post-processing, they tend to suffer serious performance degradation in real-world scenarios where testing images can be generated by more powerful generation models or combined with various post-processing operations. To address this issue, we propose a Global and Local Feature Fusion (GLFF) framework to learn rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for face forgery detection. GLFF fuses information from two branches: a global branch that extracts multi-scale semantic features and a local branch that selects informative patches for extracting detailed local artifacts. Due to the lack of a face forgery dataset simulating real-world applications for evaluation, we further create a challenging face forgery dataset, named DeepFakeFaceForensics (DF^3), which contains 6 state-of-the-art generation models and a variety of post-processing techniques to approximate real-world scenarios. Experimental results demonstrate the superiority of our method over the state-of-the-art methods on the proposed DF^3 dataset and three other open-source datasets.

Fusion-based Few-Shot Morphing Attack Detection and Fingerprinting

Oct 27, 2022
Na Zhang, Shan Jia, Siwei Lyu, Xin Li

The vulnerability of face recognition systems to morphing attacks has posed a serious security threat due to the wide adoption of face biometrics in the real world. Most existing morphing attack detection (MAD) methods require a large amount of training data and have only been tested on a few predefined attack models. The lack of good generalization properties, especially in view of the growing interest in developing novel morphing attacks, is a critical limitation of existing MAD research. To address this issue, we propose to extend MAD from supervised learning to few-shot learning and from binary detection to multiclass fingerprinting in this paper. Our technical contributions include: 1) we propose a fusion-based few-shot learning (FSL) method to learn discriminative features that can generalize to unseen morphing attack types from predefined presentation attacks; 2) the proposed FSL, based on the fusion of the PRNU model and Noiseprint network, is extended from binary MAD to multiclass morphing attack fingerprinting (MAF); 3) we have collected a large-scale database, which contains five face datasets and eight different morphing algorithms, to benchmark the proposed few-shot MAF (FS-MAF) method. Extensive experimental results show the outstanding performance of our fusion-based FS-MAF. The code and data will be publicly available at https://github.com/nz0001na/mad maf.

RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control

Oct 20, 2022
Yanfei Xiang, Xin Wang, Shu Hu, Bin Zhu, Xiaomeng Huang, Xi Wu, Siwei Lyu

Reinforcement learning is applied to solve complex real-world tasks from high-dimensional sensory inputs. The last decade has produced a long list of reinforcement learning algorithms. Recent progress benefits from deep learning for raw sensory signal representation. One question naturally arises: how well do they perform on different robotic manipulation tasks? Benchmarks use objective performance metrics to offer a scientific way to compare algorithms. In this paper, we present RMBench, the first benchmark for robotic manipulations, which have high-dimensional continuous action and state spaces. We implement and evaluate reinforcement learning algorithms that directly use observed pixels as inputs. We report their average performance and learning curves to show their performance and training stability. Our study concludes that none of the studied algorithms can handle all tasks well, that Soft Actor-Critic outperforms most algorithms in average reward and stability, and that an algorithm combined with data augmentation may facilitate learning policies. Our code is publicly available at https://anonymous.4open.science/r/RMBench-2022-3424, including all benchmark tasks and studied algorithms.

* 8 pages, 2 figures, 2 tables

Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification

Sep 19, 2022
Syeda Nyma Ferdous, Xin Li, Siwei Lyu

Object Re-IDentification (ReID), one of the most significant problems in biometrics and surveillance systems, has been extensively studied by image processing and computer vision communities in the past decades. Learning a robust and discriminative feature representation is a crucial challenge for object ReID. The problem is even more challenging in ReID based on Unmanned Aerial Vehicles (UAVs), as the images are characterized by continuously varying camera parameters (e.g., view angle, altitude, etc.) of a flying drone. To address this challenge, multiscale feature representation has been considered to characterize images captured from UAVs flying at different altitudes. In this work, we propose a multitask learning approach, which employs a new multiscale architecture without convolution, Pyramid Vision Transformer (PVT), as the backbone for UAV-based object ReID. By uncertainty modeling of intraclass variations, our proposed model can be jointly optimized using both uncertainty-aware object ID and camera ID information. Experimental results are reported on PRAI and VRAI, two ReID data sets from aerial surveillance, to verify the effectiveness of our proposed approach.

Rank-based Decomposable Losses in Machine Learning: A Survey

Jul 18, 2022
Shu Hu, Xin Wang, Siwei Lyu

Recent works have revealed an essential paradigm in designing loss functions that differentiate individual losses vs. aggregate losses. The individual loss measures the quality of the model on a sample, while the aggregate loss combines individual losses/scores over each training sample. Both have a common procedure that aggregates a set of individual values to a single numerical value. The ranking order reflects the most fundamental relation among individual values in designing losses. In addition, decomposability, in which a loss can be decomposed into an ensemble of individual terms, becomes a significant property of organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions that follows the perspectives of aggregate loss and individual loss. We identify the aggregator to form such losses, which are examples of set functions. We organize the rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe general formulas for these losses and connect them with existing research topics. We also suggest future research directions spanning unexplored, remaining, and emerging issues in rank-based decomposable losses.

* 20 pages
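
As a rough illustration of what a rank-based aggregate loss looks like, the sketch below sorts individual losses and averages the k largest. The average top-k aggregator is used here only as a familiar example of the idea. The function name and sample values are hypothetical, and the survey's eight categories are not reproduced.

```python
def average_top_k(losses, k):
    """Aggregate individual losses by averaging the k largest values.

    k = 1 gives the maximum loss; k = len(losses) gives the plain average.
    """
    if not 1 <= k <= len(losses):
        raise ValueError("k must be between 1 and the number of losses")
    ranked = sorted(losses, reverse=True)  # ranking step: largest first
    return sum(ranked[:k]) / k             # sum of per-sample terms


individual_losses = [0.2, 1.5, 0.1, 0.9]

print(average_top_k(individual_losses, 1))  # maximum loss: 1.5
print(average_top_k(individual_losses, 2))  # mean of the two hardest samples
print(average_top_k(individual_losses, 4))  # ordinary average loss
```

The aggregator depends on the individual values only through their ranking and remains a sum of per-sample terms, which is the combination of ranking and decomposability the abstract refers to.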
