Siwei Lyu

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

Nov 19, 2023
Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, Baoyuan Wu

Deepfake detection faces a critical generalization hurdle: performance deteriorates when there is a mismatch between the training and testing data distributions. A widely accepted explanation is that these detectors tend to overfit to forgery-specific artifacts rather than learning features that apply broadly across forgeries. To address this issue, we propose a simple yet effective detector called LSDA (Latent Space Data Augmentation), based on a heuristic idea: representations trained on a wider variety of forgeries should learn a more generalizable decision boundary, thereby mitigating overfitting to method-specific features (see Figure 1). Following this idea, we enlarge the forgery space by constructing and simulating variations within and across forgery features in the latent space. This approach yields enriched, domain-specific features and smoother transitions between different forgery types, effectively bridging domain gaps. It culminates in a binary classifier refined with knowledge distilled from the enhanced features, striving for a generalizable deepfake detector. Comprehensive experiments show that our method is surprisingly effective and surpasses state-of-the-art detectors across several widely used benchmarks.
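
To make the latent-space augmentation idea concrete, here is a minimal sketch of within-forgery and cross-forgery augmentation of latent features before training a binary classifier. The encoder and classifier interfaces, the noise scale, and the Beta mixing are illustrative assumptions, not the exact LSDA design.

```python
# Minimal sketch: within-forgery jitter and cross-forgery interpolation in latent space.
# The encoder/classifier interfaces and hyper-parameters are assumptions for illustration.
import torch
import torch.nn.functional as F

def augment_latents(z_fake, noise_std=0.1, mix_alpha=0.5):
    """z_fake: (N, D) latent features of fake samples from possibly different forgery methods."""
    # Within-forgery variation: jitter each latent in its own neighborhood.
    z_within = z_fake + noise_std * torch.randn_like(z_fake)
    # Cross-forgery variation: interpolate latents of different samples to simulate
    # smoother transitions between forgery domains (a mixup-style simplification).
    perm = torch.randperm(z_fake.size(0))
    lam = torch.distributions.Beta(mix_alpha, mix_alpha).sample((z_fake.size(0), 1)).to(z_fake.device)
    z_cross = lam * z_fake + (1.0 - lam) * z_fake[perm]
    return torch.cat([z_fake, z_within, z_cross], dim=0)

def training_step(encoder, classifier, real_x, fake_x):
    z_real = encoder(real_x)                   # (Nr, D)
    z_fake = augment_latents(encoder(fake_x))  # (3 * Nf, D), enlarged forgery space
    logits = classifier(torch.cat([z_real, z_fake], dim=0))  # classifier assumed to output 2 logits
    labels = torch.cat([torch.zeros(len(z_real)), torch.ones(len(z_fake))]).long().to(logits.device)
    return F.cross_entropy(logits, labels)
```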

UMedNeRF: Uncertainty-aware Single View Volumetric Rendering for Medical Neural Radiance Fields

Nov 17, 2023
Jing Hu, Qinrui Fan, Shu Hu, Siwei Lyu, Xi Wu, Xin Wang

In clinical medicine, computed tomography (CT) is an effective imaging modality for diagnosing various pathologies. Compared with X-ray images, CT images provide more information, including multi-planar slices and three-dimensional structures, for clinical diagnosis. However, CT imaging exposes patients to large doses of ionizing radiation over a prolonged time, which may cause irreversible physical harm. In this paper, we propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on generative radiance fields. The network learns a continuous representation of CT projections from 2D X-ray images by capturing internal structure and depth information, and it uses adaptive loss weights to ensure the quality of the generated images. Our model is trained on publicly available knee and chest datasets; we show results of CT projection rendering from a single X-ray and compare our method with other generative-radiance-field-based methods.
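
As a rough sketch of what adaptive loss weighting can look like, the snippet below balances multiple image-quality losses with learned uncertainty weights in the style of homoscedastic uncertainty weighting; the actual UMedNeRF weighting scheme may differ, and the loss names are placeholders.

```python
# Hedged sketch of adaptive multi-loss weighting via learned log-variances.
# The specific loss terms (e.g., pixel reconstruction vs. adversarial) are assumptions.
import torch
import torch.nn as nn

class AdaptiveLossWeights(nn.Module):
    def __init__(self, num_losses=2):
        super().__init__()
        # One learnable log-variance per loss term.
        self.log_vars = nn.Parameter(torch.zeros(num_losses))

    def forward(self, losses):
        """losses: list of scalar tensors, e.g. [recon_loss, adv_loss]."""
        total = 0.0
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])   # higher uncertainty -> smaller weight
            total = total + precision * loss + self.log_vars[i]
        return total

# Usage: total = AdaptiveLossWeights(2)([recon_loss, adv_loss]); total.backward()
```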

Efficient State Estimation with Constrained Rao-Blackwellized Particle Filter

Oct 07, 2023
Shuai Li, Siwei Lyu, Jeff Trinkle

Due to the limitations of robotic sensors, the acquisition of an object's state during a robotic manipulation task can be unreliable and noisy. Combining an accurate model of the multi-body dynamic system with Bayesian filtering methods has been shown to filter out noise from the object's observed states. However, the efficiency of these filtering methods suffers from samples that violate physical constraints, e.g., the no-penetration constraint. In this paper, we propose a Rao-Blackwellized Particle Filter (RBPF) that samples the contact states and updates the object's poses using Kalman filters. This RBPF also enforces the physical constraints on the samples by solving a quadratic programming problem. By comparing our method with methods that do not consider physical constraints, we show that our proposed RBPF not only estimates the object's states, e.g., poses, more accurately but also infers unobserved states, e.g., velocities, with higher precision.
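
The snippet below sketches one measurement update of such a constrained RBPF: sample a discrete contact state per particle, run an analytic Kalman update on the continuous pose, then project the estimate back onto a physical constraint. The dynamics step is omitted and the quadratic-programming projection is replaced by a simple non-negativity clamp purely for illustration; all names and shapes are assumptions.

```python
# Simplified sketch of one constrained Rao-Blackwellized particle filter update.
import numpy as np

def constrained_rbpf_update(particles, z, H, R, contact_probs):
    """particles: list of dicts with 'mu' (pose mean), 'P' (pose covariance), 'w' (weight).
    z: observation; H, R: linear measurement model and noise covariance."""
    for p in particles:
        # 1) Sample the discrete contact state (e.g., separate / sticking / sliding).
        #    In the full filter this would select the dynamics model; omitted here.
        p['contact'] = np.random.choice(len(contact_probs), p=contact_probs)

        # 2) Weight by the predictive measurement likelihood (Gaussian innovation).
        innov = z - H @ p['mu']
        S = H @ p['P'] @ H.T + R
        p['w'] *= np.exp(-0.5 * innov @ np.linalg.solve(S, innov))

        # 3) Analytic Kalman update of the continuous pose for this particle.
        K = p['P'] @ H.T @ np.linalg.inv(S)
        p['mu'] = p['mu'] + K @ innov
        p['P'] = (np.eye(len(p['mu'])) - K @ H) @ p['P']

        # 4) Enforce the physical constraint: a no-penetration constraint is approximated
        #    by clamping the assumed height coordinate (index 2) to be non-negative,
        #    as a stand-in for the paper's quadratic-programming projection.
        p['mu'][2] = max(p['mu'][2], 0.0)

    total = sum(p['w'] for p in particles)
    for p in particles:
        p['w'] /= total
    return particles
```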

Integrating Audio-Visual Features for Multimodal Deepfake Detection

Oct 05, 2023
Sneha Muppalla, Shan Jia, Siwei Lyu

Deepfakes are AI-generated media in which an image or video has been digitally modified. Advances in deepfake technology have led to privacy and security concerns. Most deepfake detection techniques rely on a single modality, and existing audio-visual methods do not always surpass the performance of single-modality analysis. Therefore, this paper proposes an audio-visual method for deepfake detection that integrates fine-grained deepfake identification with binary classification. We categorize samples into four types by combining the labels specific to each modality. This method enhances detection under both intra-domain and cross-domain testing.
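
A small sketch of how per-modality labels can be combined into four fine-grained classes and trained jointly with the binary real/fake objective; the class ordering, loss weight, and tensor shapes are assumptions for illustration.

```python
# Sketch: build four fine-grained classes from per-modality labels and train a
# joint fine-grained + binary objective. Class ordering and alpha are assumptions.
import torch
import torch.nn.functional as F

def fine_grained_label(audio_fake: bool, video_fake: bool) -> int:
    # 0: both real, 1: fake audio only, 2: fake video only, 3: both fake
    return int(audio_fake) + 2 * int(video_fake)

def multitask_loss(fine_logits, binary_logits, audio_fake, video_fake, alpha=0.5):
    """fine_logits: (N, 4); binary_logits: (N, 2); audio_fake/video_fake: (N,) bool tensors."""
    fine_y = audio_fake.long() + 2 * video_fake.long()
    bin_y = (audio_fake | video_fake).long()  # fake if either modality is manipulated
    return F.cross_entropy(fine_logits, fine_y) + alpha * F.cross_entropy(binary_logits, bin_y)
```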

Controlling Neural Style Transfer with Deep Reinforcement Learning

Sep 30, 2023
Chengming Feng, Jing Hu, Xin Wang, Shu Hu, Bin Zhu, Xi Wu, Hongtu Zhu, Siwei Lyu

Controlling the degree of stylization in Neural Style Transfer (NST) is difficult because it usually requires hand-engineering of hyper-parameters. In this paper, we propose the first deep Reinforcement Learning (RL) based architecture that splits one-step style transfer into a step-wise process for the NST task. Our RL-based method tends to preserve more details and structures of the content image in early steps and to synthesize more style patterns in later steps, making the degree of stylization easy for users to control. Additionally, as our RL-based model performs stylization progressively, it is lightweight and has lower computational complexity than existing one-step Deep Learning (DL) based models. Experimental results demonstrate the effectiveness and robustness of our method.

* Accepted by IJCAI 2023. The contributions of Chengming Feng and Jing Hu to this paper were equal. arXiv admin note: text overlap with arXiv:2309.13672 
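
To illustrate the step-wise control idea, here is a hedged sketch of an inference loop in which an actor applies small incremental stylization updates, so users can stop at the step that matches the desired degree of stylization; the actor interface and clamping are assumptions, not the paper's architecture.

```python
# Hedged sketch of step-wise stylization at inference time.
import torch

@torch.no_grad()
def stylize(content_img, style_img, actor, num_steps=5):
    """content_img, style_img: (1, 3, H, W); actor maps (current, style) -> residual update."""
    x = content_img.clone()
    trajectory = [x]
    for _ in range(num_steps):
        delta = actor(x, style_img)        # lightweight per-step transformation
        x = (x + delta).clamp(0.0, 1.0)    # apply the incremental stylization
        trajectory.append(x)
    # Early steps keep more content structure; later steps add more style patterns,
    # so the user picks the step that matches the desired stylization degree.
    return trajectory
```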

Improving Cross-dataset Deepfake Detection with Deep Information Decomposition

Sep 30, 2023
Shanmin Yang, Shu Hu, Bin Zhu, Ying Fu, Siwei Lyu, Xi Wu, Xin Wang

Deepfake technology poses a significant threat to security and social trust. Although existing detection methods demonstrate high performance when training and testing use the same forgery techniques, they suffer sharp performance degradation in cross-dataset scenarios where unseen deepfake techniques are tested. To address this challenge, we propose a deep information decomposition (DID) framework. Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over visual artifacts. Specifically, it decomposes facial features into deepfake-related and irrelevant information and optimizes the deepfake information used for real/fake discrimination to be independent of other factors. Our approach improves the robustness of deepfake detection against changes in irrelevant information and enhances the framework's generalization to unseen forgery methods. Extensive experimental comparisons with state-of-the-art detection methods validate the effectiveness and superiority of the DID framework on cross-dataset deepfake detection.
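
The following sketch illustrates one way to decompose a face feature into deepfake-related and irrelevant parts and to encourage independence with a cross-covariance penalty; the module shapes, the penalty, and its weight are assumptions rather than the paper's exact losses.

```python
# Minimal sketch: split features into deepfake-related and irrelevant parts,
# classify from the deepfake part, and penalize cross-covariance as an
# independence surrogate. Dimensions and the 0.1 weight are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DIDHead(nn.Module):
    def __init__(self, feat_dim=512, part_dim=128):
        super().__init__()
        self.fake_head = nn.Linear(feat_dim, part_dim)        # deepfake-related branch
        self.irrelevant_head = nn.Linear(feat_dim, part_dim)  # everything else
        self.classifier = nn.Linear(part_dim, 2)

    def forward(self, feats, labels):
        z_fake = self.fake_head(feats)
        z_irr = self.irrelevant_head(feats)
        cls_loss = F.cross_entropy(self.classifier(z_fake), labels)
        # Decorrelation: penalize the cross-covariance between the two parts so the
        # real/fake evidence does not depend on irrelevant factors.
        zf = z_fake - z_fake.mean(dim=0)
        zi = z_irr - z_irr.mean(dim=0)
        cross_cov = (zf.T @ zi) / (feats.size(0) - 1)
        indep_loss = (cross_cov ** 2).mean()
        return cls_loss + 0.1 * indep_loss
```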

Deep Reinforcement Learning for Image-to-Image Translation

Sep 24, 2023
Xin Wang, Ziwei Luo, Jing Hu, Chengming Feng, Shu Hu, Bin Zhu, Xi Wu, Siwei Lyu

Most existing Image-to-Image Translation (I2IT) methods generate images in a single run of a deep learning (DL) model. However, designing such a single-step model is challenging: it requires a huge number of parameters and easily falls into bad minima and overfitting. In this work, we reformulate I2IT as a step-wise decision-making problem via deep reinforcement learning (DRL) and propose a novel framework that performs RL-based I2IT (RL-I2IT). The key feature of the RL-I2IT framework is decomposing a monolithic learning process into small steps, with a lightweight model that progressively transforms a source image into a target image. Since handling high-dimensional continuous state and action spaces is challenging in the conventional RL framework, we introduce a meta policy with a new concept, the Plan, into the standard Actor-Critic model; the plan has a lower dimension than the original image and helps the actor generate a tractable high-dimensional action. Within the RL-I2IT framework, we also employ a task-specific auxiliary learning strategy to stabilize training and improve the performance of the corresponding task. Experiments on several I2IT tasks demonstrate the effectiveness and robustness of the proposed method when facing high-dimensional continuous action space problems.
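
Below is a hedged sketch of the plan-then-act decomposition: a meta policy maps the current state feature to a low-dimensional plan, and the actor decodes that plan into a high-dimensional action such as a residual image update; the dimensions and modules are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of the plan-then-act idea; shapes and modules are assumptions.
import torch
import torch.nn as nn

class MetaPolicy(nn.Module):
    def __init__(self, state_dim=1024, plan_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, plan_dim))

    def forward(self, state_feat):
        return self.net(state_feat)  # low-dimensional plan

class Actor(nn.Module):
    def __init__(self, plan_dim=64, action_dim=3 * 64 * 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(plan_dim, 512), nn.ReLU(), nn.Linear(512, action_dim), nn.Tanh())

    def forward(self, plan):
        return self.net(plan)  # high-dimensional action, e.g. a residual image update

# One translation step (training omitted): encode the current image to a state feature,
# plan in the low-dimensional space, then decode the plan into an image update.
```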

Outlier Robust Adversarial Training

Sep 10, 2023
Shu Hu, Zhenhuan Yang, Xin Wang, Yiming Ying, Siwei Lyu

Supervised learning models are challenged by the intrinsic complexities of training data, such as outliers and minority subpopulations, and by intentional attacks at inference time with adversarial samples. While traditional robust learning methods and recent adversarial training approaches are designed to handle each of these two challenges separately, no prior work develops models that are simultaneously robust to low-quality training data and to potential adversarial attacks at inference time. For this reason, we introduce Outlier Robust Adversarial Training (ORAT) in this work. ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function. Theoretically, we show that the learning objective of ORAT satisfies $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate for the adversarial 0/1 loss. Furthermore, we analyze its generalization ability and provide uniform convergence rates in high probability. ORAT can be optimized with a simple algorithm. Experimental evaluations on three benchmark datasets demonstrate the effectiveness and robustness of ORAT in handling outliers and adversarial attacks. Our code is available at https://github.com/discovershu/ORAT.

* Accepted by The 15th Asian Conference on Machine Learning (ACML 2023) 
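
As a rough illustration of the rank-based robustness idea, the snippet below trims the m largest per-sample adversarial losses (likely outliers) and averages the next k; ORAT's actual bi-level formulation differs, so this only sketches the ranking component.

```python
# Hedged sketch of a ranked-range loss over per-sample adversarial losses.
import torch

def ranked_range_loss(per_sample_losses: torch.Tensor, k: int, m: int) -> torch.Tensor:
    """per_sample_losses: (N,) losses computed on adversarially perturbed inputs."""
    sorted_losses, _ = torch.sort(per_sample_losses, descending=True)
    # Skip the top-m losses as suspected outliers, then average the next k hard examples.
    return sorted_losses[m:m + k].mean()
```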

Language-guided Human Motion Synthesis with Atomic Actions

Aug 18, 2023
Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David Doermann, Junsong Yuan

Language-guided human motion synthesis is a challenging task due to the inherent complexity and diversity of human behaviors. Previous methods generalize poorly to novel actions, often producing unrealistic or incoherent motion sequences. In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem by decomposing actions into atomic actions and employing a curriculum learning strategy to learn atomic action composition. First, we disentangle complex human motions into a set of atomic actions during learning, and then assemble novel actions from the learned atomic actions, which offers better adaptability to new actions. Moreover, we introduce a curriculum learning training strategy that leverages masked motion modeling with a gradually increasing mask ratio, which facilitates atomic action assembly. This approach mitigates the overfitting common in previous methods while encouraging the model to learn better motion representations. We demonstrate the effectiveness of ATOM through extensive experiments on text-to-motion and action-to-motion synthesis tasks, and further illustrate its superiority in synthesizing plausible and coherent text-guided human motion sequences.

* Accepted to ACM MM 2023, code: https://github.com/yhZhai/ATOM 
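
A small sketch of curriculum masked motion modeling: the mask ratio grows linearly with training progress so that atomic-action assembly becomes gradually harder; the schedule, masking granularity, and placeholder value are assumptions, not the paper's exact settings.

```python
# Hedged sketch: linear mask-ratio curriculum and frame-level masking of a motion sequence.
import torch

def mask_ratio(step: int, total_steps: int, start: float = 0.1, end: float = 0.6) -> float:
    progress = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * progress  # harder masking later in training

def mask_motion(motion: torch.Tensor, ratio: float):
    """motion: (T, D) pose sequence; returns the masked sequence and the boolean mask."""
    T = motion.size(0)
    mask = torch.rand(T) < ratio   # frames to hide
    masked = motion.clone()
    masked[mask] = 0.0             # replace masked frames with a placeholder value
    return masked, mask
```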