Xuan Xu

Fanny

Reduced Spatial Dependency for More General Video-level Deepfake Detection

Mar 05, 2025

Beilin Chu, Xuan Xu, Yufei Zhang, Weike You, Linna Zhou

Abstract:As a prominent form of AI-generated content, Deepfake technology has raised significant safety concerns. Although temporal consistency cues have been shown to generalize better, existing CNN-based methods inevitably introduce spatial bias, which hinders the extraction of intrinsic temporal features. To address this issue, we propose a novel method called Spatial Dependency Reduction (SDR), which integrates common temporal consistency features from multiple spatially-perturbed clusters to reduce the model's dependency on spatial information. Specifically, we design multiple Spatial Perturbation Branches (SPBs) to construct spatially-perturbed feature clusters. We then draw on mutual information theory and propose a Task-Relevant Feature Integration (TRFI) module to capture temporal features that reside in a similar latent space across these clusters. Finally, the integrated feature is fed into a temporal transformer to capture long-range dependencies. Extensive benchmarks and ablation studies demonstrate the effectiveness and rationale of our approach.

* 5 pages, 2 figures. Accepted to ICASSP 2025

FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error

Dec 10, 2024

Beilin Chu, Xuan Xu, Xin Wang, Yufei Zhang, Weike You, Linna Zhou

Abstract:The rapid advancement of diffusion models has significantly improved high-quality image generation, making generated content increasingly difficult to distinguish from real images and raising concerns about potential misuse. In this paper, we observe that diffusion models struggle to accurately reconstruct mid-band frequency information in real images, suggesting that this limitation could serve as a cue for detecting images generated by diffusion models. Motivated by this observation, we propose a novel method called Frequency-guided Reconstruction Error (FIRE), which, to the best of our knowledge, is the first to investigate the influence of frequency decomposition on reconstruction error. FIRE assesses the variation in reconstruction error before and after frequency decomposition, offering a robust method for identifying diffusion-generated images. Extensive experiments show that FIRE generalizes effectively to unseen diffusion models and maintains robustness against diverse perturbations.

* 14 pages, 14 figures
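
To make the notion of a "mid-band" concrete, the sketch below builds a ring-shaped mask over a centered 2D frequency grid. It is an illustration under stated assumptions, not FIRE's implementation: the function name `mid_band_mask`, the radius normalization, and the cutoffs `low=0.25` and `high=0.75` are all hypothetical.

```python
import math


def mid_band_mask(height, width, low=0.25, high=0.75):
    """Boolean mask for a mid-frequency annulus on a centered (fftshift-style) spectrum.

    Assumes the zero-frequency bin sits at (height // 2, width // 2). Radii are
    normalized so that 1.0 is the distance from that bin to the nearest image edge.
    Bins with low <= radius < high are kept (True); all others are dropped (False).
    """
    cy, cx = height // 2, width // 2
    r_max = min(cy, cx)
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            r = math.hypot(y - cy, x - cx) / r_max
            row.append(low <= r < high)
        mask.append(row)
    return mask
```

Multiplying a centered spectrum by such a mask and inverting the transform would give a band-limited image; the before/after comparison of reconstruction error described in the abstract would operate on inputs of this kind, though the paper defines the actual procedure.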

Histo-Diffusion: A Diffusion Super-Resolution Method for Digital Pathology with Comprehensive Quality Assessment

Aug 27, 2024

Xuan Xu, Saarthak Kapse, Prateek Prasanna

Abstract:Digital pathology has advanced significantly over the last decade, with Whole Slide Images (WSIs) encompassing vast amounts of data essential for accurate disease diagnosis. High-resolution WSIs are essential for precise diagnosis, but technical limitations in scanning equipment and variability in slide preparation can hinder obtaining these images. Super-resolution techniques can enhance low-resolution images; while Generative Adversarial Networks (GANs) have been effective in natural image super-resolution tasks, they often struggle with histopathology due to overfitting and mode collapse. Traditional evaluation metrics fall short in assessing the complex characteristics of histopathology images, necessitating robust histology-specific evaluation methods. We introduce Histo-Diffusion, a novel diffusion-based method specifically designed for generating and evaluating super-resolution images in digital pathology. It includes a restoration module for the histopathology prior and a controllable diffusion module for generating high-quality images. We have curated two histopathology datasets and propose a comprehensive evaluation strategy that incorporates both full-reference and no-reference metrics to thoroughly assess the quality of digital pathology images. Comparative analyses against state-of-the-art methods on multiple datasets reveal that Histo-Diffusion outperforms GANs. Our method offers a versatile solution for histopathology image super-resolution, capable of multi-resolution generation from varied input sizes, providing valuable support in diagnostic processes.

* We have submitted our paper to Medical Image Analysis and are currently awaiting feedback
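
The abstract pairs full-reference and no-reference metrics without naming them. As one common example of a full-reference metric, PSNR compares a super-resolved image against its high-resolution ground truth as 10 log10(MAX^2 / MSE). Whether Histo-Diffusion reports PSNR is not stated in this listing; the helper name `psnr` and the 8-bit default `max_val=255` are assumptions.

```python
import math


def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D grayscale images.

    reference: ground-truth high-resolution image (list of rows).
    test: super-resolved image with the same shape.
    Returns math.inf when the images are identical (MSE of zero).
    """
    sq_err = [(a - b) ** 2 for ra, rb in zip(reference, test) for a, b in zip(ra, rb)]
    mse = sum(sq_err) / len(sq_err)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A no-reference metric, by contrast, scores the generated image alone, which matters when no high-resolution counterpart exists.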

Unearthing Common Inconsistency for Generalisable Deepfake Detection

Nov 20, 2023

Beilin Chu, Xuan Xu, Weike You, Linna Zhou

Abstract:Deepfakes have been around for several years, yet detection techniques that generalize across different manipulation methods still require further research. Current image-level detection methods fail to generalize to unseen domains because of the domain shift caused by CNNs' strong inductive bias toward Deepfake texture, whereas video-level methods show potential for both generalization across multiple domains and robustness to compression. We argue that although different face manipulation tools have different inherent biases, they all disrupt the consistency between frames, a natural characteristic shared by authentic videos. Inspired by this, we propose a detection approach, termed unearthing-common-inconsistency (UCI), that captures the frame inconsistency present across different forgery techniques. Concretely, the UCI network, based on self-supervised contrastive learning, can better distinguish the temporal consistency of real and fake videos from multiple domains. We introduce a temporally-preserved module that applies spatial noise perturbations, directing the model's attention toward temporal information. Subsequently, leveraging a multi-view cross-correlation learning module, we extensively learn the disparities in temporal representations between genuine and fake samples. Extensive experiments demonstrate the generalization ability of our method on unseen Deepfake domains.

* 9 pages, 2 figures and 5 tables
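
One way to read "spatial noise perturbations" that keep temporal information intact is to scramble the spatial layout identically in every frame, so frame-to-frame correspondences survive. The sketch below does this with a single random patch permutation per clip. It is a hypothetical illustration, not the UCI temporally-preserved module; the function name, the `patch` size parameter, and the list-of-lists frame format are assumptions.

```python
import random


def shuffle_patches_consistently(clip, patch, seed=None):
    """Apply one random patch permutation to every frame of a clip (hypothetical helper).

    clip: list of T frames, each an H x W list of lists; H and W must be multiples of `patch`.
    The spatial layout is scrambled, but since every frame receives the same permutation,
    each output patch position follows the same source region through time.
    """
    rng = random.Random(seed)
    h, w = len(clip[0]), len(clip[0][0])
    # Top-left corners of all non-overlapping patches, in row-major order.
    coords = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    perm = coords[:]
    rng.shuffle(perm)  # drawn once per clip, not once per frame
    out = []
    for frame in clip:
        new = [[None] * w for _ in range(h)]
        for (dr, dc), (sr, sc) in zip(coords, perm):
            for i in range(patch):
                for j in range(patch):
                    new[dr + i][dc + j] = frame[sr + i][sc + j]
        out.append(new)
    return out
```

Drawing a fresh permutation per frame would instead destroy the temporal signal that the perturbation is meant to preserve.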

ViT-DAE: Transformer-driven Diffusion Autoencoder for Histopathology Image Analysis

Apr 03, 2023

Xuan Xu, Saarthak Kapse, Rajarsi Gupta, Prateek Prasanna

Abstract:Generative AI has received substantial attention in recent years due to its ability to synthesize data that closely resembles the original data source. While Generative Adversarial Networks (GANs) have provided innovative approaches for histopathological image analysis, they suffer from limitations such as mode collapse and discriminator overfitting. Recently, denoising diffusion models have demonstrated promising results in computer vision. These models exhibit superior training stability and better distribution coverage, and they produce high-quality, diverse images. Additionally, they display a high degree of resilience to noise and perturbations, making them well-suited for digital pathology, where images commonly contain artifacts and exhibit significant variations in staining. In this paper, we present a novel approach, ViT-DAE, which integrates vision transformers (ViT) and diffusion autoencoders for high-quality histopathology image synthesis. This marks the first time that ViT has been introduced to diffusion autoencoders in computational pathology, allowing the model to better capture the complex and intricate details of histopathology images. We demonstrate the effectiveness of ViT-DAE on three publicly available datasets, where it outperforms recent GAN-based and vanilla DAE methods in generating realistic images.

* Submitted to MICCAI 2023
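
Denoising diffusion models, which ViT-DAE builds on, are trained to reverse a fixed forward noising process. In the standard DDPM formulation, a noisy sample at step t can be drawn in closed form as x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, with eps ~ N(0, I) and alpha_bar_t the cumulative product of (1 - beta_s). The sketch assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps (the DDPM default); ViT-DAE's own schedule and its ViT component are not shown, and the function names are hypothetical.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (requires T >= 2)."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng=random):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]
```

Because alpha_bar_t shrinks toward zero as t grows, late steps are almost pure noise, which is what the learned reverse (denoising) process starts from.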

Enhancing Modality-Agnostic Representations via Meta-Learning for Brain Tumor Segmentation

Feb 08, 2023

Aishik Konwer, Xiaoling Hu, Xuan Xu, Joseph Bae, Chao Chen, Prateek Prasanna

Abstract:In the medical vision domain, different imaging modalities provide complementary information. However, in practice, not all modalities may be available during inference. Previous approaches, e.g., knowledge distillation or image synthesis, often assume that full modalities are available for all patients during training; this is unrealistic and impractical owing to the variability in data collection across sites. We propose a novel approach to learn enhanced modality-agnostic representations by employing a meta-learning strategy during training, even when only a fraction of patients have all modalities available. Meta-learning enhances partial-modality representations toward full-modality representations by meta-training on partial-modality data and meta-testing on limited full-modality samples. Additionally, we co-supervise this feature enrichment by introducing an auxiliary adversarial learning branch. More specifically, a missing-modality detector is used as a discriminator to mimic the full-modality setting. Our segmentation framework significantly outperforms state-of-the-art brain tumor segmentation techniques in missing-modality scenarios, as demonstrated on two brain tumor MRI datasets.

Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations

Mar 31, 2022

Aishik Konwer, Xuan Xu, Joseph Bae, Chao Chen, Prateek Prasanna

Abstract:Clinical outcome or severity prediction from medical images has largely focused on learning representations from single-timepoint or snapshot scans. It has been shown that disease progression can be better characterized by temporal imaging. We therefore hypothesized that outcome predictions can be improved by utilizing disease progression information from sequential images. We present a deep learning approach that leverages temporal progression information to improve clinical outcome predictions from single-timepoint images. In our method, a self-attention-based Temporal Convolutional Network (TCN) is used to learn a representation that is most reflective of the disease trajectory. Meanwhile, a Vision Transformer is pretrained in a self-supervised fashion to extract features from single-timepoint images. The key contribution is a recalibration module that employs a maximum mean discrepancy (MMD) loss to align the distributions of these two contextual representations. We train our system to predict clinical outcomes and severity grades from single-timepoint images. Experiments on chest and osteoarthritis radiography datasets demonstrate that our approach outperforms other state-of-the-art techniques.

* Accepted in CVPR 2022 (ORAL)
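
The recalibration module relies on maximum mean discrepancy. For a kernel k, the squared MMD between samples X and Y can be estimated as the mean of k(x, x') plus the mean of k(y, y') minus twice the mean of k(x, y). The sketch below computes this biased estimate with a Gaussian RBF kernel in plain Python; the bandwidth `sigma` and the function names are assumptions, since the abstract does not state the kernel used.

```python
import math


def rbf(x, y, sigma):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between sample sets xs and ys."""
    m, n = len(xs), len(ys)
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / (m * m)
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / (n * n)
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy
```

Used as a loss, this value is driven toward zero so that the two sets of representations become indistinguishable under the chosen kernel.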

Brain Cancer Survival Prediction on Treatment-naïve MRI using Deep Anchor Attention Learning with Vision Transformer

Feb 03, 2022

Xuan Xu, Prateek Prasanna

Abstract:Image-based brain cancer prediction models based on radiomics quantify the radiologic phenotype from magnetic resonance imaging (MRI). However, these features are difficult to reproduce because of variability in acquisition and preprocessing pipelines. Despite evidence of intra-tumor phenotypic heterogeneity, the spatial diversity between different slices within an MRI scan has been relatively unexplored using such methods. In this work, we propose a deep anchor attention aggregation strategy with a Vision Transformer to predict survival risk for brain cancer patients. A Deep Anchor Attention Learning (DAAL) algorithm is proposed to assign different weights to slice-level representations with trainable distance measurements. We evaluated our method on N = 326 MRIs, where it outperformed attention-based multiple instance learning techniques. DAAL highlights the importance of critical slices and corroborates the clinical intuition that inter-slice spatial diversity can reflect disease severity and is implicated in outcome.
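
DAAL is compared against attention-based multiple instance learning (MIL). For reference, the sketch below implements the standard attention-MIL pooling of Ilse et al. (2018), where each slice embedding h_k receives a weight a_k = softmax_k(w^T tanh(V h_k)) and the scan-level embedding is the weighted sum of slices. This is the baseline idea, not DAAL's anchor-based distance weighting; `V` and `w` would be learned in practice and are passed in here as plain lists, and the function name is hypothetical.

```python
import math


def attention_mil_pool(instances, V, w):
    """Attention-based MIL pooling (non-gated form of Ilse et al., 2018).

    instances: K slice embeddings, each a list of length D.
    V: L x D projection matrix (list of rows); w: scoring vector of length L.
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]  # tanh(V h)
        scores.append(sum(wl * z for wl, z in zip(w, hidden)))                 # w^T tanh(V h)
    # Numerically stable softmax over the K instances.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights
```

With all scores equal the weights become uniform and the pooling reduces to a plain mean over slices; learned scores let critical slices dominate.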

Large-Scale Data Mining of Rapid Residue Detection Assay Data From HTML and Documents: Improving Data Access and Visualization for Veterinarians

Dec 02, 2021

Majid Jaberi-Douraki, Soudabeh Taghian Dinani, Nuwan Indika Millagaha Gedara, Xuan Xu, Emily Richards, Fiona Maunsell, Nader Zad, Lisa Ann Tell

Abstract:Extra-label drug use in food animal medicine is authorized by the US Animal Medicinal Drug Use Clarification Act (AMDUCA), and estimated withdrawal intervals are based on published scientific pharmacokinetic data. Occasionally there is a paucity of scientific data on which to base a withdrawal interval, or a large number of animals are being treated, driving the need to test for drug residues. Rapid commercial farm-side assay tests are essential for monitoring drug residues in animal products to protect human health. The active ingredients, sensitivity, matrices, and species that have been evaluated for commercial rapid assay tests are typically reported on manufacturers' websites or in PDF documents that are available to consumers but may require a special access request. Additionally, this information is not always correlated with FDA-approved tolerances. Furthermore, parameter changes for these tests can be very challenging to identify regularly, especially for tests listed on websites or in documents that are not publicly available. Therefore, artificial intelligence plays a critical role in efficiently extracting the data and ensuring that the information is current. Extracting tables from PDF and HTML documents has been investigated by both academia and commercial tool builders. Research in text mining of such documents has become a widespread yet challenging arena for natural language processing. However, table-extraction techniques are still in their infancy and are being investigated and improved by researchers. In this study, we developed and evaluated a data-mining method for automatically extracting rapid assay data from electronic documents. Our automatic electronic data extraction method includes a software package module, a developed pattern recognition tool, and a data mining engine. Assay details were provided by several commercial entities that produce these rapid drug residue assay

* 8, 2021, 13
* 13 pages, 7 figures
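
As a small illustration of the HTML half of the extraction problem, the sketch below collects the cell text of every table in an HTML page using only Python's standard `html.parser` module. It is not the authors' extraction engine; the class and function names are hypothetical, and it assumes explicit closing tags and no nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a document as a list of rows of whitespace-normalized cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside table cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

PDF tables are harder because the file stores positioned text rather than row and cell markup, so they typically need layout analysis or a dedicated library.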

Deformable Kernel Convolutional Network for Video Extreme Super-Resolution

Oct 01, 2020

Xuan Xu, Xin Xiong, Jinge Wang, Xin Li

Abstract:Video super-resolution (VSR), which attempts to reconstruct high-resolution video frames from their corresponding low-resolution versions, has received increasing attention in recent years. Most existing approaches use deformable convolution to temporally align neighboring frames and apply traditional convolution-based spatial attention mechanisms to enhance reconstructed features. However, such spatial-only strategies cannot fully utilize the temporal dependency among video frames. In this paper, we propose a novel deep-learning-based VSR algorithm named Deformable Kernel Spatial Attention Network (DKSAN). With the newly designed Deformable Kernel Convolution Alignment (DKC_Align) and Deformable Kernel Spatial Attention (DKSA) modules, DKSAN can better exploit both spatial and temporal redundancies to facilitate information propagation across different layers. We tested DKSAN on the AIM2020 Video Extreme Super-Resolution Challenge, which requires super-resolving videos by a scale factor as large as 16. Experimental results demonstrate that DKSAN achieves better subjective and objective performance than the existing state-of-the-art EDVR on the Vid3oC and IntVID datasets.

* To appear in ECCVW 2020
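
The abstract contrasts DKSAN with approaches that align frames using deformable convolution. For reference, the sketch below computes one output value of a standard 3x3 deformable convolution (Dai et al., 2017): each kernel tap samples the input at its regular grid position plus a fractional offset, using bilinear interpolation. It illustrates that baseline, not DKSAN's Deformable Kernel modules; the function names are hypothetical, and the offsets are supplied as inputs rather than predicted by a network.

```python
import math


def bilinear_sample(img, y, x):
    """Bilinearly interpolate a 2D list at fractional (y, x); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def px(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - dy) * (1 - dx) * px(y0, x0) + (1 - dy) * dx * px(y0, x0 + 1)
            + dy * (1 - dx) * px(y0 + 1, x0) + dy * dx * px(y0 + 1, x0 + 1))


def deform_conv3x3_at(img, kernel, offsets, y, x):
    """One output value of a 3x3 deformable convolution at integer position (y, x).

    kernel: 3x3 weights; offsets: nine (dy, dx) pairs in row-major tap order,
    each added to the corresponding regular sampling location.
    """
    out = 0.0
    k = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            oy, ox = offsets[k]
            out += kernel[i + 1][j + 1] * bilinear_sample(img, y + i + oy, x + j + ox)
            k += 1
    return out
```

With all offsets set to zero, the result reduces to an ordinary 3x3 convolution at that position; in frame alignment, the offsets are predicted from the reference and neighboring frames.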
