Sheng Li

University of Pittsburgh

From Covert Hiding to Visual Editing: Robust Generative Video Steganography

Jan 01, 2024

Xueying Mao, Xiaoxiao Hu, Wanli Peng, Zhenliang Gan, Qichao Ying, Zhenxing Qian, Sheng Li, Xinpeng Zhang

Abstract: Traditional video steganography methods embed by modifying the covert space, whereas we propose an approach that embeds the secret message within semantic features during the video editing process. Although existing traditional video steganography methods offer a certain level of security and embedding capacity, they lack adequate robustness against common distortions in online social networks (OSNs). In this paper, we introduce an end-to-end robust generative video steganography network (RoGVS), which achieves visual editing by modifying the semantic features of videos to embed the secret message. We employ a face-swapping scenario to showcase the visual editing effects. We first design a secret message embedding module to adaptively hide the secret message in the semantic features of videos. Extensive experiments on facial video datasets show that the proposed RoGVS method outperforms existing video and image steganography techniques in both robustness and capacity.

* Under Review

PROMPT-IML: Image Manipulation Localization with Pre-trained Foundation Models Through Prompt Tuning

Jan 01, 2024

Xuntao Liu, Yuzhou Yang, Qichao Ying, Zhenxing Qian, Xinpeng Zhang, Sheng Li

Abstract: Deceptive images can be shared in seconds through social networking services, posing substantial risks. Tampering traces, such as boundary artifacts and high-frequency information, have been heavily emphasized by massive networks in the Image Manipulation Localization (IML) field. However, these traces are vulnerable to image post-processing operations, which limits the generalization and robustness of existing methods. We present a novel Prompt-IML framework. We observe that humans tend to discern the authenticity of an image based on both semantic and high-frequency information; inspired by this, the proposed framework leverages rich semantic knowledge from pre-trained visual foundation models to assist IML. We are the first to design a framework that utilizes visual foundation models specifically for the IML task. Moreover, we design a Feature Alignment and Fusion module to align and fuse semantic features with high-frequency features, which aims at locating tampered regions from multiple perspectives. Experimental results demonstrate that our model achieves better performance on eight typical fake image datasets, along with outstanding robustness.

* Under Review

Task-Driven Causal Feature Distillation: Towards Trustworthy Risk Prediction

Dec 20, 2023

Zhixuan Chu, Mengxuan Hu, Qing Cui, Longfei Li, Sheng Li

Abstract: The tremendous recent successes of artificial intelligence in many areas have sparked great interest in its potential for trustworthy and interpretable risk prediction. However, most models lack causal reasoning and struggle with class imbalance, leading to poor precision and recall. To address this, we propose a Task-Driven Causal Feature Distillation model (TDCFD) to transform original feature values into causal feature attributions for the specific risk prediction task. A causal feature attribution describes how much the value of a feature contributes to the risk prediction result. After the causal feature distillation, a deep neural network is applied to produce trustworthy prediction results with causal interpretability and high precision/recall. We evaluate the performance of our TDCFD method on several synthetic and real datasets, and the results demonstrate its superiority over the state-of-the-art methods regarding precision, recall, interpretability, and causality.

* Proceedings of the 2024 AAAI Conference on Artificial Intelligence
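
The abstract above points to class imbalance as a cause of poor precision and recall. The following is a minimal sketch of why accuracy alone can hide that problem; the labels and predictions are hypothetical toy values, not data or code from the paper.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the risky class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical imbalanced data: 5 risky cases among 100.
y_true = [1] * 5 + [0] * 95
# Hypothetical classifier output: 2 risky cases caught, 3 missed, 2 false alarms.
y_pred = [1, 1, 0, 0, 0] + [1, 1] + [0] * 93

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Accuracy looks high here because the majority class dominates, while precision and recall on the rare class stay low.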

XAI meets Biology: A Comprehensive Review of Explainable AI in Bioinformatics Applications

Dec 11, 2023

Zhongliang Zhou, Mengxuan Hu, Mariah Salcedo, Nathan Gravel, Wayland Yeung, Aarya Venkat, Dongliang Guo, Jielu Zhang, Natarajan Kannan, Sheng Li

Abstract: Artificial intelligence (AI), particularly machine learning and deep learning models, has significantly impacted bioinformatics research by offering powerful tools for analyzing complex biological data. However, the lack of interpretability and transparency of these models makes it difficult to leverage them for deeper biological insights and for generating testable hypotheses. Explainable AI (XAI) has emerged as a promising solution to enhance the transparency and interpretability of AI models in bioinformatics. This review provides a comprehensive analysis of XAI techniques and their applications across bioinformatics domains including DNA, RNA, and protein sequence analysis, structural analysis, gene expression and genome analysis, and bioimaging analysis. We introduce the most pertinent machine learning and XAI methods, then discuss their diverse applications and address the current limitations of available XAI tools. By offering insights into XAI's potential and challenges, this review aims to facilitate its practical implementation in bioinformatics research and help researchers navigate the landscape of XAI tools.

* 19 pages, 9 figures

Noisy probing dose facilitated dose prediction for pencil beam scanning proton therapy: physics enhances generalizability

Dec 02, 2023

Lian Zhang, Jason M. Holmes, Zhengliang Liu, Hongying Feng, Terence T. Sio, Carlos E. Vargas, Sameer R. Keole, Kristin Stützer, Sheng Li, Tianming Liu (+4 more)

Abstract: Purpose: Prior AI-based dose prediction studies in photon and proton therapy often neglect the underlying physics, limiting their generalizability to outlier clinical cases, especially for pencil beam scanning proton therapy (PBSPT). Our aim is to design a physics-aware AI-based PBSPT dose prediction method that accounts for the underlying physics so that it generalizes well and properly handles outlier clinical cases. Methods and Materials: This study analyzed PBSPT plans of 103 prostate and 78 lung cancer patients from our institution, with each case comprising CT images, structure sets, and plan doses from our Monte-Carlo dose engine (serving as the ground truth). Three methods were evaluated in the ablation study: the ROI-based method, the beam mask and sliding window method, and the noisy probing dose method. Twelve cases with uncommon beam angles or prescription doses tested the methods' generalizability to rare treatment planning scenarios. Performance evaluation used DVH indices, 3D Gamma passing rates (3%/2mm/10%), and dice coefficients for dose agreement. Results: The noisy probing dose method showed improved agreement of DVH indices, 3D Gamma passing rates, and dice coefficients compared to the conventional methods for the testing cases. The noisy probing dose method showed better generalizability in the 6 outlier cases than the ROI-based and beam mask-based methods with 3D Gamma passing rates (for prostate cancer, targets: 89.32%±1.45% vs. 93.48%±1.51% vs. 96.79%±0.83%, OARs: 85.87%±1.73% vs. 91.15%±1.13% vs. 94.29%±1.01%). The dose predictions were completed within 0.3 seconds. Conclusions: We have devised a novel noisy probing dose method for PBSPT dose prediction in prostate and lung cancer patients. With more physics included, it enhances the generalizability of dose prediction in handling outlier clinical cases.
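
The abstract above lists dice coefficients among its dose agreement metrics. The following is a minimal sketch of one common way to compute such a score, assuming each dose grid is thresholded at an isodose level to form a binary mask; the abstract does not state how the masks were defined, and the dose values and the 1.0 Gy threshold below are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A and B| / (|A| + |B|) for two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as perfect agreement by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def isodose_mask(dose, threshold):
    """Flag voxels whose dose reaches the given isodose threshold."""
    return [d >= threshold for d in dose]


# Hypothetical flattened dose grids in Gy, not data from the study.
predicted = [0.0, 1.2, 2.1, 2.0, 0.4]
reference = [0.0, 0.9, 2.2, 1.8, 0.5]
score = dice_coefficient(isodose_mask(predicted, 1.0),
                         isodose_mask(reference, 1.0))
print(f"Dice at the 1.0 Gy isodose level: {score:.3f}")
```

A score of 1 means the two isodose volumes coincide exactly, and 0 means they do not overlap at all.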

Steal My Artworks for Fine-tuning? A Watermarking Framework for Detecting Art Theft Mimicry in Text-to-Image Models

Nov 22, 2023

Ge Luo, Junqiang Huang, Manman Zhang, Zhenxing Qian, Sheng Li, Xinpeng Zhang

Abstract: Advances in text-to-image models have led to astonishing artistic performance. However, several studios and websites illegally fine-tune these models using artists' artworks to mimic their styles for profit, which violates the copyrights of artists and diminishes their motivation to produce original works. Currently, there is a notable lack of research focusing on this issue. In this paper, we propose a novel watermarking framework that detects fine-tuning-based mimicry in text-to-image models. This framework embeds subtle watermarks into digital artworks to protect their copyrights while still preserving the artist's visual expression. If someone takes watermarked artworks as training data to mimic an artist's style, these watermarks can serve as detectable indicators. By analyzing the distribution of these watermarks in a series of generated images, acts of fine-tuning mimicry using stolen victim data will be exposed. In various fine-tuning scenarios and against watermark attack methods, our research confirms that analyzing the distribution of watermarks in artificially generated images reliably detects unauthorized mimicry.

* A Watermarking Framework for Detecting Art Theft Mimicry in Text-to-Image Models

FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection

Nov 21, 2023

Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li

Abstract: Early-stage Alzheimer's disease (AD) detection is considered an important field of medical study. Like traditional machine learning methods, speech-based automatic detection suffers from data privacy risks because the data of specific patients are exclusive to each medical institution. A common practice is to use federated learning to protect patients' data privacy. However, its distributed learning process also reduces performance. To alleviate this problem while protecting user privacy, we propose federated contrastive pre-training (FedCPC), performed before federated training for AD speech detection, which can learn a better representation from raw data and enables different clients to share data in the pre-training and training stages. Experimental results demonstrate that the proposed method achieves satisfactory performance while preserving data privacy.

* Accepted at IEEE ASRU 2023
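
The abstract above relies on federated learning but does not say which aggregation rule is used. The following is a minimal sketch of one widely used rule, sample-weighted federated averaging (FedAvg), shown purely as an assumption to illustrate the general idea; the function name and toy parameter vectors are hypothetical and unrelated to FedCPC's actual implementation.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    Only parameters leave each client; the raw data stays local.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: two institutions with 1 and 3 local recordings.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Weighting by sample count lets institutions with more recordings influence the shared model proportionally.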

Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization

Nov 17, 2023

Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He

Abstract: Current speaker anonymization methods, especially those using self-supervised learning (SSL) models, require massive computational resources when hiding speaker identity. This paper proposes an effective and parameter-efficient speaker anonymization method based on recent end-to-end model reprogramming technology. To improve the anonymization performance, we first extract the speaker representation from large SSL models as the speaker identity. To hide the speaker's identity, we reprogram the speaker representation by adapting the speaker to a pseudo domain. Extensive experiments are carried out on the VoicePrivacy Challenge (VPC) 2022 datasets to demonstrate the effectiveness of our proposed parameter-efficient learning anonymization methods. Additionally, while achieving performance comparable to the strong VPC 2022 baseline 1.b, our approach consumes fewer computational resources during anonymization.

* Accepted at ACM Multimedia Asia 2023

GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System

Nov 17, 2023

Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He

Abstract: Speaker adaptation systems face privacy concerns because such systems are trained on private datasets and often overfit. This paper demonstrates that an attacker can extract speaker information by querying speaker-adapted automatic speech recognition (ASR) systems. We focus on the speaker information of a transformer-based ASR and propose GhostVec, a simple and efficient attack method to extract the speaker information from an encoder-decoder-based ASR system without any external speaker verification system or natural human voice as a reference. To make our results quantitative, we pre-process GhostVec using singular value decomposition (SVD) and synthesize it into a waveform. Experimental results show that the synthesized audio of GhostVec reaches 10.83% EER and 0.47 minDCF with target speakers, which suggests the effectiveness of the proposed method. We hope the preliminary discovery in this study will catalyze future speech recognition research on privacy-preserving topics.

* Accepted at ACM Multimedia Asia 2023
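
The abstract above reports an equal error rate (EER). The following is a minimal sketch of how an EER can be estimated from verification scores, assuming higher scores mean "same speaker"; the scores are hypothetical, and the simple threshold sweep is an illustrative choice, not the evaluation code used in the paper.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping a threshold over all observed scores.

    FAR: fraction of impostor scores accepted (score >= threshold).
    FRR: fraction of genuine scores rejected (score < threshold).
    The EER is taken where the two rates are closest to each other.
    """
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical similarity scores, not results from the paper.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER: {eer:.2%}")
```

With so few scores the estimate is coarse; evaluation toolkits typically interpolate between thresholds for a finer value.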

LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

Nov 17, 2023

Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu

Abstract: Recently, researchers have shown increasing interest in automatically predicting subjective evaluation scores for speech synthesis systems. This prediction is a challenging task, especially on out-of-domain test sets. In this paper, we propose a novel fusion model for MOS prediction that combines supervised and unsupervised approaches. On the supervised side, we develop an SSL-based predictor called LE-SSL-MOS, which utilizes pre-trained self-supervised learning models and further improves prediction accuracy by using the opinion scores of each utterance in the listener enhancement branch. On the unsupervised side, there are two steps: first, we fine-tune the unit language model (ULM) on highly intelligible domain data to improve the correlation of an unsupervised metric, SpeechLMScore; second, we use ASR confidence as a new metric with the help of ensemble learning. To our knowledge, this is the first architecture that fuses supervised and unsupervised methods for MOS prediction. Our experimental results on the VoiceMOS Challenge 2023 show that LE-SSL-MOS performs better than the baseline. Our fusion system achieved an absolute improvement of 13% over LE-SSL-MOS on the noisy and enhanced speech track. Our system ranked 1st and 2nd, respectively, in the French speech synthesis track and the challenge's noisy and enhanced speech track.

* Accepted at IEEE ASRU 2023
