Recently, diffusion models have emerged as superior generative models that can produce high-quality, highly realistic images, and there is growing interest in applying them to image translation tasks. However, for medical image translation, existing diffusion models struggle to accurately retain structural information: structural details of source-domain images are lost during the forward diffusion process and cannot be fully recovered through the learned reverse diffusion, yet the integrity of anatomical structures is critically important in medical images. Training and conditioning diffusion models on paired source and target images with matching anatomy can help. However, such paired data are difficult and costly to obtain, and may also reduce the robustness of the resulting model to out-of-distribution test data. We propose a frequency-guided diffusion model (FGDM) that employs frequency-domain filters to guide the diffusion model toward structure-preserving image translation. By design, FGDM enables zero-shot learning: it can be trained solely on target-domain data and applied directly to source-to-target translation without any exposure to source-domain data during training. We trained FGDM solely on head-and-neck CT data and evaluated it on both head-and-neck and lung cone-beam CT (CBCT)-to-CT translation tasks. FGDM outperformed state-of-the-art GAN-based, VAE-based, and diffusion-based methods on all metrics, demonstrating its significant advantages in zero-shot medical image translation.
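FGDM's central idea of using frequency-domain filters to carry structure information can be illustrated with a minimal FFT-based low-/high-pass split. This is only a sketch of the filtering operation, not the paper's full guidance mechanism; the function name and cutoff value are illustrative assumptions.

```python
import numpy as np

def frequency_filter(image, cutoff=0.1, mode="low"):
    """Split an image into low-/high-frequency components via the 2-D FFT.

    A minimal sketch of frequency-domain filtering; FGDM's actual guidance
    design (how filtered components condition the diffusion process) is
    more involved than this.
    """
    f = np.fft.fftshift(np.fft.fft2(image))            # centered spectrum
    h, w = image.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)    # normalized frequency radius
    mask = radius <= cutoff if mode == "low" else radius > cutoff
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# the two complementary bands sum back to the original image
img = np.random.rand(64, 64)
recon = frequency_filter(img, mode="low") + frequency_filter(img, mode="high")
```

Because the two masks are complementary, the low- and high-frequency components reconstruct the input exactly, which is what makes such filters attractive for carrying structure (high-frequency) information alongside a generative process.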
Recent large language models (LLMs) in the general domain, such as ChatGPT, have shown remarkable success in following instructions and producing human-like responses. However, such language models have not been tailored to the medical domain, resulting in poor answer accuracy and an inability to give plausible recommendations for medical diagnoses, medications, etc. To address this issue, we collected more than 700 diseases with their corresponding symptoms, required medical tests, and recommended medications, from which we generated 5K doctor-patient conversations. In addition, we obtained 200K real patient-doctor conversations from online Q&A medical consultation sites. By fine-tuning LLMs on these 205K doctor-patient conversations, the resulting models show great potential to understand patients' needs, provide informed advice, and offer valuable assistance in a variety of medical fields. The integration of these advanced language models into healthcare can revolutionize the way healthcare professionals and patients communicate, ultimately improving the efficiency and quality of patient care and outcomes. In addition, we have made all the source code, datasets, and model weights public to facilitate further development of dialogue models in the medical field. The training data, code, and weights of this project are available at https://github.com/Kent0n-Li/ChatDoctor.
Voice anti-spoofing systems are crucial auxiliaries for automatic speaker verification (ASV) systems. A major challenge is posed by unseen attacks empowered by advanced speech synthesis technologies. Our previous research on one-class learning improved generalization to unseen attacks by compacting the bona fide speech in the embedding space. However, such compactness does not account for the diversity of speakers. In this work, we propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors and pushes spoofing attacks away from all the attractors in a high-dimensional embedding space. For training, we propose an algorithm for the co-optimization of bona fide speech clustering and bona fide/spoof classification. For inference, we propose strategies to enable anti-spoofing for speakers without enrollment. Our proposed system outperforms existing state-of-the-art single systems with a relative improvement of 38% in equal error rate (EER) on the ASVspoof2019 LA evaluation set.
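The multi-center one-class idea can be sketched with a toy loss: bona fide embeddings are pulled toward their closest speaker attractor while spoof embeddings are pushed below a similarity margin with respect to all attractors. This is an illustrative simplification under assumed names and a made-up margin, not the paper's exact SAMO formulation.

```python
import numpy as np

def samo_style_loss(emb, labels, attractors, margin=0.7):
    """Toy multi-center one-class loss (illustrative, not SAMO's exact form).

    emb        : (N, D) utterance embeddings
    labels     : (N,) 1 = bona fide, 0 = spoof
    attractors : (M, D) speaker attractors
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    att = attractors / np.linalg.norm(attractors, axis=1, keepdims=True)
    sim = emb @ att.T                # cosine similarity to every attractor
    nearest = sim.max(axis=1)        # similarity to the closest attractor
    loss = np.where(labels == 1,
                    np.maximum(margin - nearest, 0.0),   # pull bona fide in
                    np.maximum(nearest - margin, 0.0))   # push spoof away
    return loss.mean()
```

A bona fide embedding already aligned with its attractor contributes zero loss, while a spoof embedding close to any attractor is penalized, mirroring the clustering/pushing behavior described above.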
Head-related transfer functions (HRTFs) are a set of functions describing the spatial filtering effect of the listener's anatomy (torso, head, and pinnae) on sound sources at different azimuth and elevation angles. They are widely used in spatial audio rendering. While the azimuth and elevation angles are intrinsically continuous, measured HRTFs in existing datasets employ different spatial sampling schemes, making it difficult to model HRTFs across datasets. In this work, we propose to use neural fields, a differentiable representation of functions through neural networks, to model HRTFs with arbitrary spatial sampling schemes. Such a representation is unified across datasets with different spatial sampling schemes, and HRTFs for arbitrary azimuth and elevation angles can be derived from it. We further introduce a generative model named HRTF field to learn the latent space of the HRTF neural fields across subjects. We demonstrate promising performance on HRTF interpolation and generation tasks and point out potential future work.
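The key property of a neural field here is that it maps a continuous direction to an HRTF, so it can be queried at any angle regardless of a dataset's sampling grid. A minimal sketch with an untrained toy MLP (the architecture, sizes, and periodic angle encoding are illustrative assumptions, not the paper's design):

```python
import numpy as np

rng = np.random.default_rng(0)

class HRTFFieldSketch:
    """Toy neural field: (azimuth, elevation) -> HRTF magnitude spectrum.

    Untrained and purely illustrative; the paper's model and the HRTF
    field generative model over subjects are far richer than this.
    """

    def __init__(self, n_freq_bins=64, hidden=128):
        self.w1 = rng.standard_normal((4, hidden)) * 0.1
        self.w2 = rng.standard_normal((hidden, n_freq_bins)) * 0.1

    def __call__(self, azimuth, elevation):
        # encode angles periodically so 0 and 2*pi map to the same input
        x = np.array([np.sin(azimuth), np.cos(azimuth),
                      np.sin(elevation), np.cos(elevation)])
        return np.tanh(x @ self.w1) @ self.w2   # magnitude spectrum (arbitrary units)

field = HRTFFieldSketch()
spec = field(azimuth=0.5, elevation=0.1)   # query any continuous direction
```

Because the input is the continuous direction itself, datasets with different measurement grids simply provide training samples at different query points of the same underlying function.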
For head and neck cancer (HNC) patient management, automatic gross tumor volume (GTV) segmentation and accurate pre-treatment prediction of cancer recurrence are of great importance for helping physicians design personalized management plans, with the potential to improve treatment outcomes and quality of life for HNC patients. In this paper, we developed an automated primary tumor (GTVp) and lymph node (GTVn) segmentation method based on combined pre-treatment positron emission tomography/computed tomography (PET/CT) scans of HNC patients. We extracted radiomics features from the segmented tumor volumes and constructed a multi-modality recurrence-free survival (RFS) prediction model that fuses the predictions of separate CT radiomics, PET radiomics, and clinical models. We performed 5-fold cross-validation to train and evaluate our methods on the MICCAI 2022 HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR) dataset. The ensemble prediction results on the testing cohort achieved Dice scores of 0.77 and 0.73 for GTVp and GTVn segmentation, respectively, and a C-index of 0.67 for RFS prediction. The code is publicly available at https://github.com/wangkaiwan/HECKTOR-2022-AIRT. Our team's name is AIRT.
In the growing field of virtual auditory display, personalized head-related transfer functions (HRTFs) play a vital role in establishing an accurate sound image. In this work, we propose an HRTF personalization method employing convolutional neural networks (CNNs) to predict a subject's HRTFs for all directions from their scanned head geometry. To ease the training of the CNN models, we propose novel pre-processing methods for both the head scans and the HRTF data to achieve compact representations. For the head scan, we use truncated spherical cap harmonic (SCH) coefficients to represent the pinna area, which is important in the acoustic scattering process. For the HRTF data, we use truncated spherical harmonic (SH) coefficients to represent the HRTF magnitudes and onsets. One CNN model is trained to predict the SH coefficients of the HRTF magnitudes from the SCH coefficients of the scanned ear geometry and other anthropometric measurements of the head. The other CNN model is trained to predict the SH coefficients of the HRTF onsets from the anthropometric measurements of the ear, head, and torso alone. Combining the magnitude and onset predictions, our method is able to predict the complete, global HRTF data. A leave-one-out validation with the log-spectral distortion (LSD) metric is used for objective evaluation. The results show a low LSD in both the spatial and temporal dimensions compared to the ground-truth HRTFs, and a lower LSD than the boundary element method (BEM) simulations of HRTFs provided by the database. Localization simulations with an auditory model are also consistent with the objective evaluation metrics, showing that localization responses with our predicted HRTFs are significantly better than with the BEM-calculated ones.
Deep learning has been widely used in medical image segmentation and related tasks. However, the performance of existing medical image segmentation models has been limited by the difficulty of obtaining a sufficient amount of high-quality labeled data, given the high cost of data annotation. To overcome this limitation, we propose a new vision-language medical image segmentation model, LViT (Language meets Vision Transformer). In our model, medical text annotation is introduced to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of pseudo labels and further improve their quality in semi-supervised learning. We also propose the Exponential Pseudo-label Iteration mechanism (EPI) to help extend the semi-supervised version of LViT, and the Pixel-Level Attention Module (PLAM) to preserve local image features. In our model, the LV (Language-Vision) loss is designed to supervise the training of unlabeled images using text information directly. To validate the performance of LViT, we construct multimodal medical segmentation datasets (image + text) containing pathological images, X-rays, etc. Experimental results show that our proposed LViT achieves better segmentation performance in both fully and semi-supervised settings. Code and datasets are available at https://github.com/HUANGLIZI/LViT.
Active speaker detection (ASD) systems are important modules for analyzing multi-talker conversations. They aim to detect which speakers, if any, are talking in a visual scene at any given time. Existing research on ASD does not agree on the definition of active speakers. We clarify the definition in this work and require synchronization between the audio and visual speaking activities. This clarification is motivated by our extensive experiments, through which we discover that existing ASD methods fail to model audio-visual synchronization and often classify unsynchronized videos as active speaking. To address this problem, we propose a cross-modal contrastive learning strategy and apply positional encoding in the attention modules of supervised ASD models to leverage the synchronization cue. Experimental results show that our model can successfully detect unsynchronized speaking as not speaking, addressing a limitation of current models.
The performance of automatic speaker verification (ASV) systems can be degraded by voice spoofing attacks. Most existing work has aimed to develop standalone spoofing countermeasure (CM) systems; relatively little has targeted an integrated spoofing-aware speaker verification (SASV) system. In the recent SASV challenge, the organizers encouraged the development of such integration by releasing official protocols and baselines. In this paper, we build a probabilistic framework for fusing the ASV and CM subsystem scores. Based on this framework, we further propose fusion strategies for direct inference and for fine-tuning to predict the SASV score. Surprisingly, these strategies significantly improve the SASV equal error rate (EER) from the baseline's 19.31% to 1.53% on the official evaluation trials of the SASV challenge. We verify the effectiveness of our proposed components through ablation studies and provide insights with score distribution analyses.
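One simple probabilistic view of ASV/CM fusion is to map each subsystem score to an acceptance probability and multiply them, since a trial should be accepted only if it is both the target speaker and bona fide speech. The sketch below is an illustration of this general idea, not the specific framework proposed in the paper; the sigmoid mapping and function name are assumptions.

```python
import math

def fuse_sasv(asv_score, cm_score):
    """Sketch of probabilistic SASV score fusion (illustrative only).

    Maps each subsystem score to a probability with a sigmoid, then
    multiplies: accept only if target speaker AND bona fide speech.
    """
    p_target = 1.0 / (1.0 + math.exp(-asv_score))     # ASV: target vs non-target
    p_bonafide = 1.0 / (1.0 + math.exp(-cm_score))    # CM: bona fide vs spoof
    return p_target * p_bonafide

# a confident target, bona fide trial scores near 1, while a spoofed
# trial is rejected even when the ASV score alone is high
```

This multiplicative structure explains why fusing the two subsystems can sharply reduce SASV EER: either a non-target speaker or a spoofing attack alone is enough to drive the fused score toward zero.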
The performance of automatic speaker verification (ASV) systems can be degraded by voice spoofing attacks. Most existing work has aimed to develop standalone spoofing countermeasure (CM) systems; relatively little has aimed to develop an integrated spoofing-aware speaker verification (SASV) system. With the recent SASV challenge aiming to encourage the development of such integration, official protocols and baselines have been released by the organizers. Building on these baselines, we propose a score scaling and multiplication strategy for inference and an SASV training strategy. Surprisingly, these strategies significantly improve the SASV equal error rate (EER) from 19.31% of the best baseline to 1.58% on the official evaluation trials of the SASV challenge. We verify the effectiveness of our proposed components through ablation studies and provide insights with score distribution analyses.