Maureen Stone

Attentive Continuous Generative Self-training for Unsupervised Domain Adaptive Medical Image Translation

May 23, 2023
Xiaofeng Liu, Jerry L. Prince, Fangxu Xing, Jiachen Zhuo, Timothy Reese, Maureen Stone, Georges El Fakhri, Jonghye Woo

Self-training is an important class of unsupervised domain adaptation (UDA) approaches used to mitigate the problem of domain shift when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.
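
The uncertainty-masked supervision at the heart of this framework can be sketched compactly. The snippet below is a minimal illustration, not the authors' code: it assumes a translator net that returns a mean image and a log-variance map (the aleatoric head), uses Monte Carlo dropout to approximate the epistemic variance in place of the paper's full variational Bayes treatment, and takes a hypothetical foreground attention map as an input.

    import torch

    def mc_dropout_stats(net, x, n_samples=8):
        """Run repeated stochastic forward passes; return the mean prediction,
        the epistemic variance (across passes), and the mean aleatoric variance."""
        net.train()  # keep dropout layers active at inference (MC dropout)
        means, log_vars = [], []
        with torch.no_grad():
            for _ in range(n_samples):
                mu, log_var = net(x)      # assumed (mean, log-variance) heads
                means.append(mu)
                log_vars.append(log_var)
        means = torch.stack(means)
        epistemic = means.var(dim=0)                          # disagreement between passes
        aleatoric = torch.stack(log_vars).exp().mean(dim=0)   # predicted noise level
        return means.mean(dim=0), epistemic, aleatoric

    def masked_regression_loss(pred, pseudo_label, epistemic, aleatoric,
                               attention, tau=0.05):
        """L1 pseudo-label loss kept only where total uncertainty is below tau,
        re-weighted by a (hypothetical) foreground attention map."""
        weight = ((epistemic + aleatoric) < tau).float() * attention
        return (weight * (pred - pseudo_label).abs()).sum() / weight.sum().clamp(min=1.0)

Thresholding the summed variances keeps target-domain supervision only where the pseudo-labels are likely reliable, which is the role the attention and uncertainty terms play in the adaptation loop.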

* Accepted to Medical Image Analysis 

Synthesizing audio from tongue motion during speech using tagged MRI via transformer

Feb 14, 2023
Xiaofeng Liu, Fangxu Xing, Jerry L. Prince, Maureen Stone, Georges El Fakhri, Jonghye Woo

Investigating the relationship between internal tissue point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI and intelligible speech can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate for the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. Furthermore, we incorporate a generative adversarial training approach into our framework to further improve synthesis quality on our generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
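
The encoder-decoder shape described above can be outlined in PyTorch. The sketch below is schematic: the layer widths, the 3-channel (dx, dy, dz) motion volumes, and the 128 mel-bin output are illustrative assumptions rather than the paper's configuration. The decoder is "asymmetric" in that it grows only the frequency axis while keeping the time axis aligned with the input frames.

    import torch
    import torch.nn as nn

    class Motion2Spectrogram(nn.Module):
        def __init__(self, d_model=256):
            super().__init__()
            # per-frame spatial features from 3-channel (dx, dy, dz) motion volumes
            self.spatial = nn.Sequential(
                nn.Conv3d(3, 32, 3, stride=2, padding=1), nn.GELU(),
                nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.GELU(),
                nn.AdaptiveAvgPool3d(1),
            )
            self.proj = nn.Linear(64, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.temporal = nn.TransformerEncoder(layer, num_layers=4)
            # asymmetric 2D decoder: grows the frequency axis (1 -> 128 mel bins)
            # while leaving the time axis untouched
            self.decode = nn.Sequential(
                nn.ConvTranspose2d(d_model, 128, (4, 3), (4, 1), (0, 1)), nn.GELU(),
                nn.ConvTranspose2d(128, 64, (4, 3), (4, 1), (0, 1)), nn.GELU(),
                nn.ConvTranspose2d(64, 32, (4, 3), (4, 1), (0, 1)), nn.GELU(),
                nn.ConvTranspose2d(32, 1, (2, 3), (2, 1), (0, 1)),
            )

        def forward(self, x):                  # x: (B, T, 3, D, H, W) motion fields
            B, T = x.shape[:2]
            f = self.spatial(x.flatten(0, 1)).flatten(1)    # (B*T, 64)
            f = self.temporal(self.proj(f).view(B, T, -1))  # (B, T, d_model)
            f = f.permute(0, 2, 1).unsqueeze(2)             # (B, d_model, 1, T)
            return self.decode(f)                           # (B, 1, 128, T)

    spec = Motion2Spectrogram()(torch.randn(1, 26, 3, 16, 16, 16))  # (1, 1, 128, 26)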

* SPIE Medical Imaging: Deep Dive Oral 

New starting point registration method for tagged MRI tongue motion estimation

Feb 08, 2023
Jinglun Yu, Muhan Shao, Zhangxing Bian, Xiao Liang, Jiachen Zhuo, Maureen Stone, Jerry L. Prince

Accurate tongue motion estimation is essential for tongue function evaluation. The harmonic phase processing (HARP) method and the HARP-based phase vector incompressible registration algorithm (PVIRA) can generate motion estimates from tagged MR images, but they suffer from tag jumping when motions are large. This paper proposes a new registration method that combines the stationary velocity fields produced by PVIRA between successive time frames into a new initialization of the final registration stage to avoid tag jumping. Experimental results demonstrate that the proposed method avoids tag jumping and outperforms existing methods on tongue motion estimation.
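
The core idea, chaining the inter-frame fields into one initialization, is standard enough to sketch with NumPy. The snippet below is a simplified 2D illustration under stated assumptions (linear interpolation, frame-to-frame stationary velocity fields as inputs): each velocity field is exponentiated by scaling and squaring, and the resulting displacements are composed from frame 0 onward.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def warp(field, disp):
        """Sample each channel of `field` at x + disp (2D, channels-first)."""
        h, w = disp.shape[1:]
        ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        coords = np.stack([ys + disp[0], xs + disp[1]])
        return np.stack([map_coordinates(c, coords, order=1, mode="nearest")
                         for c in field])

    def exp_svf(v, steps=6):
        """Scaling and squaring: displacement = exp(v) for a stationary velocity field."""
        disp = v / (2 ** steps)
        for _ in range(steps):
            disp = disp + warp(disp, disp)
        return disp

    def compose(d1, d2):
        """Displacement of applying d1 then d2: d(x) = d1(x) + d2(x + d1(x))."""
        return d1 + warp(d2, d1)

    def chain_initialization(svfs):
        """Chain frame-to-frame fields into one frame-0 -> frame-t initialization."""
        total = exp_svf(svfs[0])
        for v in svfs[1:]:
            total = compose(total, exp_svf(v))
        return total

    svfs = [np.zeros((2, 32, 32)) for _ in range(5)]  # toy inter-frame velocity fields
    init = chain_initialization(svfs)                 # frame-0 -> frame-5 displacement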

Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator

Jun 09, 2022
Xiaofeng Liu, Fangxu Xing, Jerry L. Prince, Jiachen Zhuo, Maureen Stone, Georges El Fakhri, Jonghye Woo

Understanding the underlying relationship between tongue and oropharyngeal muscle deformation seen in tagged-MRI and intelligible speech plays an important role in advancing speech motor control theories and treatment of speech-related disorders. Because of their heterogeneous representations, however, direct mapping between the two modalities -- i.e., a two-dimensional (mid-sagittal slice) plus time tagged-MRI sequence and its corresponding one-dimensional waveform -- is not straightforward. Instead, we resort to two-dimensional spectrograms as an intermediate representation, which contains both pitch and resonance, and develop an end-to-end deep learning framework to translate a sequence of tagged-MRI to its corresponding audio waveform with limited dataset size. Our framework is based on a novel fully convolutional asymmetry translator guided by a self residual attention strategy to specifically exploit the moving muscular structures during speech. In addition, we leverage the pairwise correlation of samples with the same utterance through a latent space representation disentanglement strategy. Furthermore, we incorporate an adversarial training approach with generative adversarial networks to improve the realism of our generated spectrograms. Our experimental results, carried out with a total of 63 tagged-MRI sequences alongside speech acoustics, showed that our framework enabled the generation of clear audio waveforms from a sequence of tagged-MRI, surpassing competing methods. Thus, our framework has great potential to help better understand the relationship between the two modalities.
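
One plausible reading of the self residual attention strategy (an assumption, not the released code) is a learned spatial gate whose attended features are added back to the input, so moving structures are emphasized without discarding the original signal:

    import torch
    import torch.nn as nn

    class SelfResidualAttention(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Conv2d(channels, channels, 1),
                nn.Sigmoid(),            # per-pixel, per-channel attention in [0, 1]
            )

        def forward(self, x):
            return x + x * self.gate(x)  # residual: attended features added to input

    feats = torch.randn(2, 64, 32, 32)
    out = SelfResidualAttention(64)(feats)   # same shape as the input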

* MICCAI 2022 (early accept) 

Structure-aware Unsupervised Tagged-to-Cine MRI Synthesis with Self Disentanglement

Feb 25, 2022
Xiaofeng Liu, Fangxu Xing, Jerry L. Prince, Maureen Stone, Georges El Fakhri, Jonghye Woo

Cycle reconstruction regularized adversarial training -- e.g., CycleGAN, DiscoGAN, and DualGAN -- has been widely used for image style transfer with unpaired training data. Several recent works, however, have shown that local distortions are frequent and structural consistency cannot be guaranteed. To address this issue, prior works usually relied on additional segmentation or consistent feature extraction steps that are task-specific. Instead, this work aims to learn a general add-on structural feature extractor by explicitly enforcing structural alignment between an input and its synthesized image. Specifically, we propose a novel input-output image patch self-training scheme to achieve a disentanglement of underlying anatomical structures and imaging modalities. The translator and structure encoder are updated following an alternating training protocol. In addition, information about the imaging modality can be eliminated with an asymmetric adversarial game. We train, validate, and test our network on 1,768, 416, and 1,560 unpaired subject-independent slices of tagged and cine magnetic resonance imaging from a total of twenty healthy subjects, respectively, demonstrating superior performance over competing methods.
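
The alternating protocol might look like the following sketch, where the module names, loss choices, and weighting are assumptions: a structure encoder E is fit so that an input and its synthesis share a structure code, a discriminator D tries to infer the modality from that code, and the translator G is then updated to preserve structure.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def train_round(G, E, D, x, opt_g, opt_e, opt_d):
        """One alternating round (a sketch; losses and roles are assumptions).
        G: translator, E: add-on structure encoder, D: modality discriminator."""
        # (a) update E, translator frozen: an input and its synthesis should
        #     map to the same structure code
        y = G(x).detach()
        loss_e = (E(x) - E(y)).abs().mean()
        opt_e.zero_grad(); loss_e.backward(); opt_e.step()

        # (b) update D: classify which modality a structure code came from
        real, fake = D(E(x).detach()), D(E(y).detach())
        loss_d = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) \
               + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # (c) update G, encoder frozen: the synthesized image must preserve the
        #     input's structure code (gradients pass through E, but only G steps)
        y = G(x)
        loss_g = (E(x).detach() - E(y)).abs().mean()
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # toy modules just to show the wiring
    G = nn.Conv2d(1, 1, 3, padding=1)
    E = nn.Conv2d(1, 8, 3, padding=1)
    D = nn.Conv2d(8, 1, 1)
    train_round(G, E, D, torch.randn(2, 1, 64, 64),
                torch.optim.Adam(G.parameters(), 1e-4),
                torch.optim.Adam(E.parameters(), 1e-4),
                torch.optim.Adam(D.parameters(), 1e-4))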

* SPIE Medical Imaging: Image Processing (Oral presentation) 

Generative Self-training for Cross-domain Unsupervised Tagged-to-Cine MRI Synthesis

Jun 23, 2021
Xiaofeng Liu, Fangxu Xing, Maureen Stone, Jiachen Zhuo, Timothy Reese, Jerry L. Prince, Georges El Fakhri, Jonghye Woo

Self-training-based unsupervised domain adaptation (UDA) has shown great potential to address the problem of domain shift when applying a deep learning model trained in a source domain to unlabeled target domains. However, while self-training UDA has demonstrated its effectiveness on discriminative tasks, such as classification and segmentation, via reliable pseudo-label selection based on the softmax discrete histogram, self-training UDA for generative tasks, such as image synthesis, has not been fully investigated. In this work, we propose a novel generative self-training (GST) UDA framework with continuous value prediction and regression objectives for cross-domain image synthesis. Specifically, we propose to filter the pseudo-labels with an uncertainty mask, and quantify the predictive confidence of generated images with practical variational Bayes learning. Fast test-time adaptation is achieved by a round-based alternating optimization scheme. We validated our framework on the tagged-to-cine magnetic resonance imaging (MRI) synthesis problem, where datasets in the source and target domains were acquired from different scanners or centers. Extensive validations were carried out to verify our framework against popular adversarial training UDA methods. Results show that our GST, with tagged MRI of test subjects in new target domains, improved the synthesis quality by a large margin compared with the adversarial training UDA methods.
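
A minimal rendering of the round-based scheme, under assumptions (Monte Carlo dropout standing in for the variational Bayes machinery, a quantile threshold for the uncertainty mask, a translator net that returns a single image), could look like this:

    import torch

    @torch.no_grad()
    def make_pseudo_labels(net, x, n=8, q=0.9):
        net.train()                            # dropout stays on for MC sampling
        preds = torch.stack([net(x) for _ in range(n)])
        mean, var = preds.mean(0), preds.var(0)
        mask = (var <= torch.quantile(var, q)).float()  # drop most-uncertain pixels
        return mean, mask

    def self_training_rounds(net, target_images, opt, rounds=3):
        for _ in range(rounds):
            # re-generate pseudo-labels with the current model at each round
            banked = [(x, *make_pseudo_labels(net, x)) for x in target_images]
            net.train()
            for x, pl, m in banked:
                # masked L1 regression on the continuous-valued pseudo-labels
                loss = (m * (net(x) - pl).abs()).sum() / m.sum().clamp(min=1.0)
                opt.zero_grad(); loss.backward(); opt.step()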

* MICCAI 2021 (early accept <13%) 

Dual-cycle Constrained Bijective VAE-GAN For Tagged-to-Cine Magnetic Resonance Image Synthesis

Jan 14, 2021
Xiaofeng Liu, Fangxu Xing, Jerry L. Prince, Aaron Carass, Maureen Stone, Georges El Fakhri, Jonghye Woo

Tagged magnetic resonance imaging (MRI) is a widely used imaging technique for measuring tissue deformation in moving organs. Due to tagged MRI's intrinsic low anatomical resolution, another matching set of cine MRI with higher resolution is sometimes acquired in the same scanning session to facilitate tissue segmentation, thus adding extra time and cost. To mitigate this, in this work, we propose a novel dual-cycle constrained bijective VAE-GAN approach to carry out tagged-to-cine MR image synthesis. Our method is based on a variational autoencoder backbone with cycle reconstruction constrained adversarial training to yield accurate and realistic cine MR images given tagged MR images. Our framework has been trained, validated, and tested using 1,768, 416, and 1,560 subject-independent paired slices of tagged and cine MRI from twenty healthy subjects, respectively, demonstrating superior performance over the comparison methods. Our method can potentially be used to reduce the extra acquisition time and cost, while maintaining the same workflow for further motion analyses.
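
The backbone can be illustrated with a toy pair of VAE translators. The sketch below keeps the reparameterized encoder-decoder and one cycle reconstruction term, omits the adversarial and second-cycle losses for brevity, and uses layer sizes that are assumptions:

    import torch
    import torch.nn as nn

    class TinyVAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.enc = nn.Sequential(nn.Conv2d(1, 16, 4, 2, 1), nn.ReLU(),
                                     nn.Conv2d(16, 32, 4, 2, 1), nn.ReLU())
            self.mu, self.logvar = nn.Conv2d(32, 32, 1), nn.Conv2d(32, 32, 1)
            self.dec = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
                                     nn.ConvTranspose2d(16, 1, 4, 2, 1))

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
            return self.dec(z), mu, logvar

    G_t2c, G_c2t = TinyVAE(), TinyVAE()       # tagged->cine and cine->tagged
    tagged = torch.randn(2, 1, 64, 64)
    cine_hat, mu, logvar = G_t2c(tagged)
    tagged_cyc, _, _ = G_c2t(cine_hat)        # back-translate to close the cycle

    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    cycle = (tagged_cyc - tagged).abs().mean()  # the dual-cycle adds the cine-side loop
    loss = cycle + 1e-3 * kl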

* Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2021 

A Deep Joint Sparse Non-negative Matrix Factorization Framework for Identifying the Common and Subject-specific Functional Units of Tongue Motion During Speech

Jul 09, 2020
Jonghye Woo, Fangxu Xing, Jerry L. Prince, Maureen Stone, Arnold Gomez, Timothy G. Reese, Van J. Wedeen, Georges El Fakhri

Intelligible speech is produced by creating varying internal local muscle groupings -- i.e., functional units -- that are generated in a systematic and coordinated manner. There are two major challenges in characterizing and analyzing functional units. First, due to the complex and convoluted nature of tongue structure and function, it is of great importance to develop a method that can accurately decode complex muscle coordination patterns during speech. Second, it is challenging to keep identified functional units comparable across subjects due to their substantial variability. In this work, to address these challenges, we develop a new deep learning framework to identify common and subject-specific functional units of tongue motion during speech. Our framework hinges on joint deep graph-regularized sparse non-negative matrix factorization (NMF) using motion quantities derived from displacements measured by tagged magnetic resonance imaging. More specifically, we transform NMF with sparse and manifold regularizations into modular architectures akin to deep neural networks by unfolding the Iterative Shrinkage-Thresholding Algorithm, learning interpretable building blocks and their associated weighting map. We then apply spectral clustering to the weighting map to identify common and subject-specific functional units. Experiments carried out with simulated datasets show that the proposed method surpasses the comparison methods. Experiments carried out with in vivo tongue motion datasets show that the proposed method can determine the common and subject-specific functional units with increased interpretability and decreased size variability.
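
The unfolding step admits a compact illustration. In this sketch, tied weights across layers, ReLU as the non-negative shrinkage operator, and a fixed step size are all assumptions; it turns ISTA iterations for a sparse non-negative coding problem into a small network, as the abstract describes:

    import torch
    import torch.nn as nn

    class UnfoldedISTA(nn.Module):
        """Unfolded ISTA for H = argmin ||X - W H||^2 + lam * ||H||_1, H >= 0."""
        def __init__(self, n_features, n_units, n_layers=5, lam=0.1, step=0.1):
            super().__init__()
            self.W = nn.Parameter(torch.rand(n_features, n_units))  # building blocks
            self.n_layers, self.lam, self.step = n_layers, lam, step

        def forward(self, X):                  # X: (n_features, n_samples)
            W = self.W.clamp(min=0)            # keep the dictionary non-negative
            H = torch.zeros(W.shape[1], X.shape[1])
            for _ in range(self.n_layers):     # one "layer" per ISTA iteration
                grad = W.t() @ (W @ H - X)
                H = torch.relu(H - self.step * (grad + self.lam))  # shrink + project
            return H

    X = torch.rand(100, 40)                    # motion features x samples (toy data)
    H = UnfoldedISTA(100, 8)(X)                # non-negative, sparse weighting map

Because each layer is differentiable, the dictionary W (the building blocks) can be learned end-to-end, which is what makes the unfolded network interpretable in the NMF sense.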

A Sparse Non-negative Matrix Factorization Framework for Identifying Functional Units of Tongue Behavior from MRI

Sep 29, 2018
Jonghye Woo, Jerry L. Prince, Maureen Stone, Fangxu Xing, Arnold Gomez, Jordan R. Green, Christopher J. Hartnick, Thomas J. Brady, Timothy G. Reese, Van J. Wedeen, Georges El Fakhri

Muscle coordination patterns of lingual behaviors are synergies generated by deforming local muscle groups in a variety of ways. Functional units are functional muscle groups of local structural elements within the tongue that compress, expand, and move in a cohesive and consistent manner. Identifying the functional units using tagged-Magnetic Resonance Imaging (MRI) sheds light on the mechanisms of normal and pathological muscle coordination patterns, yielding improvement in surgical planning, treatment, or rehabilitation procedures. Here, to mine this information, we propose a matrix factorization and probabilistic graphical model framework to produce building blocks and their associated weighting map using motion quantities extracted from tagged-MRI. Our tagged-MRI imaging and accurate voxel-level tracking provide previously unavailable internal tongue motion patterns, thus revealing the inner workings of the tongue during speech or other lingual behaviors. We then employ spectral clustering on the weighting map to identify the cohesive regions defined by the tongue motion that may involve multiple or undocumented regions. To evaluate our method, we perform a series of experiments. We first use two-dimensional images and synthetic data to demonstrate the accuracy of our method. We then use three-dimensional synthetic and in vivo tongue motion data using protrusion and simple speech tasks to identify subject-specific and data-driven functional units of the tongue in localized regions.
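
An off-the-shelf scikit-learn analogue of this pipeline is easy to write, with standard sparse NMF standing in for the paper's probabilistic graphical-model formulation (the data shapes and hyperparameters below are toy assumptions):

    import numpy as np
    from sklearn.decomposition import NMF
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(0)
    motion = rng.random((5000, 20))        # voxels x motion quantities (toy data)

    # sparse NMF: motion ~ weights @ components, everything non-negative
    nmf = NMF(n_components=6, init="nndsvda", alpha_W=0.1, l1_ratio=1.0,
              max_iter=500)
    weights = nmf.fit_transform(motion)    # voxels x components weighting map

    # spectral clustering of the weighting map groups voxels that move together
    labels = SpectralClustering(n_clusters=6, affinity="nearest_neighbors",
                                n_neighbors=10, random_state=0).fit_predict(weights)
    # `labels` assigns each voxel to a cohesive region, i.e., a candidate functional unit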

* Accepted at IEEE TMI (https://ieeexplore.ieee.org/document/8467354) 

Speech Map: A Statistical Multimodal Atlas of 4D Tongue Motion During Speech from Tagged and Cine MR Images

Sep 15, 2018
Jonghye Woo, Fangxu Xing, Maureen Stone, Jordan Green, Timothy G. Reese, Thomas J. Brady, Van J. Wedeen, Jerry L. Prince, Georges El Fakhri

Quantitative measurement of functional and anatomical traits of 4D tongue motion in the course of speech or other lingual behaviors remains a major challenge in scientific research and clinical applications. Here, we introduce a statistical multimodal atlas of 4D tongue motion using healthy subjects, which enables a combined quantitative characterization of tongue motion in a reference anatomical configuration. This atlas framework, termed Speech Map, combines cine- and tagged-MRI in order to provide both the anatomic reference and motion information during speech. Our approach involves a series of steps including (1) construction of a common reference anatomical configuration from cine-MRI, (2) motion estimation from tagged-MRI, (3) transformation of the motion estimates to the reference anatomical configuration, and (4) computation of motion quantities such as Lagrangian strain. Using this framework, the anatomic configuration of the tongue appears motionless, while the motion fields and associated strain measurements change over the time course of speech. In addition, to form a succinct representation of the high-dimensional and complex motion fields, principal component analysis is carried out to characterize the central tendencies and variations of motion fields of our speech tasks. Our proposed method provides a platform to quantitatively and objectively explain the differences and variability of tongue motion by illuminating internal motion and strain that have so far been intractable. The findings are used to understand how tongue function for speech is limited by abnormal internal motion and strain in glossectomy patients.
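
Step (4) and the PCA summary can be sketched as follows; the shapes and data here are toy assumptions, with each motion field flattened to a vector once it has been mapped into the common atlas space:

    import numpy as np
    from sklearn.decomposition import PCA

    n_subjects, t_frames = 8, 26
    field_shape = (3, 16, 16, 16)          # toy (dx, dy, dz) volumes in atlas space
    rng = np.random.default_rng(0)
    fields = rng.standard_normal((n_subjects * t_frames,
                                  int(np.prod(field_shape))))

    pca = PCA(n_components=5)
    scores = pca.fit_transform(fields)     # low-dimensional motion signatures
    mean_motion = pca.mean_.reshape(field_shape)      # central-tendency motion pattern
    mode1 = pca.components_[0].reshape(field_shape)   # dominant mode of variation
    print(pca.explained_variance_ratio_)   # how much variability each mode captures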

* Accepted at Journal of Computer Methods in Biomechanics and Biomedical Engineering 