Effective connectivity estimation plays a crucial role in understanding the interactions and information flow between different brain regions. However, the functional time series used for estimating effective connentivity is derived from certain software, which may lead to large computing errors because of different parameter settings and degrade the ability to model complex causal relationships between brain regions. In this paper, a brain diffuser with hierarchical transformer (BDHT) is proposed to estimate effective connectivity for mild cognitive impairment (MCI) analysis. To our best knowledge, the proposed brain diffuer is the first generative model to apply diffusion models in the application of generating and analyzing multimodal brain networks. Specifically, the BDHT leverages the structural connectivity to guide the reverse processes in an efficient way. It makes the denoising process more reliable and guarantees effective connectivity estimation accuracy. To improve denoising quality, the hierarchical denoising transformer is designed to learn multi-scale features in topological space. Furthermore, the GraphConFormer block can concentrate on both global and adjacent connectivity information. By stacking the multi-head attention and graph convolutional network, the proposed model enhances structure-function complementarity and improves the ability in noise estimation. Experimental evaluations of the denoising diffusion model demonstrate its effectiveness in estimating effective connectivity. The method achieves superior performance in terms of accuracy and robustness compared to existing approaches. It can captures both unidirectal and bidirectional interactions between brain regions, providing a comprehensive understanding of the brain's information processing mechanisms.
Brain network analysis has emerged as pivotal method for gaining a deeper understanding of brain functions and disease mechanisms. Despite the existence of various network construction approaches, shortcomings persist in the learning of correlations between structural and functional brain imaging data. In light of this, we introduce a novel method called BrainNetDiff, which combines a multi-head Transformer encoder to extract relevant features from fMRI time series and integrates a conditional latent diffusion model for brain network generation. Leveraging a conditional prompt and a fusion attention mechanism, this method significantly improves the accuracy and stability of brain network generation. To the best of our knowledge, this represents the first framework that employs diffusion for the fusion of the multimodal brain imaging and brain network generation from images to graphs. We validate applicability of this framework in the construction of brain network across healthy and neurologically impaired cohorts using the authentic dataset. Experimental results vividly demonstrate the significant effectiveness of the proposed method across the downstream disease classification tasks. These findings convincingly emphasize the prospective value in the field of brain network research, particularly its key significance in neuroimaging analysis and disease diagnosis. This research provides a valuable reference for the processing of multimodal brain imaging data and introduces a novel, efficient solution to the field of neuroimaging.
Underwater images often exhibit poor quality, imbalanced coloration, and low contrast due to the complex and intricate interaction of light, water, and objects. Despite the significant contributions of previous underwater enhancement techniques, there exist several problems that demand further improvement: (i) Current deep learning methodologies depend on Convolutional Neural Networks (CNNs) that lack multi-scale enhancement and also have limited global perception fields. (ii) The scarcity of paired real-world underwater datasets poses a considerable challenge, and the utilization of synthetic image pairs risks overfitting. To address the aforementioned issues, this paper presents a Multi-scale Transformer-based Network called UWFormer for enhancing images at multiple frequencies via semi-supervised learning, in which we propose a Nonlinear Frequency-aware Attention mechanism and a Multi-Scale Fusion Feed-forward Network for low-frequency enhancement. Additionally, we introduce a specialized underwater semi-supervised training strategy, proposing a Subaqueous Perceptual Loss function to generate reliable pseudo labels. Experiments using full-reference and non-reference underwater benchmarks demonstrate that our method outperforms state-of-the-art methods in terms of both quantity and visual quality.
Fusing structural-functional images of the brain has shown great potential to analyze the deterioration of Alzheimer's disease (AD). However, it is a big challenge to effectively fuse the correlated and complementary information from multimodal neuroimages. In this paper, a novel model termed cross-modal transformer generative adversarial network (CT-GAN) is proposed to effectively fuse the functional and structural information contained in functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI). The CT-GAN can learn topological features and generate multimodal connectivity from multimodal imaging data in an efficient end-to-end manner. Moreover, the swapping bi-attention mechanism is designed to gradually align common features and effectively enhance the complementary features between modalities. By analyzing the generated connectivity features, the proposed model can identify AD-related brain connections. Evaluations on the public ADNI dataset show that the proposed CT-GAN can dramatically improve prediction performance and detect AD-related brain regions effectively. The proposed model also provides new insights for detecting AD-related abnormal neural circuits.
Cross-modal medical image translation is an essential task for synthesizing missing modality data for clinical diagnosis. However, current learning-based techniques have limitations in capturing cross-modal and global features, restricting their suitability to specific pairs of modalities. This lack of versatility undermines their practical usefulness, particularly considering that the missing modality may vary for different cases. In this study, we present MedPrompt, a multi-task framework that efficiently translates different modalities. Specifically, we propose the Self-adaptive Prompt Block, which dynamically guides the translation network towards distinct modalities. Within this framework, we introduce the Prompt Extraction Block and the Prompt Fusion Block to efficiently encode the cross-modal prompt. To enhance the extraction of global features across diverse modalities, we incorporate the Transformer model. Extensive experimental results involving five datasets and four pairs of modalities demonstrate that our proposed model achieves state-of-the-art visual quality and exhibits excellent generalization capability.
Fusing structural-functional images of the brain has shown great potential to analyze the deterioration of Alzheimer's disease (AD). However, it is a big challenge to effectively fuse the correlated and complementary information from multimodal neuroimages. In this paper, a novel model termed cross-modal transformer generative adversarial network (CT-GAN) is proposed to effectively fuse the functional and structural information contained in functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI). The CT-GAN can learn topological features and generate multimodal connectivity from multimodal imaging data in an efficient end-to-end manner. Moreover, the swapping bi-attention mechanism is designed to gradually align common features and effectively enhance the complementary features between modalities. By analyzing the generated connectivity features, the proposed model can identify AD-related brain connections. Evaluations on the public ADNI dataset show that the proposed CT-GAN can dramatically improve prediction performance and detect AD-related brain regions effectively. The proposed model also provides new insights for detecting AD-related abnormal neural circuits.
Mapping from functional connectivity (FC) to structural connectivity (SC) can facilitate multimodal brain network fusion and discover potential biomarkers for clinical implications. However, it is challenging to directly bridge the reliable non-linear mapping relations between SC and functional magnetic resonance imaging (fMRI). In this paper, a novel diffusision generative adversarial network-based fMRI-to-SC (DiffGAN-F2S) model is proposed to predict SC from brain fMRI in an end-to-end manner. To be specific, the proposed DiffGAN-F2S leverages denoising diffusion probabilistic models (DDPMs) and adversarial learning to efficiently generate high-fidelity SC through a few steps from fMRI. By designing the dual-channel multi-head spatial attention (DMSA) and graph convolutional modules, the symmetric graph generator first captures global relations among direct and indirect connected brain regions, then models the local brain region interactions. It can uncover the complex mapping relations between fMRI and structural connectivity. Furthermore, the spatially connected consistency loss is devised to constrain the generator to preserve global-local topological information for accurate intrinsic SC prediction. Testing on the public Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the proposed model can effectively generate empirical SC-preserved connectivity from four-dimensional imaging data and shows superior performance in SC prediction compared with other related models. Furthermore, the proposed model can identify the vast majority of important brain regions and connections derived from the empirical method, providing an alternative way to fuse multimodal brain networks and analyze clinical disease.
The relationship between brain structure and function is critical for revealing the pathogenesis of brain disease, including Alzheimer's disease (AD). However, it is a great challenge to map brain structure-function connections due to various reasons. In this work, a bidirectional graph generative adversarial networks (BGGAN) is proposed to represent brain structure-function connections. Specifically, by designing a module incorporating inner graph convolution network (InnerGCN), the generators of BGGAN can employ features of direct and indirect brain regions to learn the mapping function between structural domain and functional domain. Besides, a new module named Balancer is designed to counterpoise the optimization between generators and discriminators. By introducing the Balancer into BGGAN, both the structural generator and functional generator can not only alleviate the issue of mode collapse but also learn complementarity of structural and functional features. Experimental results using ADNI datasets show that the both the generated structure connections and generated function connections can improve the identification accuracy of AD. More importantly, based the proposed model, it is found that the relationship between brain structure and function is not a complete one-to-one correspondence. Brain structure is the basis of brain function. The strong structural connections are almost accompanied by strong functional connections.
Document shadow is a common issue that arise when capturing documents using mobile devices, which significantly impacts the readability. Current methods encounter various challenges including inaccurate detection of shadow masks and estimation of illumination. In this paper, we propose ShaDocFormer, a Transformer-based architecture that integrates traditional methodologies and deep learning techniques to tackle the problem of document shadow removal. The ShaDocFormer architecture comprises two components: the Shadow-attentive Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module employs a traditional thresholding technique and leverages the attention mechanism of the Transformer to gather global information, thereby enabling precise detection of shadow masks. The cascaded and aggregative structure of the CFR module facilitates a coarse-to-fine restoration process for the entire image. As a result, ShaDocFormer excels in accurately detecting and capturing variations in both shadow and illumination, thereby enabling effective removal of shadows. Extensive experiments demonstrate that ShaDocFormer outperforms current state-of-the-art methods in both qualitative and quantitative measurements.
Document shadow is a common issue that arise when capturing documents using mobile devices, which significantly impacts the readability. Current methods encounter various challenges including inaccurate detection of shadow masks and estimation of illumination. In this paper, we propose ShaDocFormer, a Transformer-based architecture that integrates traditional methodologies and deep learning techniques to tackle the problem of document shadow removal. The ShaDocFormer architecture comprises two components: the Shadow-attentive Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module employs a traditional thresholding technique and leverages the attention mechanism of the Transformer to gather global information, thereby enabling precise detection of shadow masks. The cascaded and aggregative structure of the CFR module facilitates a coarse-to-fine restoration process for the entire image. As a result, ShaDocFormer excels in accurately detecting and capturing variations in both shadow and illumination, thereby enabling effective removal of shadows. Extensive experiments demonstrate that ShaDocFormer outperforms current state-of-the-art methods in both qualitative and quantitative measurements.