Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Weidong Cai

The shape of the brain's connections is predictive of cognitive performance: an explainable machine learning study

Oct 19, 2024

Yui Lo, Yuqian Chen, Dongnan Liu, Wan Liu, Leo Zekelman, Jarrett Rushmore, Fan Zhang, Yogesh Rathi, Nikos Makris, Alexandra J. Golby(+2 more)

Figure 1 for The shape of the brain's connections is predictive of cognitive performance: an explainable machine learning study

Figure 2 for The shape of the brain's connections is predictive of cognitive performance: an explainable machine learning study

Figure 3 for The shape of the brain's connections is predictive of cognitive performance: an explainable machine learning study

Figure 4 for The shape of the brain's connections is predictive of cognitive performance: an explainable machine learning study

Abstract:The shape of the brain's white matter connections is relatively unexplored in diffusion MRI tractography analysis. While it is known that tract shape varies in populations and across the human lifespan, it is unknown if the variability in dMRI tractography-derived shape may relate to the brain's functional variability across individuals. This work explores the potential of leveraging tractography fiber cluster shape measures to predict subject-specific cognitive performance. We implement machine learning models to predict individual cognitive performance scores. We study a large-scale database from the HCP-YA study. We apply an atlas-based fiber cluster parcellation to the dMRI tractography of each individual. We compute 15 shape, microstructure, and connectivity features for each fiber cluster. Using these features as input, we train a total of 210 models to predict 7 different NIH Toolbox cognitive performance assessments. We apply an explainable AI technique, SHAP, to assess the importance of each fiber cluster for prediction. Our results demonstrate that shape measures are predictive of individual cognitive performance. The studied shape measures, such as irregularity, diameter, total surface area, volume, and branch volume, are as effective for prediction as microstructure and connectivity measures. The overall best-performing feature is a shape feature, irregularity, which describes how different a cluster's shape is from an idealized cylinder. Further interpretation using SHAP values suggest that fiber clusters with features highly predictive of cognitive ability are widespread throughout the brain, including fiber clusters from the superficial association, deep association, cerebellar, striatal, and projection pathways. This study demonstrates the strong potential of shape descriptors to enhance the study of the brain's white matter and its relationship to cognitive function.

Via

Access Paper or Ask Questions

Unsupervised dMRI Artifact Detection via Angular Resolution Enhancement and Cycle Consistency Learning

Sep 24, 2024

Sheng Chen, Zihao Tang, Xinyi Wang, Chenyu Wang, Weidong Cai

Figure 1 for Unsupervised dMRI Artifact Detection via Angular Resolution Enhancement and Cycle Consistency Learning

Figure 2 for Unsupervised dMRI Artifact Detection via Angular Resolution Enhancement and Cycle Consistency Learning

Figure 3 for Unsupervised dMRI Artifact Detection via Angular Resolution Enhancement and Cycle Consistency Learning

Figure 4 for Unsupervised dMRI Artifact Detection via Angular Resolution Enhancement and Cycle Consistency Learning

Abstract:Diffusion magnetic resonance imaging (dMRI) is a crucial technique in neuroimaging studies, allowing for the non-invasive probing of the underlying structures of brain tissues. Clinical dMRI data is susceptible to various artifacts during acquisition, which can lead to unreliable subsequent analyses. Therefore, dMRI preprocessing is essential for improving image quality, and manual inspection is often required to ensure that the preprocessed data is sufficiently corrected. However, manual inspection requires expertise and is time-consuming, especially with large-scale dMRI datasets. Given these challenges, an automated dMRI artifact detection tool is necessary to increase the productivity and reliability of dMRI data analysis. To this end, we propose a novel unsupervised deep learning framework called $\textbf{U}$nsupervised $\textbf{d}$MRI $\textbf{A}$rtifact $\textbf{D}$etection via $\textbf{A}$ngular Resolution Enhancement and $\textbf{C}$ycle Consistency Learning (UdAD-AC). UdAD-AC leverages dMRI angular resolution enhancement and cycle consistency learning to capture the effective representation of artifact-free dMRI data during training, and it identifies data containing artifacts using designed confidence score during inference. To assess the capability of UdAD-AC, several commonly reported dMRI artifacts, including bias field, susceptibility distortion, and corrupted volume, were added to the testing data. Experimental results demonstrate that UdAD-AC achieves the best performance compared to competitive methods in unsupervised dMRI artifact detection.

* Accepted to AJCAI2024, dMRI, Unsupervised artifact detection, Angular resolution enhancement, Cycle consistency

Via

Access Paper or Ask Questions

Enhancing Advanced Visual Reasoning Ability of Large Language Models

Sep 21, 2024

Zhiyuan Li, Dongnan Liu, Chaoyi Zhang, Heng Wang, Tengfei Xue, Weidong Cai

Figure 1 for Enhancing Advanced Visual Reasoning Ability of Large Language Models

Figure 2 for Enhancing Advanced Visual Reasoning Ability of Large Language Models

Figure 3 for Enhancing Advanced Visual Reasoning Ability of Large Language Models

Figure 4 for Enhancing Advanced Visual Reasoning Ability of Large Language Models

Abstract:Recent advancements in Vision-Language (VL) research have sparked new benchmarks for complex visual reasoning, challenging models' advanced reasoning ability. Traditional Vision-Language Models (VLMs) perform well in visual perception tasks while struggling with complex reasoning scenarios. Conversely, Large Language Models (LLMs) demonstrate robust text reasoning capabilities; however, they lack visual acuity. To bridge this gap, we propose Complex Visual Reasoning Large Language Models (CVR-LLM), capitalizing on VLMs' visual perception proficiency and LLMs' extensive reasoning capability. Unlike recent multimodal large language models (MLLMs) that require a projection layer, our approach transforms images into detailed, context-aware descriptions using an iterative self-refinement loop and leverages LLMs' text knowledge for accurate predictions without extra training. We also introduce a novel multi-modal in-context learning (ICL) methodology to enhance LLMs' contextual understanding and reasoning. Additionally, we introduce Chain-of-Comparison (CoC), a step-by-step comparison technique enabling contrasting various aspects of predictions. Our CVR-LLM presents the first comprehensive study across a wide array of complex visual reasoning tasks and achieves SOTA performance among all.

* EMNLP 2024 Main

Via

Access Paper or Ask Questions

Enhancing Angular Resolution via Directionality Encoding and Geometric Constraints in Brain Diffusion Tensor Imaging

Sep 11, 2024

Sheng Chen, Zihao Tang, Mariano Cabezas, Xinyi Wang, Arkiev D'Souza, Michael Barnett, Fernando Calamante, Weidong Cai, Chenyu Wang

Figure 1 for Enhancing Angular Resolution via Directionality Encoding and Geometric Constraints in Brain Diffusion Tensor Imaging

Figure 2 for Enhancing Angular Resolution via Directionality Encoding and Geometric Constraints in Brain Diffusion Tensor Imaging

Figure 3 for Enhancing Angular Resolution via Directionality Encoding and Geometric Constraints in Brain Diffusion Tensor Imaging

Figure 4 for Enhancing Angular Resolution via Directionality Encoding and Geometric Constraints in Brain Diffusion Tensor Imaging

Abstract:Diffusion-weighted imaging (DWI) is a type of Magnetic Resonance Imaging (MRI) technique sensitised to the diffusivity of water molecules, offering the capability to inspect tissue microstructures and is the only in-vivo method to reconstruct white matter fiber tracts non-invasively. The DWI signal can be analysed with the diffusion tensor imaging (DTI) model to estimate the directionality of water diffusion within voxels. Several scalar metrics, including axial diffusivity (AD), mean diffusivity (MD), radial diffusivity (RD), and fractional anisotropy (FA), can be further derived from DTI to quantitatively summarise the microstructural integrity of brain tissue. These scalar metrics have played an important role in understanding the organisation and health of brain tissue at a microscopic level in clinical studies. However, reliable DTI metrics rely on DWI acquisitions with high gradient directions, which often go beyond the commonly used clinical protocols. To enhance the utility of clinically acquired DWI and save scanning time for robust DTI analysis, this work proposes DirGeo-DTI, a deep learning-based method to estimate reliable DTI metrics even from a set of DWIs acquired with the minimum theoretical number (6) of gradient directions. DirGeo-DTI leverages directional encoding and geometric constraints to facilitate the training process. Two public DWI datasets were used for evaluation, demonstrating the effectiveness of the proposed method. Extensive experimental results show that the proposed method achieves the best performance compared to existing DTI enhancement methods and potentially reveals further clinical insights with routine clinical DWI scans.

* Accepted to ICONIP2024, Diffusion Weighted Imaging, Diffusion Tensor Imaging, Angular Resolution Enhancement, Fractional Anisotropy

Via

Access Paper or Ask Questions

Revisiting Surgical Instrument Segmentation Without Human Intervention: A Graph Partitioning View

Aug 27, 2024

Mingyu Sheng, Jianan Fan, Dongnan Liu, Ron Kikinis, Weidong Cai

Figure 1 for Revisiting Surgical Instrument Segmentation Without Human Intervention: A Graph Partitioning View

Figure 2 for Revisiting Surgical Instrument Segmentation Without Human Intervention: A Graph Partitioning View

Figure 3 for Revisiting Surgical Instrument Segmentation Without Human Intervention: A Graph Partitioning View

Figure 4 for Revisiting Surgical Instrument Segmentation Without Human Intervention: A Graph Partitioning View

Abstract:Surgical instrument segmentation (SIS) on endoscopic images stands as a long-standing and essential task in the context of computer-assisted interventions for boosting minimally invasive surgery. Given the recent surge of deep learning methodologies and their data-hungry nature, training a neural predictive model based on massive expert-curated annotations has been dominating and served as an off-the-shelf approach in the field, which could, however, impose prohibitive burden to clinicians for preparing fine-grained pixel-wise labels corresponding to the collected surgical video frames. In this work, we propose an unsupervised method by reframing the video frame segmentation as a graph partitioning problem and regarding image pixels as graph nodes, which is significantly different from the previous efforts. A self-supervised pre-trained model is firstly leveraged as a feature extractor to capture high-level semantic features. Then, Laplacian matrixs are computed from the features and are eigendecomposed for graph partitioning. On the "deep" eigenvectors, a surgical video frame is meaningfully segmented into different modules such as tools and tissues, providing distinguishable semantic information like locations, classes, and relations. The segmentation problem can then be naturally tackled by applying clustering or threshold on the eigenvectors. Extensive experiments are conducted on various datasets (e.g., EndoVis2017, EndoVis2018, UCL, etc.) for different clinical endpoints. Across all the challenging scenarios, our method demonstrates outstanding performance and robustness higher than unsupervised state-of-the-art (SOTA) methods. The code is released at https://github.com/MingyuShengSMY/GraphClusteringSIS.git.

Via

Access Paper or Ask Questions

CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Aug 18, 2024

Kaicheng Yang, Tiancheng Gu, Xiang An, Haiqiang Jiang, Xiangzi Dai, Ziyong Feng, Weidong Cai, Jiankang Deng

Figure 1 for CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Figure 2 for CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Figure 3 for CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Figure 4 for CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Abstract:Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over a wide range of tasks. However, the effectiveness of CLIP heavily relies on a substantial corpus of pre-training data, resulting in notable consumption of computational resources. Although knowledge distillation has been widely applied in single modality models, how to efficiently expand knowledge distillation to vision-language foundation models with extensive data remains relatively unexplored. In this paper, we introduce CLIP-CID, a novel distillation mechanism that effectively transfers knowledge from a large vision-language foundation model to a smaller model. We initially propose a simple but efficient image semantic balance method to reduce transfer learning bias and improve distillation efficiency. This method filters out 43.7% of image-text pairs from the LAION400M while maintaining superior performance. After that, we leverage cluster-instance discrimination to facilitate knowledge transfer from the teacher model to the student model, thereby empowering the student model to acquire a holistic semantic comprehension of the pre-training data. Experimental results demonstrate that CLIP-CID achieves state-of-the-art performance on various downstream tasks including linear probe and zero-shot classification.

* 11 pages,8 figures

Via

Access Paper or Ask Questions

Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

Aug 15, 2024

Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Weidong Cai

Figure 1 for Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

Figure 2 for Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

Figure 3 for Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

Figure 4 for Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

Abstract:Large Language Models (LLMs) have showcased exceptional ability in causal reasoning from textual information. However, will these causalities remain straightforward for Vision Large Language Models (VLLMs) when only visual hints are provided? Motivated by this, we propose a novel Multimodal Causal Reasoning benchmark, namely MuCR, to challenge VLLMs to infer semantic cause-and-effect relationship when solely relying on visual cues such as action, appearance, clothing, and environment. Specifically, we introduce a prompt-driven image synthesis approach to create siamese images with embedded semantic causality and visual cues, which can effectively evaluate VLLMs' causal reasoning capabilities. Additionally, we develop tailored metrics from multiple perspectives, including image-level match, phrase-level understanding, and sentence-level explanation, to comprehensively assess VLLMs' comprehension abilities. Our extensive experiments reveal that the current state-of-the-art VLLMs are not as skilled at multimodal causal reasoning as we might have hoped. Furthermore, we perform a comprehensive analysis to understand these models' shortcomings from different views and suggest directions for future research. We hope MuCR can serve as a valuable resource and foundational benchmark in multimodal causal reasoning research. The project is available at: https://github.com/Zhiyuan-Li-John/MuCR

* 20 pages

Via

Access Paper or Ask Questions

White Matter Geometry-Guided Score-Based Diffusion Model for Tissue Microstructure Imputation in Tractography Imaging

Jul 28, 2024

Yui Lo, Yuqian Chen, Fan Zhang, Dongnan Liu, Leo Zekelman, Suheyla Cetin-Karayumak, Yogesh Rathi, Weidong Cai, Lauren J. O'Donnell

Figure 1 for White Matter Geometry-Guided Score-Based Diffusion Model for Tissue Microstructure Imputation in Tractography Imaging

Figure 2 for White Matter Geometry-Guided Score-Based Diffusion Model for Tissue Microstructure Imputation in Tractography Imaging

Figure 3 for White Matter Geometry-Guided Score-Based Diffusion Model for Tissue Microstructure Imputation in Tractography Imaging

Figure 4 for White Matter Geometry-Guided Score-Based Diffusion Model for Tissue Microstructure Imputation in Tractography Imaging

Abstract:Parcellation of white matter tractography provides anatomical features for disease prediction, anatomical tract segmentation, surgical brain mapping, and non-imaging phenotype classifications. However, parcellation does not always reach 100% accuracy due to various factors, including inter-individual anatomical variability and the quality of neuroimaging scan data. The failure to identify parcels causes a problem of missing microstructure data values, which is especially challenging for downstream tasks that analyze large brain datasets. In this work, we propose a novel deep-learning model to impute tissue microstructure: the White Matter Geometry-guided Diffusion (WMG-Diff) model. Specifically, we first propose a deep score-based guided diffusion model to impute tissue microstructure for diffusion magnetic resonance imaging (dMRI) tractography fiber clusters. Second, we propose a white matter atlas geometric relationship-guided denoising function to guide the reverse denoising process at the subject-specific level. Third, we train and evaluate our model on a large dataset with 9342 subjects. Comprehensive experiments for tissue microstructure imputation and a downstream non-imaging phenotype prediction task demonstrate that our proposed WMG-Diff outperforms state-of-the-art methods.

* 12 pages, 3 figures, 2 tables

Via

Access Paper or Ask Questions

Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View

Jul 19, 2024

Jianan Fan, Dongnan Liu, Canran Li, Hang Chang, Heng Huang, Filip Braet, Mei Chen, Weidong Cai

Figure 1 for Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View

Figure 2 for Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View

Figure 3 for Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View

Figure 4 for Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View

Abstract:Cellular nuclei recognition serves as a fundamental and essential step in the workflow of digital pathology. However, with disparate source organs and staining procedures among histology image clusters, the scanned tiles inherently conform to a non-uniform data distribution, which induces deteriorated promises for general cross-cohort usages. Despite the latest efforts leveraging domain adaptation to mitigate distributional discrepancy, those methods are subjected to modeling the morphological characteristics of each cell individually, disregarding the hierarchical latent structure and intrinsic contextual correspondences across the tumor micro-environment. In this work, we identify the importance of implicit correspondences across biological contexts for exploiting domain-invariant pathological composition and thereby propose to exploit the dependence over various biological structures for domain adaptive cellular recognition. We discover those high-level correspondences via unsupervised contextual modeling and use them as bridges to facilitate adaptation over diverse organs and stains. In addition, to further exploit the rich spatial contexts embedded amongst nuclear communities, we propose self-adaptive dynamic distillation to secure instance-aware trade-offs across different model constituents. The proposed method is extensively evaluated on a broad spectrum of cross-domain settings under miscellaneous data distribution shifts and outperforms the state-of-the-art methods by a substantial margin. Code is available at https://github.com/camwew/CellularRecognition_DA_CC.

* ECCV 2024 main conference

Via

Access Paper or Ask Questions

Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights

Jul 16, 2024

Shunqi Mao, Chaoyi Zhang, Hang Su, Hwanjun Song, Igor Shalyminov, Weidong Cai

Abstract:Contextualized Image Captioning (CIC) evolves traditional image captioning into a more complex domain, necessitating the ability for multimodal reasoning. It aims to generate image captions given specific contextual information. This paper further introduces a novel domain of Controllable Contextualized Image Captioning (Ctrl-CIC). Unlike CIC, which solely relies on broad context, Ctrl-CIC accentuates a user-defined highlight, compelling the model to tailor captions that resonate with the highlighted aspects of the context. We present two approaches, Prompting-based Controller (P-Ctrl) and Recalibration-based Controller (R-Ctrl), to generate focused captions. P-Ctrl conditions the model generation on highlight by prepending captions with highlight-driven prefixes, whereas R-Ctrl tunes the model to selectively recalibrate the encoder embeddings for highlighted tokens. Additionally, we design a GPT-4V empowered evaluator to assess the quality of the controlled captions alongside standard assessment methods. Extensive experimental results demonstrate the efficient and effective controllability of our method, charting a new direction in achieving user-adaptive image captioning. Code is available at https://github.com/ShunqiM/Ctrl-CIC .

* ECCV 2024

Via

Access Paper or Ask Questions