Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shanshan Wang

Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, Peng Cheng Laboratory, Shenzhen, China

Enhancing Out-of-Distribution Detection with Multitesting-based Layer-wise Feature Fusion

Mar 16, 2024

Jiawei Li, Sitong Li, Shanshan Wang, Yicheng Zeng, Falong Tan, Chuanlong Xie

Abstract:Deploying machine learning in open environments presents the challenge of encountering diverse test inputs that differ significantly from the training data. These out-of-distribution samples may exhibit shifts in local or global features compared to the training distribution. The machine learning (ML) community has responded with a number of methods aimed at distinguishing anomalous inputs from original training data. However, the majority of previous studies have primarily focused on the output layer or penultimate layer of pre-trained deep neural networks. In this paper, we propose a novel framework, Multitesting-based Layer-wise Out-of-Distribution (OOD) Detection (MLOD), to identify distributional shifts in test samples at different levels of features through rigorous multiple testing procedure. Our approach distinguishes itself from existing methods as it does not require modifying the structure or fine-tuning of the pre-trained classifier. Through extensive experiments, we demonstrate that our proposed framework can seamlessly integrate with any existing distance-based inspection method while efficiently utilizing feature extractors of varying depths. Our scheme effectively enhances the performance of out-of-distribution detection when compared to baseline methods. In particular, MLOD-Fisher achieves superior performance in general. When trained using KNN on CIFAR10, MLOD-Fisher significantly lowers the false positive rate (FPR) from 24.09% to 7.47% on average compared to merely utilizing the features of the last layer.

Via

Access Paper or Ask Questions

Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

Feb 05, 2024

Jiarun Liu, Hao Yang, Hong-Yu Zhou, Yan Xi, Lequan Yu, Yizhou Yu, Yong Liang, Guangming Shi, Shaoting Zhang, Hairong Zheng(+1 more)

Figure 1 for Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

Figure 2 for Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

Figure 3 for Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

Figure 4 for Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

Abstract:Accurate medical image segmentation demands the integration of multi-scale information, spanning from local features to global dependencies. However, it is challenging for existing methods to model long-range global information, where convolutional neural networks (CNNs) are constrained by their local receptive fields, and vision transformers (ViTs) suffer from high quadratic complexity of their attention mechanism. Recently, Mamba-based models have gained great attention for their impressive ability in long sequence modeling. Several studies have demonstrated that these models can outperform popular vision models in various tasks, offering higher accuracy, lower memory consumption, and less computational burden. However, existing Mamba-based models are mostly trained from scratch and do not explore the power of pretraining, which has been proven to be quite effective for data-efficient medical image analysis. This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks, leveraging the advantages of ImageNet-based pretraining. Our experimental results reveal the vital role of ImageNet-based training in enhancing the performance of Mamba-based models. Swin-UMamba demonstrates superior performance with a large margin compared to CNNs, ViTs, and latest Mamba-based models. Notably, on AbdomenMRI, Encoscopy, and Microscopy datasets, Swin-UMamba outperforms its closest counterpart U-Mamba by an average score of 3.58%. The code and models of Swin-UMamba are publicly available at: https://github.com/JiarunLiu/Swin-UMamba

* Technical report

Via

Access Paper or Ask Questions

Knowledge-driven deep learning for fast MR imaging: undersampled MR image reconstruction from supervised to un-supervised learning

Feb 05, 2024

Shanshan Wang, Ruoyou Wu, Sen Jia, Alou Diakite, Cheng Li, Qiegen Liu, Leslie Ying

Abstract:Deep learning (DL) has emerged as a leading approach in accelerating MR imaging. It employs deep neural networks to extract knowledge from available datasets and then applies the trained networks to reconstruct accurate images from limited measurements. Unlike natural image restoration problems, MR imaging involves physics-based imaging processes, unique data properties, and diverse imaging tasks. This domain knowledge needs to be integrated with data-driven approaches. Our review will introduce the significant challenges faced by such knowledge-driven DL approaches in the context of fast MR imaging along with several notable solutions, which include learning neural networks and addressing different imaging application scenarios. The traits and trends of these techniques have also been given which have shifted from supervised learning to semi-supervised learning, and finally, to unsupervised learning methods. In addition, MR vendors' choices of DL reconstruction have been provided along with some discussions on open questions and future directions, which are critical for the reliable imaging systems.

* 46 pages, 5figures, 1 table

Via

Access Paper or Ask Questions

Positive and negative sampling strategies for self-supervised learning on audio-video data

Feb 05, 2024

Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros

Figure 1 for Positive and negative sampling strategies for self-supervised learning on audio-video data

Figure 2 for Positive and negative sampling strategies for self-supervised learning on audio-video data

Figure 3 for Positive and negative sampling strategies for self-supervised learning on audio-video data

Abstract:In Self-Supervised Learning (SSL), Audio-Visual Correspondence (AVC) is a popular task to learn deep audio and video features from large unlabeled datasets. The key step in AVC is to randomly sample audio and video clips from the dataset and learn to minimize the feature distance between the positive pairs (corresponding audio-video pair) while maximizing the distance between the negative pairs (non-corresponding audio-video pairs). The learnt features are shown to be effective on various downstream tasks. However, these methods achieve subpar performance when the size of the dataset is rather small. In this paper, we investigate the effect of utilizing class label information in the AVC feature learning task. We modified various positive and negative data sampling techniques of SSL based on class label information to investigate the effect on the feature quality. We propose a new sampling approach which we call soft-positive sampling, where the positive pair for one audio sample is not from the exact corresponding video, but from a video of the same class. Experimental results suggest that when the dataset size is small in SSL setup, features learnt through the soft-positive sampling method significantly outperform those from the traditional SSL sampling approaches. This trend holds in both in-domain and out-of-domain downstream tasks, and even outperforms supervised classification. Finally, experiments show that class label information can easily be obtained using a publicly available classifier network and then can be used to boost the SSL performance without adding extra data annotation burden.

Via

Access Paper or Ask Questions

Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement

Jan 21, 2024

Cheng Li, Weijian Huang, Hao Yang, Jiarun Liu, Shanshan Wang

Figure 1 for Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement

Figure 2 for Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement

Figure 3 for Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement

Figure 4 for Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement

Abstract:Recently, vision-language representation learning has made remarkable advancements in building up medical foundation models, holding immense potential for transforming the landscape of clinical research and medical care. The underlying hypothesis is that the rich knowledge embedded in radiology reports can effectively assist and guide the learning process, reducing the need for additional labels. However, these reports tend to be complex and sometimes even consist of redundant descriptions that make the representation learning too challenging to capture the key semantic information. This paper develops a novel iterative vision-language representation learning framework by proposing a key semantic knowledge-emphasized report refinement method. Particularly, raw radiology reports are refined to highlight the key information according to a constructed clinical dictionary and two model-optimized knowledge-enhancement metrics. The iterative framework is designed to progressively learn, starting from gaining a general understanding of the patient's condition based on raw reports and gradually refines and extracts critical information essential to the fine-grained analysis tasks. The effectiveness of the proposed framework is validated on various downstream medical image analysis tasks, including disease classification, region-of-interest segmentation, and phrase grounding. Our framework surpasses seven state-of-the-art methods in both fine-tuning and zero-shot settings, demonstrating its encouraging potential for different clinical applications.

Via

Access Paper or Ask Questions

Generalizable vision-language pre-training for annotation-free pathology localization

Jan 04, 2024

Hao Yang, Hong-Yu Zhou, Cheng Li, Weijian Huang, Jiarun Liu, Shanshan Wang

Abstract:Locating pathologies automatically from medical images aids the understanding of the emergence and progression of diseases, and such an ability can significantly benefit clinical diagnostics. However, existing deep learning models heavily rely on expert annotations and lack generalization capabilities in open clinical environments. In this study, we present a generalizable vision-language pre-training model for Annotation-Free pathology Localization (AFLoc). The core strength of AFLoc lies in its image annotation-free multi-level semantic structure-based contrastive learning, which comprehensively aligns multi-granularity medical concepts from reports with abundant image features, to adapt to the diverse expressions of observed and emerging unseen pathologies. We conducted extensive experimental validation across 4 distinct external datasets, encompassing 11 types of chest pathologies, to verify its generalization ability. The results demonstrate that AFLoc surpasses 6 state-of-the-art methods and even outperforms the human benchmark in locating 5 different pathologies, underscoring its suitability for complex clinical environments.

Via

Access Paper or Ask Questions

AID-DTI: Accelerating High-fidelity Diffusion Tensor Imaging with Detail-Preserving Model-based Deep Learning

Jan 03, 2024

Wenxin Fan, Jian Cheng, Cheng Li, Xinrui Ma, Jing Yang, Juan Zou, Ruoyou Wu, Qiegen Liu, Shanshan Wang

Abstract:Deep learning has shown great potential in accelerating diffusion tensor imaging (DTI). Nevertheless, existing methods tend to suffer from Rician noise and detail loss in reconstructing the DTI-derived parametric maps especially when sparsely sampled q-space data are used. This paper proposes a novel method, AID-DTI (Accelerating hIgh fiDelity Diffusion Tensor Imaging), to facilitate fast and accurate DTI with only six measurements. AID-DTI is equipped with a newly designed Singular Value Decomposition (SVD)-based regularizer, which can effectively capture fine details while suppressing noise during network training. Experimental results on Human Connectome Project (HCP) data consistently demonstrate that the proposed method estimates DTI parameter maps with fine-grained details and outperforms three state-of-the-art methods both quantitatively and qualitatively.

Via

Access Paper or Ask Questions

Enhancing the medical foundation model with multi-scale and cross-modality feature learning

Jan 03, 2024

Weijian Huang, Cheng Li, Hong-Yu Zhou, Jiarun Liu, Hao Yang, Yong Liang, Shanshan Wang

Figure 1 for Enhancing the medical foundation model with multi-scale and cross-modality feature learning

Figure 2 for Enhancing the medical foundation model with multi-scale and cross-modality feature learning

Figure 3 for Enhancing the medical foundation model with multi-scale and cross-modality feature learning

Figure 4 for Enhancing the medical foundation model with multi-scale and cross-modality feature learning

Abstract:The development of multi-modal medical foundation models has attracted significant attention in the field of medicine and healthcare due to their promising prospects in various clinical applications. One area of focus in this research direction is the extractions of features at different scales. While previous studies have explored feature learning at individual scales, investigation on integrating the diverse scales and modalities of information is lacking, which may hinder the potential for mutual reinforcement among these features. This paper aims to bridge this gap by proposing a method that effectively exploits multi-scale and cross-modality information to enhance the performance of medical foundation models. The proposed method simultaneously exploit features at the local, instance, modality and global aspects, facilitating comprehensive representation learning within the models. We evaluate the effectiveness of the proposed method on six open-source datasets across different clinical tasks, demonstrating its ability to enhance the performance of medical foundation models.

Via

Access Paper or Ask Questions

Simultaneous q-Space Sampling Optimization and Reconstruction for Fast and High-fidelity Diffusion Magnetic Resonance Imaging

Jan 03, 2024

Jing Yang, Jian Cheng, Cheng Li, Wenxin Fan, Juan Zou, Ruoyou Wu, Shanshan Wang

Abstract:Diffusion Magnetic Resonance Imaging (dMRI) plays a crucial role in the noninvasive investigation of tissue microstructural properties and structural connectivity in the \textit{in vivo} human brain. However, to effectively capture the intricate characteristics of water diffusion at various directions and scales, it is important to employ comprehensive q-space sampling. Unfortunately, this requirement leads to long scan times, limiting the clinical applicability of dMRI. To address this challenge, we propose SSOR, a Simultaneous q-Space sampling Optimization and Reconstruction framework. We jointly optimize a subset of q-space samples using a continuous representation of spherical harmonic functions and a reconstruction network. Additionally, we integrate the unique properties of diffusion magnetic resonance imaging (dMRI) in both the q-space and image domains by applying $l1$-norm and total-variation regularization. The experiments conducted on HCP data demonstrate that SSOR has promising strengths both quantitatively and qualitatively and exhibits robustness to noise.

Via

Access Paper or Ask Questions

LESEN: Label-Efficient deep learning for Multi-parametric MRI-based Visual Pathway Segmentation

Jan 03, 2024

Alou Diakite, Cheng Li, Lei Xie, Yuanjing Feng, Hua Han, Shanshan Wang

Abstract:Recent research has shown the potential of deep learning in multi-parametric MRI-based visual pathway (VP) segmentation. However, obtaining labeled data for training is laborious and time-consuming. Therefore, it is crucial to develop effective algorithms in situations with limited labeled samples. In this work, we propose a label-efficient deep learning method with self-ensembling (LESEN). LESEN incorporates supervised and unsupervised losses, enabling the student and teacher models to mutually learn from each other, forming a self-ensembling mean teacher framework. Additionally, we introduce a reliable unlabeled sample selection (RUSS) mechanism to further enhance LESEN's effectiveness. Our experiments on the human connectome project (HCP) dataset demonstrate the superior performance of our method when compared to state-of-the-art techniques, advancing multimodal VP segmentation for comprehensive analysis in clinical and research settings. The implementation code will be available at: https://github.com/aldiak/Semi-Supervised-Multimodal-Visual-Pathway- Delineation.

Via

Access Paper or Ask Questions