Pseudo-normality synthesis, which computationally generates a pseudo-normal image from an abnormal one (e.g., one with lesions), is useful from many perspectives, ranging from lesion detection and data augmentation to clinical surgery suggestion. However, it is challenging to generate high-quality pseudo-normal images in the absence of lesion information. Thus, expensive lesion segmentation data have been introduced to provide lesion information to the generative models and improve the quality of the synthetic images. In this paper, we aim to alleviate the need for a large amount of lesion segmentation data when generating pseudo-normal images. We propose a Semi-supervised Medical Image generative LEarning network (SMILE), which not only utilizes limited medical images with segmentation masks, but also leverages massive medical images without segmentation masks to generate realistic pseudo-normal images. Extensive experiments show that our model outperforms the best state-of-the-art model by up to 6% on the data augmentation task and 3% in generating high-quality images. Moreover, the proposed semi-supervised learning achieves medical image synthesis quality comparable to a fully supervised model while using only 50% of the segmentation data.
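The abstract does not spell out how labeled and unlabeled images enter the same objective. As a rough illustration only, here is a minimal PyTorch-style sketch of a semi-supervised training step in which the lesion mask, when available, restricts the reconstruction term to non-lesion pixels; the function names, loss terms, and weighting are assumptions, not the actual SMILE formulation.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(generator, discriminator, abnormal, mask=None):
    """One hedged training step for pseudo-normal synthesis.

    generator: maps an abnormal image to a pseudo-normal image.
    mask: float lesion mask, only available for the labeled subset.
    """
    pseudo_normal = generator(abnormal)

    if mask is not None:
        # Labeled image: outside the lesion the output should match the input.
        background = 1.0 - mask
        recon = F.l1_loss(pseudo_normal * background, abnormal * background)
    else:
        # Unlabeled image: fall back to a weak whole-image reconstruction prior.
        recon = F.l1_loss(pseudo_normal, abnormal)

    # Adversarial term pushes the output toward the normal-image distribution.
    logits = discriminator(pseudo_normal)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return recon + 0.1 * adv  # 0.1 is an illustrative weight
```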
Spine-related diseases have high morbidity and impose a heavy social and economic burden. Spine imaging is an essential tool for noninvasively visualizing and assessing spinal pathology. Segmenting vertebrae in computed tomography (CT) images is the basis of quantitative medical image analysis for clinical diagnosis and surgery planning of spine diseases. Currently, publicly available annotated datasets of spinal vertebrae are small. Due to the lack of a large-scale annotated spine image dataset, mainstream deep learning-based segmentation methods, which are data-driven, are heavily restricted. In this paper, we introduce a large-scale spine CT dataset for vertebra segmentation, called CTSpine1K, curated from multiple sources; it contains 1,005 CT volumes with over 11,100 labeled vertebrae covering different spinal conditions. Based on this dataset, we conduct several spinal vertebrae segmentation experiments to set the first benchmark. We believe that this large-scale dataset will facilitate further research in many spine-related image analysis tasks, including but not limited to vertebra segmentation and labeling, 3D spine reconstruction from biplanar radiographs, image super-resolution, and enhancement.
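For reference, vertebra-wise segmentation benchmarks of this kind are typically scored with a per-label Dice coefficient. Below is a minimal sketch; the label convention, choice of metric, and handling of absent structures are assumptions rather than the exact CTSpine1K evaluation protocol.

```python
import numpy as np

def dice_per_label(pred, gt, labels):
    """Dice score for each vertebra label.

    pred, gt: integer label volumes of identical shape.
    labels: iterable of vertebra indices to evaluate.
    """
    scores = {}
    for k in labels:
        p, g = (pred == k), (gt == k)
        denom = p.sum() + g.sum()
        # NaN if the vertebra is absent from both volumes.
        scores[k] = 2.0 * np.logical_and(p, g).sum() / denom if denom else np.nan
    return scores
```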
Multimodal image registration has many applications in diagnostic medical imaging and image-guided interventions, such as Transcatheter Arterial Chemoembolization (TACE) of liver cancer guided by intra-procedural CBCT and pre-operative MR. The ability to register peri-procedurally acquired diagnostic images into the intra-procedural environment can potentially improve intra-procedural tumor targeting, which would significantly improve therapeutic outcomes. However, intra-procedural CBCT often suffers from suboptimal image quality due to the lack of signal calibration for Hounsfield units, a limited field of view, and motion/metal artifacts. These non-ideal conditions make standard intensity-based multimodal registration methods unable to produce correct transformations across modalities. While registration based on anatomic structures, such as segmentations or landmarks, provides an efficient alternative, such anatomic structure information is not always available. One can train a deep learning-based anatomy extractor, but it requires large-scale manual annotations on specific modalities, which are often extremely time-consuming to obtain and require expert radiological readers. To tackle these issues, we leverage annotated datasets that already exist in a source modality and propose an Anatomy-Preserving domain Adaptation to Segmentation network (APA2Seg-Net) for learning segmentation without target-modality ground truth. The segmenters are then integrated into our anatomy-guided multimodal registration based on robust point matching. Our experimental results on in-house TACE patient data demonstrate that APA2Seg-Net can generate robust CBCT and MR liver segmentations, and that the anatomy-guided registration framework with these segmenters can provide high-quality multimodal registrations. Our code is available at https://github.com/bbbbbbzhou/APA2Seg-Net.
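To give a concrete sense of anatomy-guided registration from segmentations, the sketch below samples foreground points from two binary liver masks and computes a crude rigid alignment from centroids and principal axes. This is a deliberately simplified stand-in for the robust point matching used in the paper, which additionally estimates soft correspondences and handles outliers; the sign and ordering ambiguities of principal axes are also ignored here.

```python
import numpy as np

def mask_to_points(mask, n=2000, seed=0):
    """Sample up to n foreground voxel coordinates from a binary mask."""
    rng = np.random.default_rng(seed)
    pts = np.argwhere(mask > 0).astype(float)
    idx = rng.choice(len(pts), size=min(n, len(pts)), replace=False)
    return pts[idx]

def coarse_rigid_align(src_pts, dst_pts):
    """Crude rigid alignment (R, t) from centroids and principal axes."""
    src_c, dst_c = src_pts.mean(0), dst_pts.mean(0)
    _, _, vs = np.linalg.svd(src_pts - src_c, full_matrices=False)
    _, _, vd = np.linalg.svd(dst_pts - dst_c, full_matrices=False)
    R = vd.T @ vs
    if np.linalg.det(R) < 0:   # avoid reflections
        vd[-1] *= -1
        R = vd.T @ vs
    t = dst_c - R @ src_c
    return R, t
```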
Universal Lesion Detection (ULD) in computed tomography plays an essential role in computer-aided diagnosis. Promising ULD results have been reported by coarse-to-fine two-stage detection approaches, but such two-stage ULD methods still suffer from issues such as the imbalance of positive vs. negative anchors during object proposal and insufficient supervision during localization regression and classification of the region-of-interest (RoI) proposals. While leveraging pseudo segmentation masks such as the bounding map (BM) can reduce these issues to some degree, effectively handling the diverse lesion shapes and sizes in ULD remains an open problem. In this paper, we propose BM-based conditional training for two-stage ULD, which can (i) reduce the positive vs. negative anchor imbalance via a BM-based conditioning (BMC) mechanism for anchor sampling instead of the traditional IoU-based rule; and (ii) adaptively compute a size-adaptive BM (ABM) from the lesion bounding box, which is used to improve lesion localization accuracy via ABM-supervised segmentation. Experiments with four state-of-the-art methods show that the proposed approach brings an almost free detection accuracy improvement without requiring expensive lesion mask annotations.
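As a hedged illustration of the idea of conditioning anchor sampling on a bounding map rather than on IoU, the toy sketch below builds a map that peaks at the box center and decays to zero at the box edge, then labels anchors positive when their centers fall on high-map pixels. The exact form of the paper's size-adaptive BM and its threshold are not reproduced here.

```python
import numpy as np

def bounding_map(h, w, box):
    """Toy bounding map from a lesion box (x1, y1, x2, y2):
    1 at the box center, decaying linearly to 0 at the box edge."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx = np.clip(1.0 - np.abs(xs - cx) / max((x2 - x1) / 2.0, 1e-6), 0, 1)
    dy = np.clip(1.0 - np.abs(ys - cy) / max((y2 - y1) / 2.0, 1e-6), 0, 1)
    return dx * dy

def sample_positive_anchors(anchor_centers, bm, thresh=0.5):
    """Mark anchors positive when their (x, y) center lies on a high-BM pixel,
    in place of the usual IoU rule (illustrative threshold)."""
    xs = anchor_centers[:, 0].astype(int)
    ys = anchor_centers[:, 1].astype(int)
    return bm[ys, xs] >= thresh
```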
Segmentation of pathological images is essential for accurate disease diagnosis. The quality of manual labels plays a critical role in segmentation accuracy; yet, in practice, labels from different pathologists can be inconsistent, which confuses the training process. In this work, we propose a novel label re-weighting framework that accounts for the reliability of each expert's label at every pixel according to its surrounding features. We further devise a new attention heatmap, which takes roughness as prior knowledge to guide the model to focus on important regions. Our approach is evaluated on the public Gleason 2019 dataset. The results show that our approach effectively improves the model's robustness against noisy labels and outperforms state-of-the-art approaches.
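One way to picture per-pixel label re-weighting across experts is a weighted cross-entropy in which each expert's annotation contributes according to a per-pixel reliability weight. The sketch below assumes such weights are given; how they are actually predicted from surrounding features is the core of the paper and is not shown.

```python
import torch
import torch.nn.functional as F

def reweighted_ce(logits, expert_labels, pixel_weights):
    """Cross-entropy over multiple experts' labels with per-pixel reliability weights.

    logits:        (B, C, H, W) network output
    expert_labels: (E, B, H, W) integer masks from E experts
    pixel_weights: (E, B, H, W) reliability weights (e.g., normalized over experts)
    """
    loss = torch.zeros((), device=logits.device)
    for e in range(expert_labels.shape[0]):
        per_pixel = F.cross_entropy(logits, expert_labels[e], reduction="none")
        loss = loss + (pixel_weights[e] * per_pixel).mean()
    return loss
```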
Deep neural networks have been extensively studied for undersampled MRI reconstruction. While achieving state-of-the-art performance, they are trained and deployed for one specific anatomy, with limited ability to generalize to another. Rather than building multiple models, a single universal model that reconstructs images across different anatomies is highly desirable for efficient deployment and better generalization. Simply mixing images from multiple anatomies to train a single network does not yield an ideal universal model, due to the statistical shift among datasets of different anatomies, the need to retrain from scratch on all datasets whenever a new dataset is added, and the difficulty of dealing with imbalanced sampling when the new dataset is much smaller. In this paper, for the first time, we propose a framework to learn a universal deep neural network for undersampled MRI reconstruction. Specifically, anatomy-specific instance normalization is proposed to compensate for the statistical shift and allow easy generalization to new datasets. Moreover, the universal model is trained by distilling knowledge from the available independent models to further exploit representations across anatomies. Experimental results show that the proposed universal model can reconstruct both brain and knee images with high image quality. It is also easy to adapt the trained model to new, smaller datasets, i.e., abdomen, cardiac, and prostate, with little effort and superior performance.
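The idea of anatomy-specific instance normalization can be sketched as a module whose convolutions are shared across anatomies while only the normalization layer is switched by an anatomy key, so adapting to a new anatomy mainly means adding one small normalization layer. The anatomy keys and layer placement below are illustrative, not the paper's exact architecture.

```python
import torch.nn as nn

class AnatomySpecificIN(nn.Module):
    """Instance normalization with separate affine parameters per anatomy."""

    def __init__(self, channels, anatomies=("brain", "knee")):
        super().__init__()
        # One InstanceNorm per anatomy; everything else in the network is shared.
        self.norms = nn.ModuleDict(
            {a: nn.InstanceNorm2d(channels, affine=True) for a in anatomies})

    def forward(self, x, anatomy):
        # Select the normalization branch by anatomy key, e.g. "brain" or "knee".
        return self.norms[anatomy](x)
```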
Detecting anatomical landmarks in medical images plays an essential role in understanding the anatomy and planning automated processing. In recent years, a variety of deep neural network methods have been developed to detect landmarks automatically. However, all of these methods are unary in the sense that a highly specialized network is trained for a single task, say, one associated with a particular anatomical region. In this work, for the first time, we investigate the idea of "You Only Learn Once" (YOLO) and develop a universal anatomical landmark detection model that realizes multiple landmark detection tasks with end-to-end training on mixed datasets. The model consists of a local network and a global network: the local network is built upon the idea of a universal U-Net to learn multi-domain local features, and the global network is a parallelly duplicated sequence of dilated convolutions that extracts global features to further disambiguate the landmark locations. It is worth mentioning that the new model design requires fewer parameters to train than models with standard convolutions. We evaluate our YOLO model on three X-ray datasets of 1,588 images of the head, hand, and chest, collectively covering 62 landmarks. The experimental results show that our proposed universal model performs considerably better than previous models trained on multiple datasets. It even beats models trained separately on each individual dataset.
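To make the global-network idea concrete, a stack of dilated convolutions with growing dilation rates enlarges the receptive field cheaply, which is the property the abstract relies on for disambiguating landmark locations. The channel counts, dilation rates, and depth below are assumptions for illustration, not the paper's configuration.

```python
import torch.nn as nn

def global_dilated_block(channels, dilations=(1, 2, 4, 8)):
    """A small stand-in for the global network: dilated 3x3 convolutions
    whose receptive field grows with the dilation rate."""
    layers = []
    for d in dilations:
        layers += [
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d),
            nn.ReLU(inplace=True),
        ]
    return nn.Sequential(*layers)
```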
Recently, both supervised and unsupervised deep learning methods have been widely applied to the CT metal artifact reduction (MAR) task. Supervised methods such as the Dual Domain Network (DuDoNet) work well on simulated data; however, their performance on clinical data is limited due to the domain gap. Unsupervised methods generalize better, but do not eliminate artifacts completely when processing the image domain alone. To combine the advantages of both kinds of MAR methods, we propose an unpaired dual-domain network (U-DuDoNet) trained using unpaired data. Unlike the artifact disentanglement network (ADN), which uses multiple encoders and decoders to disentangle content from artifact, our U-DuDoNet directly models the artifact generation process through additions in both the sinogram and image domains, which is theoretically justified by an additive property associated with metal artifacts. Our design includes a self-learned sinogram prior net, which provides guidance for restoring information in the sinogram domain, and cyclic constraints for artifact reduction and addition on unpaired data. Extensive experiments on simulated data and clinical images demonstrate that our novel framework outperforms state-of-the-art unpaired approaches.
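A toy, image-domain-only version of additive artifact modeling with cyclic constraints on unpaired data is sketched below: one network predicts an artifact component to subtract from a corrupted image, another predicts one to add to a clean image, and cycle losses tie the two together. The sinogram-domain prior net is omitted, and all names and loss weights are assumptions rather than the U-DuDoNet objective.

```python
import torch.nn.functional as F

def additive_cycle_losses(art_net, clean_net, x_artifact, x_clean):
    """Toy cyclic constraints for additive artifact modeling on unpaired data.

    clean_net(x): predicts the artifact component to subtract from x.
    art_net(x):   predicts an artifact component to add to x.
    """
    # Corrupted -> clean -> corrupted again.
    est_clean = x_artifact - clean_net(x_artifact)
    re_corrupted = est_clean + art_net(est_clean)
    cycle_a = F.l1_loss(re_corrupted, x_artifact)

    # Clean -> corrupted -> clean again.
    corrupted = x_clean + art_net(x_clean)
    cycle_b = F.l1_loss(corrupted - clean_net(corrupted), x_clean)
    return cycle_a + cycle_b
```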
The success of deep learning methods relies on the availability of large annotated datasets; however, curating such datasets is burdensome, especially for medical images. To relieve this burden for the landmark detection task, we explore the feasibility of using only a single annotated image and propose a novel framework named Cascade Comparing to Detect (CC2D) for one-shot landmark detection. CC2D consists of two stages: 1) self-supervised learning (CC2D-SSL) and 2) training with pseudo-labels (CC2D-TPL). CC2D-SSL captures consistent anatomical information in a coarse-to-fine fashion by comparing cascaded feature representations and generates predictions on the training set. CC2D-TPL further improves performance by training a new landmark detector with those predictions. The effectiveness of CC2D is evaluated on a widely used public dataset for cephalometric landmark detection, on which it achieves a competitive detection accuracy of 81.01% within 4.0 mm, comparable to state-of-the-art fully supervised methods that use far more than one training image.
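The comparison step at a single scale can be pictured as matching the feature vector at the annotated landmark of the one labeled image against every location of a query image's feature map via cosine similarity. The sketch below shows only this single-scale matching; CC2D repeats the comparison over a cascade of feature scales and then trains a detector on the resulting pseudo-labels, neither of which is reproduced here.

```python
import torch
import torch.nn.functional as F

def match_landmark(query_feat, template_vec):
    """Locate a landmark at one scale by cosine-similarity feature matching.

    query_feat:   (C, H, W) feature map of the query image
    template_vec: (C,) feature vector at the annotated landmark of the
                  single labeled (template) image
    Returns the (y, x) of the best match at this scale.
    """
    c, h, w = query_feat.shape
    q = F.normalize(query_feat.reshape(c, -1), dim=0)   # (C, H*W), unit columns
    t = F.normalize(template_vec, dim=0)                 # (C,), unit vector
    sim = (t.unsqueeze(0) @ q).reshape(h, w)             # cosine similarity map
    idx = torch.argmax(sim)
    return int(idx // w), int(idx % w)
```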