Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of full volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of seven abdominal structures (artery, vein, liver, spleen, stomach, gallbladder, and pancreas) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training organ-specific models. To this end, we propose a two-stage, coarse-to-fine approach that trains an FCN model to roughly delineate the organs of interest in the first stage (seeing $\sim$40% of the voxels within a simple, automatically generated binary mask of the patient's body). We then use the predictions of the first-stage FCN to define a candidate region that is used to train a second FCN. This step reduces the number of voxels the FCN has to classify to $\sim$10% while maintaining a high recall of $>$99%. The second-stage FCN can now focus on more detailed segmentation of the organs. We use training and validation sets of 281 and 50 clinical CT images, respectively. Our hierarchical approach improves the Dice score by 7.5 percentage points per organ on average on our validation set. We furthermore test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans with three anatomical labels (liver, spleen, and pancreas). In such challenging organs as the pancreas, our hierarchical approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset.
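The quantities this abstract reports (the fraction of voxels passed to the second stage, the recall of the candidate region, and the per-organ Dice score) can be illustrated with a minimal sketch. It is not the authors' pipeline: the volume is flattened to a toy 1D list of labels, and the helper names `dilate_1d`, `candidate_stats`, and `dice`, as well as the dilation radius, are assumptions made for illustration only.

```python
def dilate_1d(mask, radius):
    """Grow a boolean mask by `radius` positions on each side (toy 1D dilation)."""
    n = len(mask)
    return [any(mask[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def candidate_stats(candidate, truth):
    """Return (fraction of voxels kept, recall of foreground voxels)."""
    kept = sum(1 for c in candidate if c)
    foreground = sum(1 for t in truth if t != 0)
    hit = sum(1 for c, t in zip(candidate, truth) if c and t != 0)
    recall = hit / foreground if foreground else 1.0
    return kept / len(candidate), recall


def dice(pred, truth, label):
    """Dice score of one label over two equally long flat label lists."""
    p = sum(1 for v in pred if v == label)
    t = sum(1 for v in truth if v == label)
    both = sum(1 for a, b in zip(pred, truth) if a == label and b == label)
    return 2.0 * both / (p + t) if (p + t) else 1.0


# Toy data: 0 = background, 1 and 2 = two organs (hypothetical labels).
truth = [0, 0, 1, 1, 2, 0, 0, 0, 0, 0]
stage1 = [0, 1, 1, 0, 2, 0, 0, 0, 0, 0]  # rough first-stage prediction

# Candidate region: every voxel the first stage marks as any organ, slightly grown.
candidate = dilate_1d([v != 0 for v in stage1], radius=1)
kept_fraction, recall = candidate_stats(candidate, truth)
print(kept_fraction, recall, dice(stage1, truth, 1))
```

In this toy case the candidate region keeps 60% of the voxels while still covering every labeled organ voxel; a second-stage model would then only need to classify voxels inside `candidate`.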
This paper presents an end-to-end, pixelwise, fully automated segmentation of the head sectioned images of the Visible Korean Human (VKH) project based on Deep Convolutional Neural Networks (DCNNs). By converting classification networks into Fully Convolutional Networks (FCNs), a coarse prediction map, smaller than the original input image, can be created for segmentation purposes. To refine this map and to obtain a dense pixel-wise output, standard FCNs use deconvolution layers to upsample the coarse map. However, upsampling based on deconvolution increases the number of network parameters and causes loss of detail because of interpolation. Dilated convolution, on the other hand, is a recently introduced technique that attempts to capture multi-scale contextual information without increasing the number of network parameters while keeping the resolution of the prediction maps high. We used both a standard FCN and a dilated convolution based FCN for semantic segmentation of the head sectioned images of the VKH dataset. Quantitative results showed approximately 20% improvement in the segmentation accuracy when using FCNs with dilated convolutions.
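The claim that dilation widens the context without adding parameters can be seen in a minimal 1D sketch. The function name `dilated_conv1d` and the example kernel are illustrative assumptions, not taken from the paper; the sketch uses "valid" mode, so the output shrinks, whereas a network would typically pad the input to keep the resolution.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with kernel taps spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output value
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # the same three weights are used in both calls

dense = dilated_conv1d(signal, kernel, dilation=1)   # each output sees 3 samples
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output sees 5 samples
print(dense, dilated)
```

Both calls use the same three weights; only the spacing between the taps changes, so the receptive field grows from 3 to 5 samples at no cost in parameters.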
Computational anatomy allows the quantitative analysis of organs in medical images. However, most analysis is constrained to the millimeter scale because of the limited resolution of clinical computed tomography (CT). X-ray microtomography ($\mu$CT), on the other hand, allows imaging of ex-vivo tissues at a resolution of tens of microns. In this work, we use clinical CT to image lung cancer patients before partial pneumonectomy (resection of pathological lung tissue). The resected specimen is prepared for $\mu$CT imaging at a voxel resolution of 50 $\mu$m (0.05 mm). This high-resolution image of the lung cancer tissue allows further insights into tumor growth and categorization. To make full use of this additional information, image fusion (registration) needs to be performed in order to re-align the $\mu$CT image with clinical CT. We developed a multi-scale non-rigid registration approach. After manual initialization using a few landmark points and rigid alignment, several levels of non-rigid registration between down-sampled (in the case of $\mu$CT) and up-sampled (in the case of clinical CT) representations of the image are performed. Any non-lung tissue is ignored during the computation of the similarity measure used to guide the registration during optimization. We are able to recover the volume differences introduced by the resection and preparation of the lung specimen. The average ($\pm$ std. dev.) minimum surface distance between $\mu$CT and clinical CT at the resected lung surface is reduced from 3.3 $\pm$ 2.9 mm (range: [0.1, 15.9] mm) to 2.3 $\pm$ 2.8 mm (range: [0.0, 15.3] mm). The alignment of clinical CT with $\mu$CT will allow further registration with even finer resolutions of $\mu$CT (up to 10 $\mu$m resolution) and ultimately with histopathological microscopy images for further macro to micro image fusion that can aid medical image analysis.
Accurate and automatic organ segmentation from 3D radiological scans is an important yet challenging problem for medical image analysis. Specifically, the pancreas demonstrates very high inter-patient anatomical variability in both its shape and volume. In this paper, we present an automated system using 3D computed tomography (CT) volumes via a two-stage cascaded approach: pancreas localization and segmentation. For the first step, we localize the pancreas from the entire 3D CT scan, providing a reliable bounding box for the more refined segmentation step. We introduce a fully deep-learning approach, based on an efficient application of holistically-nested convolutional networks (HNNs) on the three orthogonal axial, sagittal, and coronal views. The resulting HNN per-pixel probability maps are then fused using pooling to reliably produce a 3D bounding box of the pancreas that maximizes the recall. We show that our introduced localizer compares favorably to both a conventional non-deep-learning method and a recent hybrid approach based on spatial aggregation of superpixels using random forest classification. The second, segmentation, phase operates within the computed bounding box and integrates semantic mid-level cues of deeply-learned organ interior and boundary maps, obtained by two additional and separate realizations of HNNs. By integrating these two mid-level cues, our method is capable of generating boundary-preserving pixel-wise class label maps that result in the final pancreas segmentation. Quantitative evaluation is performed on a publicly available dataset of 82 patient CT scans using 4-fold cross-validation (CV). We achieve a Dice similarity coefficient (DSC) of 81.27+/-6.27% in validation, which significantly outperforms previous state-of-the-art methods that report DSCs of 71.80+/-10.70% and 78.01+/-8.20%, respectively, using the same dataset.
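The localization step (fusing per-view probability maps and deriving a 3D bounding box) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: voxel-wise max pooling is assumed as the pooling operator, the threshold and margin are arbitrary, and the names `fuse_max` and `bounding_box` are hypothetical. Volumes are plain nested lists indexed as `[z][y][x]`.

```python
def fuse_max(axial, coronal, sagittal):
    """Voxel-wise max pooling of three equally shaped [z][y][x] probability volumes."""
    return [
        [[max(a, c, s) for a, c, s in zip(row_a, row_c, row_s)]
         for row_a, row_c, row_s in zip(plane_a, plane_c, plane_s)]
        for plane_a, plane_c, plane_s in zip(axial, coronal, sagittal)
    ]


def bounding_box(volume, threshold=0.5, margin=0):
    """Inclusive ((z0, z1), (y0, y1), (x0, x1)) box around voxels >= threshold, or None."""
    coords = [
        (z, y, x)
        for z, plane in enumerate(volume)
        for y, row in enumerate(plane)
        for x, p in enumerate(row)
        if p >= threshold
    ]
    if not coords:
        return None
    shape = (len(volume), len(volume[0]), len(volume[0][0]))
    return tuple(
        (max(0, min(c[d] for c in coords) - margin),
         min(shape[d] - 1, max(c[d] for c in coords) + margin))
        for d in range(3)
    )


def zeros():
    """Toy 2 x 3 x 3 volume filled with zero probability."""
    return [[[0.0] * 3 for _ in range(3)] for _ in range(2)]


axial, coronal, sagittal = zeros(), zeros(), zeros()
axial[0][1][1] = 0.9     # strong response in the axial view only
coronal[1][1][2] = 0.7   # strong response in the coronal view only
sagittal[0][0][1] = 0.2  # weak response, below the threshold

fused = fuse_max(axial, coronal, sagittal)
box = bounding_box(fused, threshold=0.5)
print(box)
```

Taking the maximum keeps a voxel as a candidate whenever any single view responds strongly, which favors recall of the bounding box over its tightness; a positive `margin` would widen the box further.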
Accurate automatic organ segmentation is an important yet challenging problem for medical image analysis. The pancreas is an abdominal organ with very high anatomical variability. This inhibits traditional segmentation methods from achieving high accuracies, especially compared to other organs such as the liver, heart or kidneys. In this paper, we present a holistic learning approach that integrates semantic mid-level cues of deeply-learned organ interior and boundary maps via robust spatial aggregation using random forest. Our method generates boundary preserving pixel-wise class labels for pancreas segmentation. Quantitative evaluation is performed on CT scans of 82 patients in 4-fold cross-validation. We achieve a (mean $\pm$ std. dev.) Dice Similarity Coefficient of 78.01% $\pm$ 8.2% in testing which significantly outperforms the previous state-of-the-art approach of 71.8% $\pm$ 10.7% under the same evaluation criterion.
Robust automated organ segmentation is a prerequisite for computer-aided diagnosis (CAD), quantitative imaging analysis and surgical assistance. For high-variability organs such as the pancreas, previous approaches report undesirably low accuracies. We present a bottom-up approach for pancreas segmentation in abdominal CT scans that is based on a hierarchy of information propagation by classifying image patches at different resolutions and cascading superpixels. There are four stages: 1) decomposing CT slice images into a set of disjoint boundary-preserving superpixels; 2) computing pancreas class probability maps via dense patch labeling; 3) classifying superpixels by pooling both intensity and probability features to form empirical statistics in cascaded random forest frameworks; and 4) simple connectivity-based post-processing. Dense image patch labeling is conducted by an efficient random forest classifier on image histogram, location, and texture features, and by a more expensive (but more specific) deep convolutional neural network classifier on larger image windows (with more spatial context). Evaluation of the approach is performed on a database of 80 manually segmented CT volumes in six-fold cross-validation (CV). Our results are comparable to, or better than, those of state-of-the-art methods (evaluated by "leave-one-patient-out"), with a Dice coefficient of 70.7% and a Jaccard index of 57.9%. Computational efficiency is drastically improved, to the order of 6~8 minutes per case, compared with ~10 hours per case for other methods. Finally, we implement a multi-atlas label fusion (MALF) approach for pancreas segmentation using the same datasets. Under six-fold CV, our bottom-up segmentation method significantly outperforms its MALF counterpart: (70.7 +/- 13.0%) versus (52.5 +/- 20.8%) in Dice. Deep CNN patch labeling confidences offer more numerical stability, reflected by smaller standard deviations.
Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and the revival of deep convolutional neural networks (CNNs). CNNs enable learning data-driven, highly representative, layered hierarchical image features from sufficient training data. However, obtaining datasets as comprehensively annotated as ImageNet in the medical imaging domain remains a challenge. There are currently three major techniques that successfully apply CNNs to medical image classification: training the CNN from scratch, using off-the-shelf pre-trained CNN features, and conducting unsupervised CNN pre-training with supervised fine-tuning. Another effective method is transfer learning, i.e., fine-tuning CNN models pre-trained on natural image datasets for medical image tasks. In this paper, we exploit three important, but previously understudied, factors of employing deep convolutional neural networks in computer-aided detection problems. We first explore and evaluate different CNN architectures. The studied models contain 5 thousand to 160 million parameters, and vary in the number of layers. We then evaluate the influence of dataset scale and spatial image context on performance. Finally, we examine when and why transfer learning from pre-trained ImageNet (via fine-tuning) can be useful. We study two specific computer-aided detection (CADe) problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification. We achieve state-of-the-art performance on mediastinal LN detection, with 85% sensitivity at 3 false positives per patient, and report the first five-fold cross-validation classification results on predicting axial CT slices with ILD categories. Our extensive empirical evaluation, CNN model analysis and insights can be extended to the design of high-performance CAD systems for other medical imaging tasks.
Accurate spine segmentation allows for improved identification and quantitative characterization of abnormalities of the vertebra, such as vertebral fractures. However, in existing automated vertebra segmentation methods on computed tomography (CT) images, leakage into nearby bones such as ribs occurs due to the close proximity of these visibly intense structures in a 3D CT volume. To reduce this error, we propose the use of joint vertebra-rib atlases to improve the segmentation of vertebrae via multi-atlas joint label fusion. Segmentation was performed and evaluated on CTs containing 106 thoracic and lumbar vertebrae from 10 pathological and traumatic spine patients on an individual vertebra level basis. Vertebra atlases produced errors where the segmentation leaked into the ribs. The use of joint vertebra-rib atlases produced a statistically significant increase in the Dice coefficient from 92.5 $\pm$ 3.1% to 93.8 $\pm$ 2.1% for the left and right transverse processes and a decrease in the mean and max surface distance from 0.75 $\pm$ 0.60mm and 8.63 $\pm$ 4.44mm to 0.30 $\pm$ 0.27mm and 3.65 $\pm$ 2.87mm, respectively.
Injuries of the spine, and its posterior elements in particular, are a common occurrence in trauma patients, with potentially devastating consequences. Computer-aided detection (CADe) could assist in the detection and classification of spine fractures. Furthermore, CAD could help assess the stability and chronicity of fractures, as well as facilitate research into optimization of treatment paradigms. In this work, we apply deep convolutional networks (ConvNets) for the automated detection of posterior element fractures of the spine. First, the vertebral bodies of the spine, together with their posterior elements, are segmented in spine CT using multi-atlas label fusion. Then, edge maps of the posterior elements are computed. These edge maps serve as candidate regions for predicting a set of probabilities for fractures along the image edges using ConvNets in a 2.5D fashion (three orthogonal patches in axial, coronal and sagittal planes). We explore three different methods for training the ConvNet using 2.5D patches along the edge maps of 'positive', i.e. fractured, posterior elements and 'negative', i.e. non-fractured, elements. An experienced radiologist retrospectively marked the location of 55 displaced posterior-element fractures in 18 trauma patients. We randomly split the data into training and testing cases. In testing, we achieve an area-under-the-curve of 0.857. This corresponds to 71% or 81% sensitivities at 5 or 10 false-positives per patient, respectively. Analysis of our set of trauma patients demonstrates the feasibility of detecting posterior-element fractures in spine CT images using computer vision techniques such as deep convolutional networks.
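The 2.5D representation described above (three orthogonal patches through one candidate voxel) can be sketched in a few lines. The function name `patches_2_5d`, the `[z][y][x]` nested-list layout, and the patch radius are assumptions for illustration; the abstract does not specify patch sizes or any resampling.

```python
def patches_2_5d(volume, z, y, x, r):
    """Return (axial, coronal, sagittal) square patches of radius r centered at (z, y, x).

    `volume` is a nested list indexed as [z][y][x]; each patch is (2r+1) x (2r+1).
    """
    shape = (len(volume), len(volume[0]), len(volume[0][0]))
    for center, size in zip((z, y, x), shape):
        if center - r < 0 or center + r >= size:
            raise ValueError("patch would extend outside the volume")
    planes = volume[z - r:z + r + 1]
    axial = [row[x - r:x + r + 1] for row in volume[z][y - r:y + r + 1]]          # fixed z
    coronal = [plane[y][x - r:x + r + 1] for plane in planes]                    # fixed y
    sagittal = [[row[x] for row in plane[y - r:y + r + 1]] for plane in planes]  # fixed x
    return axial, coronal, sagittal


# Toy 3 x 3 x 3 volume whose voxel value encodes its own position as 100*z + 10*y + x.
volume = [[[100 * z + 10 * y + x for x in range(3)] for y in range(3)] for z in range(3)]
axial, coronal, sagittal = patches_2_5d(volume, 1, 1, 1, r=1)
print(axial, coronal, sagittal)
```

All three patches share the same center voxel, so a 2D network that receives them as three input channels sees some 3D context at a fraction of the cost of a full 3D patch.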
The precise and accurate segmentation of the vertebral column is essential in the diagnosis and treatment of various orthopedic, neurological, and oncological traumas and pathologies. Segmentation is especially challenging in the presence of pathology such as vertebral compression fractures. In this paper, we propose a method to produce segmentations for osteoporotic compression fractured vertebrae by applying a multi-atlas joint label fusion technique for clinical CT images. A total of 170 thoracic and lumbar vertebrae were evaluated using atlases from five patients with varying degrees of spinal degeneration. In an osteoporotic cohort of bundled atlases, registration provided an average Dice coefficient and mean absolute surface distance of 2.7$\pm$4.5% and 0.32$\pm$0.13mm for osteoporotic vertebrae, respectively, and 90.9$\pm$3.0% and 0.36$\pm$0.11mm for compression fractured vertebrae.