Breast cancer screening is one of the most common radiological tasks with over 39 million exams performed each year. While breast cancer screening has been one of the most studied medical imaging applications of artificial intelligence, the development and evaluation of the algorithms are hindered due to the lack of well-annotated large-scale publicly available datasets. This is particularly an issue for digital breast tomosynthesis (DBT) which is a relatively new breast cancer screening modality. We have curated and made publicly available a large-scale dataset of digital breast tomosynthesis images. It contains 22,032 reconstructed DBT volumes belonging to 5,610 studies from 5,060 patients. This included four groups: (1) 5,129 normal studies, (2) 280 studies where additional imaging was needed but no biopsy was performed, (3) 112 benign biopsied studies, and (4) 89 studies with cancer. Our dataset included masses and architectural distortions which were annotated by two experienced radiologists. Additionally, we developed a single-phase deep learning detection model and tested it using our dataset to serve as a baseline for future research. Our model reached a sensitivity of 65% at 2 false positives per breast. Our large, diverse, and highly-curated dataset will facilitate development and evaluation of AI algorithms for breast cancer screening through providing data for training as well as common set of cases for model validation. The performance of the model developed in our study shows that the task remains challenging and will serve as a baseline for future model development.
Mammography is the primary imaging modality used for early detection and diagnosis of breast cancer. Mammography analysis mainly refers to the extraction of regions of interest around tumors, followed by a segmentation step, which is essential to further classification of benign or malignant tumors. Breast masses are the most important findings among breast abnormalities. However, manual delineation of masses from native mammogram is a time consuming and error-prone task. An integrated computer-aided diagnosis system to assist radiologists in automatically detecting and segmenting breast masses is therefore in urgent need. We propose a fully-automated approach that guides accurate mass segmentation from full mammograms at high resolution through a detection stage. First, mass detection is performed by an efficient deep learning approach, You-Only-Look-Once, extended by integrating multi-scale predictions to improve automatic candidate selection. Second, a convolutional encoder-decoder network using nested and dense skip connections is employed to fine-delineate candidate masses. Unlike most previous studies based on segmentation from regions, our framework handles mass segmentation from native full mammograms without user intervention. Trained on INbreast and DDSM-CBIS public datasets, the pipeline achieves an overall average Dice of 80.44% on high-resolution INbreast test images, outperforming state-of-the-art methods. Our system shows promising accuracy as an automatic full-image mass segmentation system. The comprehensive evaluation provided for both detection and segmentation stages reveals strong robustness to the diversity of size, shape and appearance of breast masses, towards better computer-aided diagnosis.
Detection of pulmonary nodules in chest CT imaging plays a crucial role in early diagnosis of lung cancer. Manual examination is highly time-consuming and error prone, calling for computer-aided detection, both to improve efficiency and reduce misdiagnosis. Over the years, a range of systems have been proposed, mostly following a two-phase paradigm with: 1) candidate detection, 2) false positive reduction. Recently, deep learning has become a dominant force in algorithm development. As for candidate detection, prior art was mainly based on the two-stage Faster R-CNN framework, which starts with an initial sub-net to generate a set of class-agnostic region proposals, followed by a second sub-net to perform classification and bounding-box regression. In contrast, we abandon the conventional two-phase paradigm and two-stage framework altogether and propose to train a single network for end-to-end nodule detection instead, without transfer learning or further post-processing. Our feature learning model is a modification of the ResNet and feature pyramid network combined, powered by RReLU activation. The major challenge is the condition of extreme inter-class and intra-class sample imbalance, where the positives are overwhelmed by a large negative pool, which is mostly composed of easy and a handful of hard negatives. Direct training on all samples can seriously undermine training efficacy. We propose a patch-based sampling strategy over a set of regularly updating anchors, which narrows sampling scope to all positives and only hard negatives, effectively addressing this issue. As a result, our approach substantially outperforms prior art in terms of both accuracy and speed. Finally, the prevailing FROC evaluation over [1/8, 1/4, 1/2, 1, 2, 4, 8] false positives per scan, is far from ideal in real clinical environments. We suggest FROC over [1, 2, 4] false positives as a better metric.
Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans. Although many deep learning-based algorithms make great progress for improving the accuracy of nodule detection, the high false positive rate is still a challenging problem which limits the automatic diagnosis in routine clinical practice. Moreover, the CT scans collected from multiple manufacturers may affect the robustness of Computer-aided diagnosis (CAD) due to the differences in intensity scales and machine noises. In this paper, we propose a novel self-supervised learning assisted pulmonary nodule detection framework based on a 3D Feature Pyramid Network (3DFPN) to improve the sensitivity of nodule detection by employing multi-scale features to increase the resolution of nodules, as well as a parallel top-down path to transit the high-level semantic features to complement low-level general features. Furthermore, a High Sensitivity and Specificity (HS2) network is introduced to eliminate the false positive nodule candidates by tracking the appearance changes in continuous CT slices of each nodule candidate on Location History Images (LHI). In addition, in order to improve the performance consistency of the proposed framework across data captured by different CT scanners without using additional annotations, an effective self-supervised learning schema is applied to learn spatiotemporal features of CT scans from large-scale unlabeled data. The performance and robustness of our method are evaluated on several publicly available datasets with significant performance improvements. The proposed framework is able to accurately detect pulmonary nodules with high sensitivity and specificity and achieves 90.6% sensitivity with 1/8 false positive per scan which outperforms the state-of-the-art results 15.8% on LUNA16 dataset.
The large number of trainable parameters of deep neural networks renders them inherently data hungry. This characteristic heavily challenges the medical imaging community and to make things even worse, many imaging modalities are ambiguous in nature leading to rater-dependant annotations that current loss formulations fail to capture. We propose employing adversarial training for segmentation networks in order to alleviate aforementioned problems. We learn to segment aggressive prostate cancer utilizing challenging MRI images of 152 patients and show that the proposed scheme is superior over the de facto standard in terms of the detection sensitivity and the dice-score for aggressive prostate cancer. The achieved relative gains are shown to be particularly pronounced in the small dataset limit.
Response of breast cancer to neoadjuvant chemotherapy (NAC) can be monitored using the change in visible tumor on magnetic resonance imaging (MRI). In our current workflow, seed points are manually placed in areas of enhancement likely to contain cancer. A constrained volume growing method uses these manually placed seed points as input and generates a tumor segmentation. This method is rigorously validated using complete pathological embedding. In this study, we propose to exploit deep learning for fast and automatic seed point detection, replacing manual seed point placement in our existing and well-validated workflow. The seed point generator was developed in early breast cancer patients with pathology-proven segmentations (N=100), operated shortly after MRI. It consisted of an ensemble of three independently trained fully convolutional dilated neural networks that classified breast voxels as tumor or non-tumor. Subsequently, local maxima were used as seed points for volume growing in patients receiving NAC (N=10). The percentage of tumor volume change was evaluated against semi-automatic segmentations. The primary cancer was localized in 95% of the tumors at the cost of 0.9 false positive per patient. False positives included focally enhancing regions of unknown origin and parts of the intramammary blood vessels. Volume growing from the seed points showed a median tumor volume decrease of 70% (interquartile range: 50%-77%), comparable to the semi-automatic segmentations (median: 70%, interquartile range 23%-76%). To conclude, a fast and automatic seed point generator was developed, fully automating a well-validated semi-automatic workflow for response monitoring of breast cancer to neoadjuvant chemotherapy.
Automatic and accurate lung nodule detection from 3D Computed Tomography scans plays a vital role in efficient lung cancer screening. Despite the state-of-the-art performance obtained by recent anchor-based detectors using Convolutional Neural Networks, they require predetermined anchor parameters such as the size, number, and aspect ratio of anchors, and have limited robustness when dealing with lung nodules with a massive variety of sizes. We propose a 3D sphere representation-based center-points matching detection network (SCPM-Net) that is anchor-free and automatically predicts the position, radius, and offset of nodules without the manual design of nodule/anchor parameters. The SCPM-Net consists of two novel pillars: sphere representation and center points matching. To mimic the nodule annotation in clinical practice, we replace the conventional bounding box with the newly proposed bounding sphere. A compatible sphere-based intersection over-union loss function is introduced to train the lung nodule detection network stably and efficiently.We empower the network anchor-free by designing a positive center-points selection and matching (CPM) process, which naturally discards pre-determined anchor boxes. An online hard example mining and re-focal loss subsequently enable the CPM process more robust, resulting in more accurate point assignment and the mitigation of class imbalance. In addition, to better capture spatial information and 3D context for the detection, we propose to fuse multi-level spatial coordinate maps with the feature extractor and combine them with 3D squeeze-and-excitation attention modules. Experimental results on the LUNA16 dataset showed that our proposed SCPM-Net framework achieves superior performance compared with existing used anchor-based and anchor-free methods for lung nodule detection.
We present a computer-aided detection algorithm for polyps in optical colonoscopy images. Polyps are the precursors to colon cancer. In the US alone, more than 14 million optical colonoscopies are performed every year, mostly to screen for polyps. Optical colonoscopy has been shown to have an approximately 25% polyp miss rate due to the convoluted folds and bends present in the colon. In this work, we present an automatic detection algorithm to detect these polyps in the optical colonoscopy images. We use a machine learning algorithm to infer a depth map for a given optical colonoscopy image and then use a detailed pre-built polyp profile to detect and delineate the boundaries of polyps in this given image. We have achieved the best recall of 84.0% and the best specificity value of 83.4%.
Histopathology images are crucial to the study of complex diseases such as cancer. The histologic characteristics of nuclei play a key role in disease diagnosis, prognosis and analysis. In this work, we propose a sparse Convolutional Autoencoder (CAE) for fully unsupervised, simultaneous nucleus detection and feature extraction in histopathology tissue images. Our CAE detects and encodes nuclei in image patches in tissue images into sparse feature maps that encode both the location and appearance of nuclei. Our CAE is the first unsupervised detection network for computer vision applications. The pretrained nucleus detection and feature extraction modules in our CAE can be fine-tuned for supervised learning in an end-to-end fashion. We evaluate our method on four datasets and reduce the errors of state-of-the-art methods up to 42%. We are able to achieve comparable performance with only 5% of the fully-supervised annotation cost.
Breast cancer is in the most common malignant tumor in women. It accounted for 30% of new malignant tumor cases. Although the incidence of breast cancer remains high around the world, the mortality rate has been continuously reduced. This is mainly due to recent developments in molecular biology technology and improved level of comprehensive diagnosis and standard treatment. Early detection by mammography is an integral part of that. The most common breast abnormalities that may indicate breast cancer are masses and calcifications. Previous detection approaches usually obtain relatively high sensitivity but unsatisfactory specificity. We will investigate an approach that applies the discrete wavelet transform and Fourier transform to parse the images and extracts statistical features that characterize an image's content, such as the mean intensity and the skewness of the intensity. A naive Bayesian classifier uses these features to classify the images. We expect to achieve an optimal high specificity.