This report addresses the technical aspects of de-identification of medical images of human subjects and biospecimens, such that re-identification risk of ethical, moral, and legal concern is sufficiently reduced to allow unrestricted public sharing for any purpose, regardless of the jurisdiction of the source and distribution sites. All medical images, regardless of the mode of acquisition, are considered, though the primary emphasis is on those with accompanying data elements, especially those encoded in formats in which the data elements are embedded, particularly Digital Imaging and Communications in Medicine (DICOM). These images include image-like objects such as Segmentations, Parametric Maps, and Radiotherapy (RT) Dose objects. The scope also includes related non-image objects, such as RT Structure Sets, Plans and Dose Volume Histograms, Structured Reports, and Presentation States. Only de-identification of publicly released data is considered, and alternative approaches to privacy preservation, such as federated learning for artificial intelligence (AI) model development, are out of scope, as are issues of privacy leakage from AI model sharing. Only technical issues of public sharing are addressed.
Artificial intelligence (AI) in radiology has made great strides in recent years, but many hurdles remain. Overfitting and lack of generalizability represent important ongoing challenges hindering accurate and dependable clinical deployment. If AI algorithms can avoid overfitting and achieve true generalizability, they can go from the research realm to the forefront of clinical work. Recently, small data AI approaches such as deep neuroevolution (DNE) have avoided overfitting small training sets. We seek to address both overfitting and generalizability by applying DNE to a virtually pooled data set consisting of images from various institutions. Our use case is classifying neuroblastoma brain metastases on MRI. Neuroblastoma is well-suited for our goals because it is a rare cancer. Hence, studying this pediatric disease requires a small data approach. As a tertiary care center, the neuroblastoma images in our local Picture Archiving and Communication System (PACS) are largely from outside institutions. These multi-institutional images provide a heterogeneous data set that can simulate real world clinical deployment. As in prior DNE work, we used a small training set, consisting of 30 normal and 30 metastasis-containing post-contrast MRI brain scans, with 37% outside images. The testing set was enriched with 83% outside images. DNE converged to a testing set accuracy of 97%. Hence, the algorithm was able to predict image class with near-perfect accuracy on a testing set that simulates real-world data. Hence, the work described here represents a considerable contribution toward clinically feasible AI.
Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions from the same patient. Though artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically from multiple lesions on the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier allowing lesions from the same patient to be mapped to one another. This patient-level contextual information is frequently used by clinicians to diagnose melanoma and is especially useful in ruling out false positives in patients with many atypical nevi. The dataset represents 2,056 patients from three continents with an average of 16 lesions per patient, consisting of 33,126 dermoscopic images and 584 histopathologically confirmed melanomas compared with benign melanoma mimickers.
This work summarizes the results of the largest skin image analysis challenge in the world, hosted by the International Skin Imaging Collaboration (ISIC), a global partnership that has organized the world's largest public repository of dermoscopic images of skin. The challenge was hosted in 2018 at the Medical Image Computing and Computer Assisted Intervention (MICCAI) conference in Granada, Spain. The dataset included over 12,500 images across 3 tasks. 900 users registered for data download, 115 submitted to the lesion segmentation task, 25 submitted to the lesion attribute detection task, and 159 submitted to the disease classification task. Novel evaluation protocols were established, including a new test for segmentation algorithm performance, and a test for algorithm ability to generalize. Results show that top segmentation algorithms still fail on over 10% of images on average, and algorithms with equal performance on test data can have different abilities to generalize. This is an important consideration for agencies regulating the growing set of machine learning tools in the healthcare domain, and sets a new standard for future public challenges in healthcare.
This article describes the design, implementation, and results of the latest installment of the dermoscopic image analysis benchmark challenge. The goal is to support research and development of algorithms for automated diagnosis of melanoma, the most lethal skin cancer. The challenge was divided into 3 tasks: lesion segmentation, feature detection, and disease classification. Participation involved 593 registrations, 81 pre-submissions, 46 finalized submissions (including a 4-page manuscript), and approximately 50 attendees, making this the largest standardized and comparative study in this field to date. While the official challenge duration and ranking of participants has concluded, the dataset snapshots remain available for further research and development.
Melanoma is the deadliest form of skin cancer. While curable with early detection, only highly trained specialists are capable of accurately recognizing the disease. As expertise is in limited supply, automated systems capable of identifying disease could save lives, reduce unnecessary biopsies, and reduce costs. Toward this goal, we propose a system that combines recent developments in deep learning with established machine learning approaches, creating ensembles of methods that are capable of segmenting skin lesions, as well as analyzing the detected area and surrounding tissue for melanoma detection. The system is evaluated using the largest publicly available benchmark dataset of dermoscopic images, containing 900 training and 379 testing images. New state-of-the-art performance levels are demonstrated, leading to an improvement in the area under receiver operating characteristic curve of 7.5% (0.843 vs. 0.783), in average precision of 4% (0.649 vs. 0.624), and in specificity measured at the clinically relevant 95% sensitivity operating point 2.9 times higher than the previous state-of-the-art (36.8% specificity compared to 12.5%). Compared to the average of 8 expert dermatologists on a subset of 100 test images, the proposed system produces a higher accuracy (76% vs. 70.5%), and specificity (62% vs. 59%) evaluated at an equivalent sensitivity (82%).
In this article, we describe the design and implementation of a publicly accessible dermatology image analysis benchmark challenge. The goal of the challenge is to sup- port research and development of algorithms for automated diagnosis of melanoma, a lethal form of skin cancer, from dermoscopic images. The challenge was divided into sub-challenges for each task involved in image analysis, including lesion segmentation, dermoscopic feature detection within a lesion, and classification of melanoma. Training data included 900 images. A separate test dataset of 379 images was provided to measure resultant performance of systems developed with the training data. Ground truth for both training and test sets was generated by a panel of dermoscopic experts. In total, there were 79 submissions from a group of 38 participants, making this the largest standardized and comparative study for melanoma diagnosis in dermoscopic images to date. While the official challenge duration and ranking of participants has concluded, the datasets remain available for further research and development.