Deep learning has achieved widespread success in medical image analysis, leading to an increasing demand for large-scale expert-annotated medical image datasets. Yet, the high cost of annotating medical images severely hampers the development of deep learning in this field. To reduce annotation costs, active learning aims to select the most informative samples for annotation and train high-performance models with as few labeled samples as possible. In this survey, we review the core methods of active learning, including the evaluation of informativeness and sampling strategy. For the first time, we provide a detailed summary of the integration of active learning with other label-efficient techniques, such as semi-supervised, self-supervised learning, and so on. Additionally, we also highlight active learning works that are specifically tailored to medical image analysis. In the end, we offer our perspectives on the future trends and challenges of active learning and its applications in medical image analysis.
The prediction of mild cognitive impairment (MCI) conversion to Alzheimer's disease (AD) is important for early treatment to prevent or slow the progression of AD. To accurately predict the MCI conversion to stable MCI or progressive MCI, we propose Triformer, a novel transformer-based framework with three specialized transformers to incorporate multi-model data. Triformer uses I) an image transformer to extract multi-view image features from medical scans, II) a clinical transformer to embed and correlate multi-modal clinical data, and III) a modality fusion transformer that produces an accurate prediction based on fusing the outputs from the image and clinical transformers. Triformer is evaluated on the Alzheimer's Disease Neuroimaging Initiative (ANDI)1 and ADNI2 datasets and outperforms previous state-of-the-art single and multi-modal methods.
Semantic medical image segmentation using deep learning has recently achieved high accuracy, making it appealing to clinical problems such as radiation therapy. However, the lack of high-quality semantically labelled data remains a challenge leading to model brittleness to small shifts to input data. Most works require extra data for semi-supervised learning and lack the interpretability of the boundaries of the training data distribution during training, which is essential for model deployment in clinical practice. We propose a fully supervised generative framework that can achieve generalisable segmentation with only limited labelled data by simultaneously constructing an explorable manifold during training. The proposed approach creates medical image style paired with a segmentation task driven discriminator incorporating end-to-end adversarial training. The discriminator is generalised to small domain shifts as much as permissible by the training data, and the generator automatically diversifies the training samples using a manifold of input features learnt during segmentation. All the while, the discriminator guides the manifold learning by supervising the semantic content and fine-grained features separately during the image diversification. After training, visualisation of the learnt manifold from the generator is available to interpret the model limits. Experiments on a fully semantic, publicly available pelvis dataset demonstrated that our method is more generalisable to shifts than other state-of-the-art methods while being more explainable using an explorable manifold.
Accurate medical classification requires a large number of multi-modal data, and in many cases, in different formats. Previous studies have shown promising results when using multi-modal data, outperforming single-modality models on when classifying disease such as AD. However, those models are usually not flexible enough to handle missing modalities. Currently, the most common workaround is excluding samples with missing modalities which leads to considerable data under-utilisation. Adding to the fact that labelled medical images are already scarce, the performance of data-driven methods like deep learning is severely hampered. Therefore, a multi-modal method that can gracefully handle missing data in various clinical settings is highly desirable. In this paper, we present the Multi-Modal Mixing Transformer (3MT), a novel Transformer for disease classification based on multi-modal data. In this work, we test it for \ac{AD} or \ac{CN} classification using neuroimaging data, gender, age and MMSE scores. The model uses a novel Cascaded Modality Transformers architecture with cross-attention to incorporate multi-modal information for more informed predictions. Auxiliary outputs and a novel modality dropout mechanism were incorporated to ensure an unprecedented level of modality independence and robustness. The result is a versatile network that enables the mixing of an unlimited number of modalities with different formats and full data utilization. 3MT was first tested on the ADNI dataset and achieved state-of-the-art test accuracy of $0.987\pm0.0006$. To test its generalisability, 3MT was directly applied to the AIBL after training on the ADNI dataset, and achieved a test accuracy of $0.925\pm0.0004$ without fine-tuning. Finally, we show that Grad-CAM visualizations are also possible with our model for explainable results.
In medical image analysis, the subtle visual characteristics of many diseases are challenging to discern, particularly due to the lack of paired data. For example, in mild Alzheimer's Disease (AD), brain tissue atrophy can be difficult to observe from pure imaging data, especially without paired AD and Cognitively Normal ( CN ) data for comparison. This work presents Disease Discovery GAN ( DiDiGAN), a weakly-supervised style-based framework for discovering and visualising subtle disease features. DiDiGAN learns a disease manifold of AD and CN visual characteristics, and the style codes sampled from this manifold are imposed onto an anatomical structural "blueprint" to synthesise paired AD and CN magnetic resonance images (MRIs). To suppress non-disease-related variations between the generated AD and CN pairs, DiDiGAN leverages a structural constraint with cycle consistency and anti-aliasing to enforce anatomical correspondence. When tested on the Alzheimer's Disease Neuroimaging Initiative ( ADNI) dataset, DiDiGAN showed key AD characteristics (reduced hippocampal volume, ventricular enlargement, and atrophy of cortical structures) through synthesising paired AD and CN scans. The qualitative results were backed up by automated brain volume analysis, where systematic pair-wise reductions in brain tissue structures were also measured
Histopathological images contain abundant phenotypic information and pathological patterns, which are the gold standards for disease diagnosis and essential for the prediction of patient prognosis and treatment outcome. In recent years, computer-automated analysis techniques for histopathological images have been urgently required in clinical practice, and deep learning methods represented by convolutional neural networks have gradually become the mainstream in the field of digital pathology. However, obtaining large numbers of fine-grained annotated data in this field is a very expensive and difficult task, which hinders the further development of traditional supervised algorithms based on large numbers of annotated data. More recent studies have started to liberate from the traditional supervised paradigm, and the most representative ones are the studies on weakly supervised learning paradigm based on weak annotation, semi-supervised learning paradigm based on limited annotation, and self-supervised learning paradigm based on pathological image representation learning. These new methods have led a new wave of automatic pathological image diagnosis and analysis targeted at annotation efficiency. With a survey of over 130 papers, we present a comprehensive and systematic review of the latest studies on weakly supervised learning, semi-supervised learning, and self-supervised learning in the field of computational pathology from both technical and methodological perspectives. Finally, we present the key challenges and future trends for these techniques.
The prevalence of suicide has been on the rise since the 20th century, causing severe emotional damage to individuals, families, and communities alike. Despite the severity of this suicide epidemic, there is so far no reliable and systematic way to assess suicide intent of a given individual. Through efforts to automate and systematize diagnosis of mental illnesses over the past few years, verbal and acoustic behaviors have received increasing attention as biomarkers, but little has been done to study eyelids, gaze, and head pose in evaluating suicide risk. This study explores statistical analysis, feature selection, and machine learning classification as means of suicide risk evaluation and nonverbal behavioral interpretation. Applying these methods to the eye and head signals extracted from our unique dataset, this study finds that high-risk suicidal individuals experience psycho-motor retardation and symptoms of anxiety and depression, characterized by eye contact avoidance, slower blinks and a downward eye gaze. By comparing results from different methods of classification, we determined that these features are highly capable of automatically classifying different levels of suicide risk consistently and with high accuracy, above 98%. Our conclusion corroborates psychological studies, and shows great potential of a systematic approach in suicide risk evaluation that is adoptable by both healthcare providers and naive observers.
Direct automatic segmentation of objects from 3D medical imaging, such as magnetic resonance (MR) imaging, is challenging as it often involves accurately identifying a number of individual objects with complex geometries within a large volume under investigation. To address these challenges, most deep learning approaches typically enhance their learning capability by substantially increasing the complexity or the number of trainable parameters within their models. Consequently, these models generally require long inference time on standard workstations operating clinical MR systems and are restricted to high-performance computing hardware due to their large memory requirement. Further, to fit 3D dataset through these large models using limited computer memory, trade-off techniques such as patch-wise training are often used which sacrifice the fine-scale geometric information from input images which could be clinically significant for diagnostic purposes. To address these challenges, we present a compact convolutional neural network with a shallow memory footprint to efficiently reduce the number of model parameters required for state-of-art performance. This is critical for practical employment as most clinical environments only have low-end hardware with limited computing power and memory. The proposed network can maintain data integrity by directly processing large full-size 3D input volumes with no patches required and significantly reduces the computational time required for both training and inference. We also propose a novel loss function with extra shape constraint to improve the accuracy for imbalanced classes in 3D MR images.
Medical image translation (e.g. CT to MR) is a challenging task as it requires I) faithful translation of domain-invariant features (e.g. shape information of anatomical structures) and II) realistic synthesis of target-domain features (e.g. tissue appearance in MR). In this work, we propose Manifold Disentanglement Generative Adversarial Network (MDGAN), a novel image translation framework that explicitly models these two types of features. It employs a fully convolutional generator to model domain-invariant features, and it uses style codes to separately model target-domain features as a manifold. This design aims to explicitly disentangle domain-invariant features and domain-specific features while gaining individual control of both. The image translation process is formulated as a stylisation task, where the input is "stylised" (translated) into diverse target-domain images based on style codes sampled from the learnt manifold. We test MDGAN for multi-modal medical image translation, where we create two domain-specific manifold clusters on the manifold to translate segmentation maps into pseudo-CT and pseudo-MR images, respectively. We show that by traversing a path across the MR manifold cluster, the target output can be manipulated while still retaining the shape information from the input.