Purpose. Early squamous cell neoplasia (ESCN) in the oesophagus is a highly treatable condition. Lesions confined to the mucosal layer can be curatively treated endoscopically. We build a computer-assisted detection (CADe) system that can classify still images or video frames as normal or abnormal with high diagnostic accuracy. Methods. We present a new benchmark dataset containing 68K binary labeled frames extracted from 114 patient videos whose imaged areas have been resected and correlated to histopathology. Our novel convolutional network (CNN) architecture solves the binary classification task and explains what features of the input domain drive the decision-making process of the network. Results. The proposed method achieved an average accuracy of 91.7 % compared to the 94.7 % achieved by a group of 12 senior clinicians. Our novel network architecture produces deeply supervised activation heatmaps that suggest the network is looking at intrapapillary capillary loop (IPCL) patterns when predicting abnormality. Conclusion. We believe that this dataset and baseline method may serve as a reference for future benchmarks on both video frame classification and explainability in the context of ESCN detection. A future work path of high clinical relevance is the extension of the classification to ESCN types.
Producing manual, pixel-accurate, image segmentation labels is tedious and time-consuming. This is often a rate-limiting factor when large amounts of labeled images are required, such as for training deep convolutional networks for instrument-background segmentation in surgical scenes. No large datasets comparable to industry standards in the computer vision community are available for this task. To circumvent this problem, we propose to automate the creation of a realistic training dataset by exploiting techniques stemming from special effects and harnessing them to target training performance rather than visual appeal. Foreground data is captured by placing sample surgical instruments over a chroma key (a.k.a. green screen) in a controlled environment, thereby making extraction of the relevant image segment straightforward. Multiple lighting conditions and viewpoints can be captured and introduced in the simulation by moving the instruments and camera and modulating the light source. Background data is captured by collecting videos that do not contain instruments. In the absence of pre-existing instrument-free background videos, minimal labeling effort is required, just to select frames that do not contain surgical instruments from videos of surgical interventions freely available online. We compare different methods to blend instruments over tissue and propose a novel data augmentation approach that takes advantage of the plurality of options. We show that by training a vanilla U-Net on semi-synthetic data only and applying a simple post-processing, we are able to match the results of the same network trained on a publicly available manually labeled real dataset.
Imaging devices exploit the Nyquist-Shannon sampling theorem to avoid both aliasing and redundant oversampling by design. Conversely, in medical image resampling, images are considered as continuous functions, are warped by a spatial transformation, and are then sampled on a regular grid. In most cases, the spatial warping changes the frequency characteristics of the continuous function and no special care is taken to ensure that the resampling grid respects the conditions of the sampling theorem. This paper shows that this oversight introduces artefacts, including aliasing, that can lead to important bias in clinical applications. One notable exception to this common practice is when multi-resolution pyramids are constructed, with low-pass "anti-aliasing" filters being applied prior to downsampling. In this work, we illustrate why similar caution is needed when resampling images under general spatial transformations and propose a novel method that is more respectful of the sampling theorem, minimising aliasing and loss of information. We introduce the notion of scale factor point spread function (sfPSF) and employ Gaussian kernels to achieve a computationally tractable resampling scheme that can cope with arbitrary non-linear spatial transformations and grid sizes. Experiments demonstrate significant (p<1e-4) technical and clinical implications of the proposed method.
Video mosaicking requires the registration of overlapping frames located at distant timepoints in the sequence to ensure global consistency of the reconstructed scene. However, fully automated registration of such long-range pairs is (i) challenging when the registration of images itself is difficult; and (ii) computationally expensive for long sequences due to the large number of candidate pairs for registration. In this paper, we introduce an efficient framework for the active annotation of long-range pairwise correspondences in a sequence. Our framework suggests pairs of images that are sought to be informative to an oracle agent (e.g., a human user, or a reliable matching algorithm) who provides visual correspondences on each suggested pair. Informative pairs are retrieved according to an iterative strategy based on a principled annotation reward coupled with two complementary and online adaptable models of frame overlap. In addition to the efficient construction of a mosaic, our framework provides, as a by-product, ground truth landmark correspondences which can be used for evaluation or learning purposes. We evaluate our approach in both automated and interactive scenarios via experiments on synthetic sequences, on a publicly available dataset for aerial imaging and on a clinical dataset for placenta mosaicking during fetal surgery.
Training a deep neural network is an optimization problem with four main ingredients: the design of the deep neural network, the per-sample loss function, the population loss function, and the optimizer. However, methods developed to compete in recent BraTS challenges tend to focus only on the design of deep neural network architectures, while paying less attention to the three other aspects. In this paper, we experimented with adopting the opposite approach. We stuck to a generic and state-of-the-art 3D U-Net architecture and experimented with a non-standard per-sample loss function, the generalized Wasserstein Dice loss, a non-standard population loss function, corresponding to distributionally robust optimization, and a non-standard optimizer, Ranger. Those variations were selected specifically for the problem of multi-class brain tumor segmentation. The generalized Wasserstein Dice loss is a per-sample loss function that allows taking advantage of the hierarchical structure of the tumor regions labeled in BraTS. Distributionally robust optimization is a generalization of empirical risk minimization that accounts for the presence of underrepresented subdomains in the training dataset. Ranger is a generalization of the widely used Adam optimizer that is more stable with small batch size and noisy labels. We found that each of those variations of the optimization of deep neural networks for brain tumor segmentation leads to improvements in terms of Dice scores and Hausdorff distances. With an ensemble of three deep neural networks trained with various optimization procedures, we achieved promising results on the validation dataset of the BraTS 2020 challenge. Our ensemble ranked fourth out of the 693 registered teams for the segmentation task of the BraTS 2020 challenge.
Many atlases used for brain parcellation are hierarchically organised, progressively dividing the brain into smaller sub-regions. However, state-of-the-art parcellation methods tend to ignore this structure and treat labels as if they are `flat'. We introduce a hierarchically-aware brain parcellation method that works by predicting the decisions at each branch in the label tree. We further show how this method can be used to model uncertainty separately for every branch in this label tree. Our method exceeds the performance of flat uncertainty methods, whilst also providing decomposed uncertainty estimates that enable us to obtain self-consistent parcellations and uncertainty maps at any level of the label hierarchy. We demonstrate a simple way these decision-specific uncertainty maps may be used to provided uncertainty-thresholded tissue maps at any level of the label tree.
Brain tissue segmentation from multimodal MRI is a key building block of many neuroimaging analysis pipelines. Established tissue segmentation approaches have, however, not been developed to cope with large anatomical changes resulting from pathology, such as white matter lesions or tumours, and often fail in these cases. In the meantime, with the advent of deep neural networks (DNNs), segmentation of brain lesions has matured significantly. However, few existing approaches allow for the joint segmentation of normal tissue and brain lesions. Developing a DNN for such a joint task is currently hampered by the fact that annotated datasets typically address only one specific task and rely on task-specific imaging protocols including a task-specific set of imaging modalities. In this work, we propose a novel approach to build a joint tissue and lesion segmentation model from aggregated task-specific hetero-modal domain-shifted and partially-annotated datasets. Starting from a variational formulation of the joint problem, we show how the expected risk can be decomposed and optimised empirically. We exploit an upper bound of the risk to deal with heterogeneous imaging modalities across datasets. To deal with potential domain shift, we integrated and tested three conventional techniques based on data augmentation, adversarial learning and pseudo-healthy generation. For each individual task, our joint approach reaches comparable performance to task-specific and fully-supervised models. The proposed framework is assessed on two different types of brain lesions: White matter lesions and gliomas. In the latter case, lacking a joint ground-truth for quantitative assessment purposes, we propose and use a novel clinically-relevant qualitative assessment methodology.
Accurate local fiber orientation distribution (FOD) modeling based on diffusion magnetic resonance imaging (dMRI) capable of resolving complex fiber configurations benefit from specific acquisition protocols that impose a high number of gradient directions (b-vecs), a high maximum b-value (b-vals) and multiple b-values (multi-shell). However, acquisition time is limited in a clinical setting and commercial scanners may not provide robust state-of-the-art dMRI sequences. Therefore, dMRI is often acquired as single-shell (SS) (single b-value). Here, we learn improved FODs for commercially acquired dMRI. We evaluate the use of 3D convolutional neural networks (CNNs) to regress multi-shell FOS representations from single-shell representations, using the spherical harmonics basis obtained from constrained spherical deconvolution (CSD) to model FODs. We use U-Net and HighResNet 3D CNN architectures and data from the publicly available Human Connectome Dataset and a dataset acquired at National Hospital For Neurology and Neurosurgery Queen Square. We evaluate how well the CNN models can resolve local fiber orientation 1) when training and testing on datasets with same dMRI acquisition protocol; 2) when testing on dataset with a different dMRI acquisition protocol than used training the CNN models; and 3) when testing on datasets with a fewer number dMRI gradient directions than used training the CNN models. Our approach may enable robust CSD model estimation on dMRI acquisition protocols which are single shell and with a few gradient directions, reducing acquisition times, and thus, facilitating translation to time-limited clinical environments.
During fetoscopic laser photocoagulation, a treatment for twin-to-twin transfusion syndrome (TTTS), the clinician first identifies abnormal placental vascular connections and laser ablates them to regulate blood flow in both fetuses. The procedure is challenging due to the mobility of the environment, poor visibility in amniotic fluid, occasional bleeding, and limitations in the fetoscopic field-of-view and image quality. Ideally, anastomotic placental vessels would be automatically identified, segmented and registered to create expanded vessel maps to guide laser ablation, however, such methods have yet to be clinically adopted. We propose a solution utilising the U-Net architecture for performing placental vessel segmentation in fetoscopic videos. The obtained vessel probability maps provide sufficient cues for mosaicking alignment by registering consecutive vessel maps using the direct intensity-based technique. Experiments on 6 different in vivo fetoscopic videos demonstrate that the vessel intensity-based registration outperformed image intensity-based registration approaches showing better robustness in qualitative and quantitative comparison. We additionally reduce drift accumulation to negligible even for sequences with up to 400 frames and we incorporate a scheme for quantifying drift error in the absence of the ground-truth. Our paper provides a benchmark for fetoscopy placental vessel segmentation and registration by contributing the first in vivo vessel segmentation and fetoscopic videos dataset.
Although deep convolutional networks have reached state-of-the-art performance in many medical image segmentation tasks, they have typically demonstrated poor generalisation capability. To be able to generalise from one domain (e.g. one imaging modality) to another, domain adaptation has to be performed. While supervised methods may lead to good performance, they require to fully annotate additional data which may not be an option in practice. In contrast, unsupervised methods don't need additional annotations but are usually unstable and hard to train. In this work, we propose a novel weakly-supervised method. Instead of requiring detailed but time-consuming annotations, scribbles on the target domain are used to perform domain adaptation. This paper introduces a new formulation of domain adaptation based on structured learning and co-segmentation. Our method is easy to train, thanks to the introduction of a regularised loss. The framework is validated on Vestibular Schwannoma segmentation (T1 to T2 scans). Our proposed method outperforms unsupervised approaches and achieves comparable performance to a fully-supervised approach.