Is critical input information encoded in specific sparse pathways within the neural network? In this work, we discuss the problem of identifying these critical pathways and subsequently leverage them for interpreting the network's response to an input. The pruning objective -- selecting the smallest group of neurons for which the response remains equivalent to the original network -- has been previously proposed for identifying critical pathways. We demonstrate that sparse pathways derived from pruning do not necessarily encode critical input information. To ensure sparse pathways include critical fragments of the encoded input information, we propose pathway selection via neurons' contribution to the response. We proceed to explain how critical pathways can reveal critical input features. We prove that pathways selected via neuron contribution are locally linear (in an L2-ball), a property that we use for proposing a feature attribution method: "pathway gradient". We validate our interpretation method using mainstream evaluation experiments. The validation of pathway gradient interpretation method further confirms that selected pathways using neuron contributions correspond to critical input features. The code is publicly available.
Scene graphs are a compact and explicit representation successfully used in a variety of 2D scene understanding tasks. This work proposes a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames. To this end, we aggregate PointNet features from primitive scene components by means of a graph neural network. We also propose a novel attention mechanism well suited for partial and missing graph data present in such an incremental reconstruction scenario. Although our proposed method is designed to run on submaps of the scene, we show it also transfers to entire 3D scenes. Experiments show that our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
Interpretability in Graph Convolutional Networks (GCNs) has been explored to some extent in computer vision in general, yet, in the medical domain, it requires further examination. Moreover, most of the interpretability approaches for GCNs, especially in the medical domain, focus on interpreting the model in a post hoc fashion. In this paper, we propose an interpretable graph learning-based model which 1) interprets the clinical relevance of the input features towards the task, 2) uses the explanation to improve the model performance and, 3) learns a population level latent graph that may be used to interpret the cohort's behavior. In a clinical scenario, such a model can assist the clinical experts in better decision-making for diagnosis and treatment planning. The main novelty lies in the interpretable attention module (IAM), which directly operates on multi-modal features. Our IAM learns the attention for each feature based on the unique interpretability-specific losses. We show the application on two publicly available datasets, Tadpole and UKBB, for three tasks of disease, age, and gender prediction. Our proposed model shows superior performance with respect to compared methods with an increase in an average accuracy of 3.2% for Tadpole, 1.6% for UKBB Gender, and 2% for the UKBB Age prediction task. Further, we show exhaustive validation and clinical interpretation of our results.
Symptomatic spinal vertebral compression fractures (VCFs) often require osteoplasty treatment. A cement-like material is injected into the bone to stabilize the fracture, restore the vertebral body height and alleviate pain. Leakage is a common complication and may occur due to too much cement being injected. In this work, we propose an automated patient-specific framework that can allow physicians to calculate an upper bound of cement for the injection and estimate the optimal outcome of osteoplasty. The framework uses the patient CT scan and the fractured vertebra label to build a virtual healthy spine using a high-level approach. Firstly, the fractured spine is segmented with a three-step Convolution Neural Network (CNN) architecture. Next, a per-vertebra rigid registration to a healthy spine atlas restores its curvature. Finally, a GAN-based inpainting approach replaces the fractured vertebra with an estimation of its original shape. Based on this outcome, we then estimate the maximum amount of bone cement for injection. We evaluate our framework by comparing the virtual vertebrae volumes of ten patients to their healthy equivalent and report an average error of 3.88$\pm$7.63\%. The presented pipeline offers a first approach to a personalized automatic high-level framework for planning osteoplasty procedures.
Disentangled representations can be useful in many downstream tasks, help to make deep learning models more interpretable, and allow for control over features of synthetically generated images that can be useful in training other models that require a large number of labelled or unlabelled data. Recently, flow-based generative models have been proposed to generate realistic images by directly modeling the data distribution with invertible functions. In this work, we propose a new flow-based generative model framework, named GLOWin, that is end-to-end invertible and able to learn disentangled representations. Feature disentanglement is achieved by factorizing the latent space into components such that each component learns the representation for one generative factor. Comprehensive experiments have been conducted to evaluate the proposed method on a public brain tumor MR dataset. Quantitative and qualitative results suggest that the proposed method is effective in disentangling the features from complex medical images.
Hereditary hemolytic anemias are genetic disorders that affect the shape and density of red blood cells. Genetic tests currently used to diagnose such anemias are expensive and unavailable in the majority of clinical labs. Here, we propose a method for identifying hereditary hemolytic anemias based on a standard biochemistry method, called Percoll gradient, obtained by centrifuging a patient's blood. Our hybrid approach consists on using spatial data-driven features, extracted with a convolutional neural network and spectral handcrafted features obtained from fast Fourier transform. We compare late and early feature fusion with AlexNet and VGG16 architectures. AlexNet with late fusion of spectral features performs better compared to other approaches. We achieved an average F1-score of 88% on different classes suggesting the possibility of diagnosing of hereditary hemolytic anemias from Percoll gradients. Finally, we utilize Grad-CAM to explore the spatial features used for classification.
Surgical instrument segmentation for robot-assisted surgery is needed for accurate instrument tracking and augmented reality overlays. Therefore, the topic has been the subject of a number of recent papers in the CAI community. Deep learning-based methods have shown state-of-the-art performance for surgical instrument segmentation, but their results depend on labelled data. However, labelled surgical data is of limited availability and is a bottleneck in surgical translation of these methods. In this paper, we demonstrate the limited generalizability of these methods on different datasets, including human robot-assisted surgeries. We then propose a novel joint generation and segmentation strategy to learn a segmentation model with better generalization capability to domains that have no labelled data. The method leverages the availability of labelled data in a different domain. The generator does the domain translation from the labelled domain to the unlabelled domain and simultaneously, the segmentation model learns using the generated data while regularizing the generative model. We compared our method with state-of-the-art methods and showed its generalizability on publicly available datasets and on our own recorded video frames from robot-assisted prostatectomies. Our method shows consistently high mean Dice scores on both labelled and unlabelled domains when data is available only for one of the domains. *M. Kalia and T. Aleef contributed equally to the manuscript
Chest computed tomography (CT) has played an essential diagnostic role in assessing patients with COVID-19 by showing disease-specific image features such as ground-glass opacity and consolidation. Image segmentation methods have proven to help quantify the disease burden and even help predict the outcome. The availability of longitudinal CT series may also result in an efficient and effective method to reliably assess the progression of COVID-19, monitor the healing process and the response to different therapeutic strategies. In this paper, we propose a new framework to identify infection at a voxel level (identification of healthy lung, consolidation, and ground-glass opacity) and visualize the progression of COVID-19 using sequential low-dose non-contrast CT scans. In particular, we devise a longitudinal segmentation network that utilizes the reference scan information to improve the performance of disease identification. Experimental results on a clinical longitudinal dataset collected in our institution show the effectiveness of the proposed method compared to the static deep neural networks for disease quantification.
Skin cancer is one of the most deadly cancers worldwide. Yet, it can be reduced by early detection. Recent deep-learning methods have shown a dermatologist-level performance in skin cancer classification. Yet, this success demands a large amount of centralized data, which is oftentimes not available. Federated learning has been recently introduced to train machine learning models in a privacy-preserved distributed fashion demanding annotated data at the clients, which is usually expensive and not available, especially in the medical field. To this end, we propose FedPerl, a semi-supervised federated learning method that utilizes peer learning from social sciences and ensemble averaging from committee machines to build communities and encourage its members to learn from each other such that they produce more accurate pseudo labels. We also propose the peer anonymization (PA) technique as a core component of FedPerl. PA preserves privacy and reduces the communication cost while maintaining the performance without additional complexity. We validated our method on 38,000 skin lesion images collected from 4 publicly available datasets. FedPerl achieves superior performance over the baselines and state-of-the-art SSFL by 15.8%, and 1.8% respectively. Further, FedPerl shows less sensitivity to noisy clients.