Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis and also very essential components in various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice as the surgical process is well-defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel method by developing a multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) to exploit their relatedness to simultaneously boost the performance of both tasks. Specifically, our proposed MTRCNet-CL model has an end-to-end architecture with two branches, which share earlier feature encoders to extract general visual features while holding respective higher layers targeting for specific tasks. Given that temporal information is crucial for phase recognition, long-short term memory (LSTM) is explored to model the sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame, by minimizing the divergence of predictions from the two branches. Mutually leveraging both low-level feature sharing and high-level prediction correlating, our MTRCNet-CL method can encourage the interactions between the two tasks to a large extent, and hence can bring about benefits to each other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate outstanding performance of our proposed method, consistently exceeding the state-of-the-art methods by a large margin (e.g., 89.1% v.s. 81.0% for the mAP in tool presence detection and 87.4% v.s. 84.5% for F1 score in phase recognition). The code can be found on our project website.
The goal of few-shot learning is to recognize new visual concepts with just a few amount of labeled samples in each class. Recent effective metric-based few-shot approaches employ neural networks to learn a feature similarity comparison between query and support examples. However, the importance of feature embedding, i.e., exploring the relationship among training samples, is neglected. In this work, we present a simple yet powerful baseline for few-shot classification by emphasizing the importance of feature embedding. Specifically, we revisit the classical triplet network from deep metric learning, and extend it into a deep K-tuplet network for few-shot learning, utilizing the relationship among the input samples to learn a general representation learning via episode-training. Once trained, our network is able to extract discriminative features for unseen novel categories and can be seamlessly incorporated with a non-linear distance metric function to facilitate the few-shot classification. Our result on the miniImageNet benchmark outperforms other metric-based few-shot classification methods. More importantly, when evaluated on completely different datasets (Caltech-101, CUB-200, Stanford Dogs and Cars) using the model trained with miniImageNet, our method significantly outperforms prior methods, demonstrating its superior capability to generalize to unseen classes.
Brain image registration transforms a pair of images into one system with the matched imaging contents, which is of essential importance for brain image analysis. This paper presents a novel framework for unsupervised 3D brain image registration by capturing the feature-level transformation relationships between the unaligned image and reference image. To achieve this, we develop a feature-level probabilistic model to provide the direct regularization to the hidden layers of two deep convolutional neural networks, which are constructed from two input images. This model design is developed into multiple layers of these two networks to capture the transformation relationships at different levels. We employ two common benchmark datasets for 3D brain image registration and perform various experiments to evaluate our method. Experimental results show that our method clearly outperforms state-of-the-art methods on both benchmark datasets by a large margin.
Automatic prostate segmentation in transrectal ultrasound (TRUS) images is of essential importance for image-guided prostate interventions and treatment planning. However, developing such automatic solutions remains very challenging due to the missing/ambiguous boundary and inhomogeneous intensity distribution of the prostate in TRUS, as well as the large variability in prostate shapes. This paper develops a novel 3D deep neural network equipped with attention modules for better prostate segmentation in TRUS by fully exploiting the complementary information encoded in different layers of the convolutional neural network (CNN). Our attention module utilizes the attention mechanism to selectively leverage the multilevel features integrated from different layers to refine the features at each individual layer, suppressing the non-prostate noise at shallow layers of the CNN and increasing more prostate details into features at deep layers. Experimental results on challenging 3D TRUS volumes show that our method attains satisfactory segmentation performance. The proposed attention mechanism is a general strategy to aggregate multi-level deep features and has the potential to be used for other medical image segmentation tasks. The code is publicly available at https://github.com/wulalago/DAF3D.
Rare diseases have extremely low-data regimes, unlike common diseases with large amount of available labeled data. Hence, to train a neural network to classify rare diseases with a few per-class data samples is very challenging, and so far, catches very little attention. In this paper, we present a difficulty-aware meta-learning method to address rare disease classifications and demonstrate its capability to classify dermoscopy images. Our key approach is to first train and construct a meta-learning model from data of common diseases, then adapt the model to perform rare disease classification.To achieve this, we develop the difficulty-aware meta-learning method that dynamically monitors the importance of learning tasks during the meta-optimization stage. To evaluate our method, we use the recent ISIC 2018 skin lesion classification dataset, and show that with only five samples per class, our model can quickly adapt to classify unseen classes by a high AUC of 83.3%. Also, we evaluated several rare disease classification results in the public Dermofit Image Library to demonstrate the potential of our method for real clinical practice.
Accurate segmentation of the optic disc (OD) and cup (OC)in fundus images from different datasets is critical for glaucoma disease screening. The cross-domain discrepancy (domain shift) hinders the generalization of deep neural networks to work on different domain datasets.In this work, we present an unsupervised domain adaptation framework,called Boundary and Entropy-driven Adversarial Learning (BEAL), to improve the OD and OC segmentation performance, especially on the ambiguous boundary regions. In particular, our proposed BEAL frame-work utilizes the adversarial learning to encourage the boundary prediction and mask probability entropy map (uncertainty map) of the target domain to be similar to the source ones, generating more accurate boundaries and suppressing the high uncertainty predictions of OD and OC segmentation. We evaluate the proposed BEAL framework on two public retinal fundus image datasets (Drishti-GS and RIM-ONE-r3), and the experiment results demonstrate that our method outperforms the state-of-the-art unsupervised domain adaptation methods. Codes will be available at https://github.com/EmmaW8/BEAL.
Accurate and automatic analysis of breast MRI plays an important role in early diagnosis and successful treatment planning for breast cancer. Due to the heterogeneity nature, accurate diagnosis of tumors remains a challenging task. In this paper, we propose to identify breast tumor in MRI by Cosine Margin Sigmoid Loss (CMSL) with deep learning (DL) and localize possible cancer lesion by COrrelation Attention Map (COAM) based on the learned features. The CMSL embeds tumor features onto a hypersphere and imposes a decision margin through cosine constraints. In this way, the DL model could learn more separable inter-class features and more compact intra-class features in the angular space. Furthermore, we utilize the correlations among feature vectors to generate attention maps that could accurately localize cancer candidates with only image-level label. We build the largest breast cancer dataset involving 10,290 DCE-MRI scan volumes for developing and evaluating the proposed methods. The model driven by CMSL achieved classification accuracy of 0.855 and AUC of 0.902 on the testing set, with sensitivity and specificity of 0.857 and 0.852, respectively, outperforming other competitive methods overall. In addition, the proposed COAM accomplished more accurate localization of the cancer center compared with other state-of-the-art weakly supervised localization method.
This paper presents a novel approach to learn and detect distinctive regions on 3D shapes. Unlike previous works, which require labeled data, our method is unsupervised. We conduct the analysis on point sets sampled from 3D shapes and train a deep neural network for an unsupervised shape clustering task to learn local and global features for distinguishing shapes relative to a given shape set. To drive the network to learn in an unsupervised manner, we design a clustering-based nonparametric softmax classifier with an iterative re-clustering of shapes, and an adapted contrastive loss for enhancing the feature embedding quality and stabilizing the learning process. By then, we encourage the network to learn the point distinctiveness on the input shapes. We extensively evaluate various aspects of our approach and present its applications for distinctiveness-guided shape retrieval, sampling, and view selection in 3D scenes.
This paper presents a new method for shadow removal using unpaired data, enabling us to avoid tedious annotations and obtain more diverse training samples. However, directly employing adversarial learning and cycle-consistency constraints is insufficient to learn the underlying relationship between the shadow and shadow-free domains, since the mapping between shadow and shadow-free images is not simply one-to-one. To address the problem, we formulate Mask-ShadowGAN, a new deep framework that automatically learns to produce a shadow mask from the input shadow image and then takes the mask to guide the shadow generation via re-formulated cycle-consistency constraints. Particularly, the framework simultaneously learns to produce shadow masks and learns to remove shadows, to maximize the overall performance. Also, we prepared an unpaired dataset for shadow removal and demonstrated the effectiveness of Mask-ShadowGAN on various experiments, even it was trained on unpaired data.
This paper presents a new deep neural network design for salient object detection by maximizing the integration of local and global image context within, around, and beyond the salient objects. Our key idea is to adaptively propagate and aggregate the image context with variable attenuation over the entire feature maps. To achieve this, we design the spatial attenuation context (SAC) module to recurrently translate and aggregate the context features independently with different attenuation factors and then attentively learn the weights to adaptively integrate the aggregated context features. By further embedding the module to process individual layers in a deep network, namely SAC-Net, we can train the network end-to-end and optimize the context features for detecting salient objects. Compared with 22 state-of-the-art methods, experimental results show that our method performs favorably over all the others on six common benchmark data, both quantitatively and visually.