Alan L. Yuille

Training Multi-organ Segmentation Networks with Sample Selection by Relaxed Upper Confident Bound

Apr 07, 2018
Yan Wang, Yuyin Zhou, Peng Tang, Wei Shen, Elliot K. Fishman, Alan L. Yuille

Deep convolutional neural networks (CNNs), especially fully convolutional networks, have been widely applied to automatic medical image segmentation problems, e.g., multi-organ segmentation. Existing CNN-based segmentation methods mainly focus on looking for increasingly powerful network architectures, but pay less attention to data sampling strategies for training networks more effectively. In this paper, we present a simple but effective sample selection method for training multi-organ segmentation networks. Sample selection exhibits an exploitation-exploration strategy, i.e., exploiting hard samples and exploring less frequently visited samples. Based on the fact that very hard samples might have annotation errors, we propose a new sample selection policy, named Relaxed Upper Confident Bound (RUCB). Compared with other sample selection policies, e.g., Upper Confident Bound (UCB), it exploits a range of hard samples rather than being stuck with a small set of very hard ones, which mitigates the influence of annotation errors during training. We apply this new sample selection policy to training a multi-organ segmentation network on a dataset containing 120 abdominal CT scans and show that it boosts segmentation performance significantly.

* Submitted to MICCAI 2018
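
The exploitation-exploration idea in this abstract can be illustrated with a small sketch. It uses the standard UCB1 score (mean value plus an exploration bonus) and, as an assumption, models the "relaxed" policy as drawing uniformly from the top-k scored samples instead of always taking the single highest score. The abstract does not give the RUCB formula, so the function names, the top-k relaxation, and the parameter `k` are hypothetical and not the authors' definition.

```python
import math
import random


def ucb_scores(mean_hardness, visit_counts, total_visits):
    """Standard UCB1 score: exploit hard samples, explore rarely visited ones."""
    scores = []
    for mean, count in zip(mean_hardness, visit_counts):
        if count == 0:
            scores.append(float("inf"))  # unvisited samples are tried first
        else:
            bonus = math.sqrt(2.0 * math.log(total_visits) / count)
            scores.append(mean + bonus)
    return scores


def select_ucb(scores):
    """Plain UCB: always pick the single highest-scoring sample."""
    return max(range(len(scores)), key=lambda i: scores[i])


def select_relaxed(scores, k, rng=random):
    """Hypothetical relaxation: pick uniformly among the k highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return rng.choice(ranked[:k])
```

With plain UCB a mislabeled, permanently "hard" sample can be selected again and again; spreading the choice over several high-scoring samples is one way to reduce that effect.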

Multi-Scale Spatially-Asymmetric Recalibration for Image Classification

Apr 03, 2018
Yan Wang, Lingxi Xie, Siyuan Qiao, Ya Zhang, Wenjun Zhang, Alan L. Yuille

Convolution is spatially-symmetric, i.e., the visual features are independent of their position in the image, which limits its ability to utilize contextual cues for visual recognition. This paper addresses this issue by introducing a recalibration process, which refers to the surrounding region of each neuron, computes an importance value and multiplies the original neural response by it. Our approach is named multi-scale spatially-asymmetric recalibration (MS-SAR), which extracts visual cues from surrounding regions at multiple scales, and designs a weighting scheme which is asymmetric in the spatial domain. MS-SAR is implemented in an efficient way, so that only small fractions of extra parameters and computations are required. We apply MS-SAR to several popular building blocks, including the residual block and the densely-connected block, and demonstrate its superior performance in both CIFAR and ILSVRC2012 classification tasks.

* 17 pages, 5 figures, submitted to ECCV 2018
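
The recalibration step described in this abstract (look at the surrounding region, compute an importance value, multiply the response by it) can be sketched at a single scale. This is only an illustration: the unweighted neighborhood mean, the sigmoid gate, and the name `recalibrate` are assumptions, and the sketch does not model the multi-scale or spatially-asymmetric weighting that MS-SAR adds.

```python
import math


def recalibrate(feature_map, radius=1):
    """Scale each response by a sigmoid gate computed from its neighborhood."""
    height, width = len(feature_map), len(feature_map[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the surrounding region, clipped at the image border.
            neighbors = [
                feature_map[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            context = sum(neighbors) / len(neighbors)
            importance = 1.0 / (1.0 + math.exp(-context))  # value in (0, 1)
            output[y][x] = importance * feature_map[y][x]
    return output
```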

DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion

Mar 29, 2018
Zhishuai Zhang, Cihang Xie, Jianyu Wang, Lingxi Xie, Alan L. Yuille

In this paper, we study the task of detecting semantic parts of an object, e.g., a wheel of a car, under partial occlusion. We propose that all models should be trained without seeing occlusions while being able to transfer the learned knowledge to deal with occlusions. This setting alleviates the difficulty in collecting an exponentially large dataset to cover occlusion patterns and is more essential. In this scenario, the proposal-based deep networks, like RCNN-series, often produce unsatisfactory results, because both the proposal extraction and classification stages may be confused by the irrelevant occluders. To address this, [25] proposed a voting mechanism that combines multiple local visual cues to detect semantic parts. The semantic parts can still be detected even though some visual cues are missing due to occlusions. However, this method is manually designed and is thus hard to optimize in an end-to-end manner. In this paper, we present DeepVoting, which incorporates the robustness shown by [25] into a deep network, so that the whole pipeline can be jointly optimized. Specifically, it adds two layers after the intermediate features of a deep network, e.g., the pool-4 layer of VGGNet. The first layer extracts the evidence of local visual cues, and the second layer performs a voting mechanism by utilizing the spatial relationship between visual cues and semantic parts. We also propose an improved version, DeepVoting+, by learning visual cues from context outside objects. In experiments, DeepVoting achieves significantly better performance than several baseline methods, including Faster-RCNN, for semantic part detection under occlusion. In addition, DeepVoting enjoys explainability, as the detection results can be diagnosed by looking up the voting cues.

NDDR-CNN: Layer-wise Feature Fusing in Multi-Task CNN by Neural Discriminative Dimensionality Reduction

Mar 13, 2018
Yuan Gao, Qi She, Jiayi Ma, Mingbo Zhao, Wei Liu, Tong Zhang, Alan L. Yuille

State-of-the-art Convolutional Neural Networks (CNNs) benefit greatly from multi-task learning (MTL), which learns multiple related tasks simultaneously to obtain shared or mutually related representations for different tasks. The most widely used MTL CNN structure is based on an empirical or heuristic split on a specific layer (e.g., the last convolutional layer) to minimize multiple task-specific losses. However, this heuristic sharing/splitting strategy may be harmful to the final performance of one or multiple tasks. In this paper, we propose a novel CNN structure for MTL, which enables automatic feature fusing at every layer. Specifically, we first concatenate features from different tasks according to their channel dimension, and then formulate the feature fusing problem as discriminative dimensionality reduction. We show that this discriminative dimensionality reduction can be fulfilled by 1x1 Convolution, Batch Normalization, and Weight Decay in one CNN, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). We perform detailed ablation analysis for different configurations in training the proposed NDDR-CNN network. The experiments carried out on different network structures and different task sets demonstrate the promising performance and desirable generalizability of our proposed method.

* 17 pages, 5 figures, 7 tables
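
The fusing step in this abstract (concatenate the tasks' features along the channel dimension, then reduce them with a 1x1 convolution) can be sketched as follows. At a single pixel a 1x1 convolution is a matrix-vector product, which is what the sketch computes. The function name `nddr_fuse`, the list-based data layout, and the example weights are assumptions for illustration; Batch Normalization and Weight Decay, which the abstract also names, are omitted.

```python
def nddr_fuse(features_a, features_b, weights):
    """Concatenate two tasks' channel vectors per pixel, then apply a 1x1 conv.

    features_a, features_b: lists of per-pixel channel vectors (same length).
    weights: out_channels x (channels_a + channels_b) matrix.
    """
    fused = []
    for pixel_a, pixel_b in zip(features_a, features_b):
        concatenated = pixel_a + pixel_b  # channel-wise concatenation
        fused.append([
            sum(w * v for w, v in zip(row, concatenated)) for row in weights
        ])
    return fused
```

For example, with two channels per task, the weights `[[1, 0, 0, 0], [0, 1, 0, 0]]` pass task A's features through unchanged, while `[[0.5, 0, 0.5, 0]]` averages the first channel of both tasks into a single output channel.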

Generating Multiple Diverse Hypotheses for Human 3D Pose Consistent with 2D Joint Detections

Aug 20, 2017
Ehsan Jahangiri, Alan L. Yuille

We propose a method to generate multiple diverse and valid human pose hypotheses in 3D all consistent with the 2D detection of joints in a monocular RGB image. We use a novel generative model uniform (unbiased) in the space of anatomically plausible 3D poses. Our model is compositional (produces a pose by combining parts) and since it is restricted only by anatomical constraints it can generalize to every plausible human 3D pose. Removing the model bias intrinsically helps to generate more diverse 3D pose hypotheses. We argue that generating multiple pose hypotheses is more reasonable than generating only a single 3D pose based on the 2D joint detection given the depth ambiguity and the uncertainty due to occlusion and imperfect 2D joint detection. We hope that the idea of generating multiple consistent pose hypotheses can give rise to a new line of future work that has not received much attention in the literature. We used the Human3.6M dataset for empirical evaluation.

* accepted to ICCV 2017 (PeopleCap)

NormFace: L2 Hypersphere Embedding for Face Verification

Jul 26, 2017
Feng Wang, Xiang Xiang, Jian Cheng, Alan L. Yuille

Thanks to the recent developments of Convolutional Neural Networks, the performance of face verification methods has increased rapidly. In a typical face verification method, feature normalization is a critical step for boosting performance. This motivates us to introduce and study the effect of normalization during training. But we find this is non-trivial, despite normalization being differentiable. We identify and study four issues related to normalization through mathematical analysis, which yields understanding and helps with parameter settings. Based on this analysis we propose two strategies for training using normalized features. The first is a modification of softmax loss, which optimizes cosine similarity instead of inner-product. The second is a reformulation of metric learning by introducing an agent vector for each class. We show that both strategies, and small variants, consistently improve performance by between 0.2% and 0.4% on the LFW dataset based on two models. This is significant because the performance of the two models on the LFW dataset is close to saturation at over 98%. Codes and models are released on https://github.com/happynear/NormFace

* camera-ready version
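
The first strategy in this abstract, a softmax loss over cosine similarity rather than inner product, can be sketched as below. Both the feature and each class weight vector are L2-normalized before the dot product. The `scale` multiplier and its default of 20.0 are assumptions added so the logits are not confined to [-1, 1]; the abstract does not state how the released code sets such a value, and the function names are hypothetical.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


def cosine_softmax_loss(feature, class_weights, label, scale=20.0):
    """Cross-entropy over scaled cosine similarities instead of inner products."""
    f = l2_normalize(feature)
    logits = [
        scale * sum(a * b for a, b in zip(f, l2_normalize(w)))
        for w in class_weights
    ]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[label]
```

Because of the normalization, rescaling the input feature leaves the loss unchanged, which is the property that distinguishes this loss from an ordinary inner-product softmax.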

Deep Supervision for Pancreatic Cyst Segmentation in Abdominal CT Scans

Jun 22, 2017
Yuyin Zhou, Lingxi Xie, Elliot K. Fishman, Alan L. Yuille

Automatic segmentation of an organ and its cystic region is a prerequisite of computer-aided diagnosis. In this paper, we focus on pancreatic cyst segmentation in abdominal CT scans. This task is important and very useful in clinical practice yet challenging due to the low contrast in boundary, the variability in location and shape, and the different stages of pancreatic cancer. Inspired by the high relevance between the location of a pancreas and its cystic region, we introduce extra deep supervision into the segmentation network, so that cyst segmentation can be improved with the help of relatively easier pancreas segmentation. Under a reasonable transformation function, our approach can be factorized into two stages, and each stage can be efficiently optimized via gradient back-propagation throughout the deep networks. We collect a new dataset with 131 pathological samples, which, to the best of our knowledge, is the largest set for pancreatic cyst segmentation. Without human assistance, our approach reports a 63.44% average accuracy, measured by the Dice-Sørensen coefficient (DSC), which is higher than the number (60.46%) without deep supervision.

* Accepted to MICCAI 2017 (8 pages, 3 figures)
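
The Dice-Sørensen coefficient used to report accuracy here is twice the overlap between the predicted and ground-truth masks divided by the sum of their sizes. A minimal sketch for binary masks given as flat lists follows; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not taken from the paper.

```python
def dice_coefficient(prediction, ground_truth):
    """DSC = 2 * |P and G| / (|P| + |G|) for two binary masks of equal length."""
    intersection = sum(1 for p, g in zip(prediction, ground_truth) if p and g)
    total = sum(1 for p in prediction if p) + sum(1 for g in ground_truth if g)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total
```

For example, a prediction of two voxels that overlaps a one-voxel ground truth in one voxel scores 2 * 1 / (2 + 1), about 0.667.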

A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans

Jun 21, 2017
Yuyin Zhou, Lingxi Xie, Wei Shen, Yan Wang, Elliot K. Fishman, Alan L. Yuille

Deep neural networks have been widely adopted for automatic organ segmentation from abdominal CT scans. However, the segmentation accuracy of some small organs (e.g., the pancreas) is sometimes below satisfaction, arguably because deep networks are easily disrupted by the complex and variable background regions which occupy a large fraction of the input volume. In this paper, we formulate this problem into a fixed-point model which uses a predicted segmentation mask to shrink the input region. This is motivated by the fact that a smaller input region often leads to more accurate segmentation. In the training process, we use the ground-truth annotation to generate accurate input regions and optimize network weights. In the testing stage, we fix the network parameters and update the segmentation results in an iterative manner. We evaluate our approach on the NIH pancreas segmentation dataset, and outperform the state-of-the-art by more than 4%, measured by the average Dice-Sørensen Coefficient (DSC). In addition, we report 62.43% DSC in the worst case, which guarantees the reliability of our approach in clinical applications.

* Accepted to MICCAI 2017 (8 pages, 3 figures)
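
The testing-stage iteration described in this abstract (fixed network, predicted mask shrinks the input region, repeat) can be sketched on a 1D signal. Everything here is a stand-in: `toy_segment` is a hypothetical thresholding function in place of the trained network, and the margin, iteration cap, and stopping rule (mask unchanged) are assumptions rather than the paper's settings.

```python
def bounding_region(mask, margin, length):
    """Smallest index range covering the predicted mask, padded by a margin."""
    indices = [i for i, m in enumerate(mask) if m]
    if not indices:
        return 0, length  # nothing predicted: fall back to the full input
    return max(0, indices[0] - margin), min(length, indices[-1] + margin + 1)


def fixed_point_segmentation(signal, segment, margin=1, max_iters=10):
    """Re-segment inside the region implied by the previous mask until stable."""
    length = len(signal)
    mask = segment(signal, 0, length)  # first pass sees the whole input
    for _ in range(max_iters):
        start, end = bounding_region(mask, margin, length)
        new_mask = segment(signal, start, end)
        if new_mask == mask:  # fixed point reached
            break
        mask = new_mask
    return mask


def toy_segment(signal, start, end):
    """Hypothetical segmenter: mark values above the mean of the given region."""
    region = signal[start:end]
    threshold = sum(region) / len(region)
    return [1 if start <= i < end and v > threshold else 0
            for i, v in enumerate(signal)]
```

Calling `fixed_point_segmentation([0, 0, 0, 0, 5, 9, 0, 0], toy_segment)` stops as soon as segmenting the cropped region reproduces the previous mask.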

Regularizing Face Verification Nets For Pain Intensity Regression

Jun 01, 2017
Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille

Limited labeled data are available for the research of estimating facial expression intensities. For instance, the ability to train deep networks for automated pain assessment is limited by small datasets with labels of patient-reported pain intensities. Fortunately, fine-tuning from a data-extensive pre-trained domain, such as face verification, can alleviate this problem. In this paper, we propose a network that fine-tunes a state-of-the-art face verification network using a regularized regression loss and additional data with expression labels. In this way, the expression intensity regression task can benefit from the rich feature representations trained on a huge amount of data for face verification. The proposed regularized deep regressor is applied to estimate the pain expression intensity and verified on the widely-used UNBC-McMaster Shoulder-Pain dataset, achieving the state-of-the-art performance. A weighted evaluation metric is also proposed to address the imbalance issue of different pain intensities.

* 5 pages, 3 figures; Camera-ready version to appear at IEEE ICIP 2017

Object Recognition with and without Objects

May 25, 2017
Zhuotun Zhu, Lingxi Xie, Alan L. Yuille

While recent deep neural networks have achieved a promising performance on object recognition, they rely implicitly on the visual contents of the whole image. In this paper, we train deep neural networks on the foreground (object) and background (context) regions of images respectively. Considering human recognition in the same situations, networks trained on the pure background without objects achieve highly reasonable recognition performance that beats humans by a large margin if only given context. However, humans still outperform networks with pure object available, which indicates networks and human beings have different mechanisms in understanding an image. Furthermore, we straightforwardly combine multiple trained networks to explore different visual cues learned by different networks. Experiments show that useful visual hints can be explicitly learned separately and then combined to achieve higher performance, which verifies the advantages of the proposed framework.

* To Appear in IJCAI 2017
