The data-driven nature of deep learning models for semantic segmentation requires a large number of pixel-level annotations. However, large-scale and fully labeled medical datasets are often unavailable for practical tasks. Recently, partially supervised methods have been proposed to utilize images with incomplete labels to mitigate the data scarcity problem in the medical domain. As an emerging research area, the breakthroughs made by existing methods rely on either large-scale data or complex model design, which makes them 1) less practical for certain real-life tasks and 2) less robust for small-scale data. It is time to step back and think about the robustness of partially supervised methods and how to maximally utilize small-scale and partially labeled data for medical image segmentation tasks. To bridge the methodological gaps in label-efficient deep learning with partial supervision, we propose RAMP, a simple yet efficient data augmentation framework for partially supervised medical image segmentation by exploiting the assumption that patients share anatomical similarities. We systematically evaluate RAMP and the previous methods in various controlled multi-structure segmentation tasks. Compared to the mainstream approaches, RAMP consistently improves the performance of traditional segmentation networks on small-scale partially labeled data and utilize additional image-wise weak annotations.
Classic algebraic reconstruction technology (ART) for computed tomography requires pre-determined weights of the voxels for projecting pixel values. However, such weight cannot be accurately obtained due to the limitation of the physical understanding and computation resources. In this study, we propose a semi-case-wise learning-based method named Weight Encode Reconstruction Network (WERNet) to tackle the issues mentioned above. The model is trained in a self-supervised manner without the label of a voxel set. It contains two branches, including the voxel weight encoder and the voxel attention part. Using gradient normalization, we are able to co-train the encoder and voxel set numerically stably. With WERNet, the reconstructed result was obtained with a cosine similarity greater than 0.999 with the ground truth. Moreover, the model shows the extraordinary capability of denoising comparing to the classic ART method. In the generalization test of the model, the encoder is transferable from a voxel set with complex structure to the unseen cases without the deduction of the accuracy.
In this technical report, we analyze Legendre decomposition for non-negative tensor in theory and application. In theory, the properties of dual parameters and dually flat manifold in Legendre decomposition are reviewed, and the process of tensor projection and parameter updating is analyzed. In application, a series of verification experiments and clustering experiments with parameters in submanifolds are carried out, hoping to find an effective lower dimensional representation of the input tensor. The experimental results show that the parameters in submanifolds have no ability to be directly represented as low-rank representations. Combined with analysis, we connect Legendre decomposition with neural networks and low-rank representation, and put forward some promising prospects.
Motivation: Cryo-Electron Tomography (cryo-ET) visualizes structure and spatial organization of macromolecules and their interactions with other subcellular components inside single cells in the close-to-native state at sub-molecular resolution. Such information is critical for the accurate understanding of cellular processes. However, subtomogram classification remains one of the major challenges for the systematic recognition and recovery of the macromolecule structures in cryo-ET because of imaging limits and data quantity. Recently, deep learning has significantly improved the throughput and accuracy of large-scale subtomogram classification. However often it is difficult to get enough high-quality annotated subtomogram data for supervised training due to the enormous expense of labeling. To tackle this problem, it is beneficial to utilize another already annotated dataset to assist the training process. However, due to the discrepancy of image intensity distribution between source domain and target domain, the model trained on subtomograms in source domainmay perform poorly in predicting subtomogram classes in the target domain. Results: In this paper, we adapt a few shot domain adaptation method for deep learning based cross-domain subtomogram classification. The essential idea of our method consists of two parts: 1) take full advantage of the distribution of plentiful unlabeled target domain data, and 2) exploit the correlation between the whole source domain dataset and few labeled target domain data. Experiments conducted on simulated and real datasets show that our method achieves significant improvement on cross domain subtomogram classification compared with baseline methods.
Diabetic Retinopathy (DR) is a leading cause of blindness in working age adults. DR lesions can be challenging to identify in fundus images, and automatic DR detection systems can offer strong clinical value. Of the publicly available labeled datasets for DR, the Indian Diabetic Retinopathy Image Dataset (IDRiD) presents retinal fundus images with pixel-level annotations of four distinct lesions: microaneurysms, hemorrhages, soft exudates and hard exudates. We utilize the HEDNet edge detector to solve a semantic segmentation task on this dataset, and then propose an end-to-end system for pixel-level segmentation of DR lesions by incorporating HEDNet into a Conditional Generative Adversarial Network (cGAN). We design a loss function that adds adversarial loss to segmentation loss. Our experiments show that the addition of the adversarial loss improves the lesion segmentation performance over the baseline.
Although deep neural networks (DNNs) have achieved fantastic success in various scenarios, it's difficult to employ DNNs on many systems with limited resources due to their high energy consumption. It's well known that spiking neural networks (SNNs) are attracting more attention due to the capability of energy-efficient computing. Recently many works focus on converting DNNs into SNNs with little accuracy degradation in image classification on MNIST, CIFAR-10/100. However, few studies on shortening latency, and spike-based modules of more challenging tasks on complex datasets. In this paper, we focus on the similarity matching method of deep spike features and present a first spike-based Siamese network for object tracking called SiamSNN. Specifically, we propose a hybrid spiking similarity matching method with membrane potential and time step to evaluate the response map between exemplar and candidate images, with the same function as correlation layer in SiamFC. Then we present a coding scheme for utilizing temporal information of spike trains, and implement it in output spiking layers to improve the performance and shorten the latency. Our experiments show that SiamSNN achieves short latency and low precision loss of the original SiamFC on the tracking datasets OTB-2013, OTB-2015 and VOT2016. Moreover, SiamSNN achieves real-time (50 FPS) and extremely low energy consumption on TrueNorth.
3D shape reconstruction from a single-view RGB image is an ill-posed problem due to the invisible parts of the object to be reconstructed. Most of the existing methods rely on large-scale data to obtain shape priors through tuning parameters of reconstruction models. These methods might not be able to deal with the cases with heavy object occlusions and noisy background since prior information can not be retained completely or applied efficiently. In this paper, we are the first to develop a memory-based meta-learning framework for single-view 3D reconstruction. A write controller is designed to extract shape-discriminative features from images and store image features and their corresponding volumes into external memory. A read controller is proposed to sequentially encode shape priors related to the input image and predict a shape-specific refiner. Experimental results demonstrate that our Meta3D outperforms state-of-the-art methods with a large margin through retaining shape priors explicitly, especially for the extremely difficult cases.
Cryo-electron tomography (cryo-ET) is an emerging technology for the 3D visualization of structural organizations and interactions of subcellular components at near-native state and sub-molecular resolution. Tomograms captured by cryo-ET contain heterogeneous structures representing the complex and dynamic subcellular environment. Since the structures are not purified or fluorescently labeled, the spatial organization and interaction between both the known and unknown structures can be studied in their native environment. The rapid advances of cryo-electron tomography (cryo-ET) have generated abundant 3D cellular imaging data. However, the systematic localization, identification, segmentation, and structural recovery of the subcellular components require efficient and accurate large-scale image analysis methods. We introduce AITom, an open-source artificial intelligence platform for cryo-ET researchers. AITom provides many public as well as in-house algorithms for performing cryo-ET data analysis through both the traditional template-based or template-free approach and the deep learning approach. Comprehensive tutorials for each analysis module are provided to guide the user through. We welcome researchers and developers to join this collaborative open-source software development project. Availability: https://github.com/xulabs/aitom
We developed a deep learning model-based system to automatically generate a quantitative Computed Tomography (CT) diagnostic report for Pulmonary Tuberculosis (PTB) cases.501 CT imaging datasets from 223 patients with active PTB were collected, and another 501 cases from a healthy population served as negative samples.2884 lesions of PTB were carefully labeled and classified manually by professional radiologists.Three state-of-the-art 3D convolution neural network (CNN) models were trained and evaluated in the inspection of PTB CT images. Transfer learning method was also utilized during this process. The best model was selected to annotate the spatial location of lesions and classify them into miliary, infiltrative, caseous, tuberculoma and cavitary types simultaneously.Then the Noisy-Or Bayesian function was used to generate an overall infection probability.Finally, a quantitative diagnostic report was exported.The results showed that the recall and precision rates, from the perspective of a single lesion region of PTB, were 85.9% and 89.2% respectively. The overall recall and precision rates,from the perspective of one PTB case, were 98.7% and 93.7%, respectively. Moreover, the precision rate of the PTB lesion type classification was 90.9%.The new method might serve as an effective reference for decision making by clinical doctors.