Image registration is an important task in medical imaging which estimates the spatial transformation between different images. Many previous studies have used learning-based methods for multi-stage registration to perform 3D image registration to improve performance. The performance of the multi-stage approach, however, is limited by the size of the receptive field where complex motion does not occur at a single spatial scale. We propose a new registration network combining recursive network architecture and mutual attention mechanism to overcome these limitations. Compared with the previous deep learning methods, our network based on the recursive structure achieves the highest accuracy in lung Computed Tomography (CT) data set (Dice score of 92\% and average surface distance of 3.8mm for lungs) and one of the most accurate results in abdominal CT data set with 9 organs of various sizes (Dice score of 55\% and average surface distance of 7.8mm). We also showed that adding 3 recursive networks is sufficient to achieve the state-of-the-art results without a significant increase in the inference time.
Knowledge graph (KG) plays an increasingly important role in recommender systems. Recently, graph neural networks (GNNs) based model has gradually become the theme of knowledge-aware recommendation (KGR). However, there is a natural deficiency for GNN-based KGR models, that is, the sparse supervised signal problem, which may make their actual performance drop to some extent. Inspired by the recent success of contrastive learning in mining supervised signals from data itself, in this paper, we focus on exploring the contrastive learning in KG-aware recommendation and propose a novel multi-level cross-view contrastive learning mechanism, named MCCLK. Different from traditional contrastive learning methods which generate two graph views by uniform data augmentation schemes such as corruption or dropping, we comprehensively consider three different graph views for KG-aware recommendation, including global-level structural view, local-level collaborative and semantic views. Specifically, we consider the user-item graph as a collaborative view, the item-entity graph as a semantic view, and the user-item-entity graph as a structural view. MCCLK hence performs contrastive learning across three views on both local and global levels, mining comprehensive graph feature and structure information in a self-supervised manner. Besides, in semantic view, a k-Nearest-Neighbor (kNN) item-item semantic graph construction module is proposed, to capture the important item-item semantic relation which is usually ignored by previous work. Extensive experiments conducted on three benchmark datasets show the superior performance of our proposed method over the state-of-the-arts. The implementations are available at: https://github.com/CCIIPLab/MCCLK.
Image registration is important for medical imaging, the estimation of the spatial transformation between different images. Many previous studies have used learning-based methods for coarse-to-fine registration to efficiently perform 3D image registration. The coarse-to-fine approach, however, is limited when dealing with the different motions of nearby objects. Here we propose a novel Motion-Aware (MA) structure that captures the different motions in a region. The MA structure incorporates a novel Residual Aligner (RA) module which predicts the multi-head displacement field used to disentangle the different motions of multiple neighbouring objects. Compared with other deep learning methods, the network based on the MA structure and RA module achieve one of the most accurate unsupervised inter-subject registration on the 9 organs of assorted sizes in abdominal CT scans, with the highest-ranked registration of the veins (Dice Similarity Coefficient / Average surface distance: 62\%/4.9mm for the vena cava and 34\%/7.9mm for the portal and splenic vein), with a half-sized structure and more efficient computation. Applied to the segmentation of lungs in chest CT scans, the new network achieves results which were indistinguishable from the best-ranked networks (94\%/3.0mm). Additionally, the theorem on predicted motion pattern and the design of MA structure are validated by further analysis.
Zero-shot learning (ZSL) aims to recognize unseen classes based on the knowledge of seen classes. Previous methods focused on learning direct embeddings from global features to the semantic space in hope of knowledge transfer from seen classes to unseen classes. However, an unseen class shares local visual features with a set of seen classes and leveraging global visual features makes the knowledge transfer ineffective. To tackle this problem, we propose a Region Semantically Aligned Network (RSAN), which maps local features of unseen classes to their semantic attributes. Instead of using global features which are obtained by an average pooling layer after an image encoder, we directly utilize the output of the image encoder which maintains local information of the image. Concretely, we obtain each attribute from a specific region of the output and exploit these attributes for recognition. As a result, the knowledge of seen classes can be successfully transferred to unseen classes in a region-bases manner. In addition, we regularize the image encoder through attribute regression with a semantic knowledge to extract robust and attribute-related visual features. Experiments on several standard ZSL datasets reveal the benefit of the proposed RSAN method, outperforming state-of-the-art methods.
Session-based recommendation (SBR) is a challenging task, which aims at recommending items based on anonymous behavior sequences. Almost all the existing solutions for SBR model user preference only based on the current session without exploiting the other sessions, which may contain both relevant and irrelevant item-transitions to the current session. This paper proposes a novel approach, called Global Context Enhanced Graph Neural Networks (GCE-GNN) to exploit item transitions over all sessions in a more subtle manner for better inferring the user preference of the current session. Specifically, GCE-GNN learns two levels of item embeddings from session graph and global graph, respectively: (i) Session graph, which is to learn the session-level item embedding by modeling pairwise item-transitions within the current session; and (ii) Global graph, which is to learn the global-level item embedding by modeling pairwise item-transitions over all sessions. In GCE-GNN, we propose a novel global-level item representation learning layer, which employs a session-aware attention mechanism to recursively incorporate the neighbors' embeddings of each node on the global graph. We also design a session-level item representation learning layer, which employs a GNN on the session graph to learn session-level item embeddings within the current session. Moreover, GCE-GNN aggregates the learnt item representations in the two levels with a soft attention mechanism. Experiments on three benchmark datasets demonstrate that GCE-GNN outperforms the state-of-the-art methods consistently.
Automatic medical image segmentation based on Computed Tomography (CT) has been widely applied for computer-aided surgery as a prerequisite. With the development of deep learning technologies, deep convolutional neural networks (DCNNs) have shown robust performance in automated semantic segmentation of medical images. However, semantic segmentation algorithms based on DCNNs still meet the challenges of feature loss between encoder and decoder, multi-scale object, restricted field of view of filters, and lack of medical image data. This paper proposes a novel algorithm for automated vertebrae segmentation via 3D volumetric spine CT images. The proposed model is based on the structure of encoder to decoder, using layer normalization to optimize mini-batch training performance. To address the concern of the information loss between encoder and decoder, we designed an Atrous Residual Path to pass more features from encoder to decoder instead of an easy shortcut connection. The proposed model also applied the attention module in the decoder part to extract features from variant scales. The proposed model is evaluated on a publicly available dataset by a variety of metrics. The experimental results show that our model achieves competitive performance compared with other state-of-the-art medical semantic segmentation methods.
COVID-19, a new strain of coronavirus disease, has been one of the most serious and infectious disease in the world. Chest CT is essential in prognostication, diagnosing this disease, and assessing the complication. In this paper, a multi-class COVID-19 CT segmentation is proposed aiming at helping radiologists estimate the extent of effected lung volume. We utilized four augmented pyramid networks on an encoder-decoder segmentation framework. Quadruple Augmented Pyramid Network (QAP-Net) not only enable CNN capture features from variation size of CT images, but also act as spatial interconnections and down-sampling to transfer sufficient feature information for semantic segmentation. Experimental results achieve competitive performance in segmentation with the Dice of 0.8163, which outperforms other state-of-the-art methods, demonstrating the proposed framework can segments of consolidation as well as glass, ground area via COVID-19 chest CT efficiently and accurately.
Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We proposed a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model. The experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8X shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method to randomly omit sub-layers to introduce perturbation into training, which achieves a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at https://github.com/libeineu/GPKD.
Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but training an extremely deep encoder is time consuming. Moreover, why deep models help NMT is an open question. In this paper, we investigate the behavior of a well-tuned deep Transformer system. We find that stacking layers is helpful in improving the representation ability of NMT models and adjacent layers perform similarly. This inspires us to develop a shallow-to-deep training method that learns deep models by stacking shallow models. In this way, we successfully train a Transformer system with a 54-layer encoder. Experimental results on WMT'16 English-German and WMT'14 English-French translation tasks show that it is $1.4$ $\times$ faster than training from scratch, and achieves a BLEU score of $30.33$ and $43.29$ on two tasks. The code is publicly available at https://github.com/libeineu/SDT-Training/.
Segmentation algorithms of medical image volumes are widely studied for many clinical and research purposes. We propose a novel and efficient framework for medical image segmentation. The framework functions under a deep learning paradigm, incorporating four novel contributions. Firstly, a residual interconnection is explored in different scale encoders. Secondly, four copy and crop connections are replaced to residual-block-based concatenation to alleviate the disparity between encoders and decoders, respectively. Thirdly, convolutional attention modules for feature refinement are studied on all scale decoders. Finally, an adaptive clean noisy label learning strategy(ACNLL) based on the training process from underfitting to overfitting is studied. Experimental results are illustrated on a publicly available benchmark database of spine CTs. Our segmentation framework achieves competitive performance with other state-of-the-art methods over a variety of different evaluation measures.