COVID-19 has become a global pandemic and still poses a severe health risk to the public. Accurate and efficient segmentation of pneumonia lesions in CT scans is vital for treatment decision-making. We propose a novel unsupervised approach using a cycle-consistent generative adversarial network (cycle-GAN) that automates and accelerates lesion delineation. The workflow includes lung volume segmentation, synthetic "healthy" lung generation, infected and healthy image subtraction, and binary lesion mask creation. The lung volume was first delineated using a pre-trained U-net and served as the input for the subsequent network. The cycle-GAN was developed to generate synthetic "healthy" lung CT images from infected lung images. The pneumonia lesions were then extracted by subtracting the synthetic "healthy" lung CT images from the "infected" lung CT images. A median filter and K-means clustering were then applied to contour the lesions. The auto-segmentation approach was validated on two public datasets (Coronacases and Radiopedia). The Dice coefficients reached 0.748 and 0.730 for the Coronacases and Radiopedia datasets, respectively. The precision and sensitivity for lesion segmentation were 0.813 and 0.735 for the Coronacases dataset, and 0.773 and 0.726 for the Radiopedia dataset. The performance is comparable to existing supervised segmentation networks and outperforms previous unsupervised ones. The proposed unsupervised segmentation method achieved high accuracy and efficiency in automatic COVID-19 lesion delineation. The segmentation result can serve as a baseline for further manual modification and as a quality assurance tool for lesion diagnosis. Furthermore, because the method is unsupervised, the result is not influenced by physicians' experience, which is otherwise crucial for supervised methods.
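The post-processing steps named above (subtraction, median filtering, K-means clustering) and the Dice metric can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the 3x3 filter size, the choice of K=2, and the clipping of negative differences are all assumptions, and the synthetic "healthy" image is taken as given rather than produced by a cycle-GAN.

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter with edge padding (filter size is an assumption)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of the image and take the per-pixel median.
    windows = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0)

def two_means_mask(values, iters=20):
    """1-D K-means with K=2 (assumed); True where a pixel joins the brighter cluster."""
    lo, hi = float(values.min()), float(values.max())
    mask = np.zeros(values.shape, dtype=bool)
    for _ in range(iters):
        mask = np.abs(values - hi) < np.abs(values - lo)
        if mask.all() or not mask.any():
            break  # one cluster is empty, nothing left to update
        lo, hi = values[~mask].mean(), values[mask].mean()
    return mask

def lesion_mask(infected, synthetic_healthy):
    """Subtract, smooth, then cluster the difference image into lesion / background."""
    # Assumption: lesions are brighter than healthy lung, so negative differences are dropped.
    diff = np.clip(infected - synthetic_healthy, 0, None)
    return two_means_mask(median_filter3(diff))

def dice(pred, truth):
    """Dice coefficient between two boolean masks."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2.0 * inter / total if total else 1.0
```

On a toy image with a square "lesion", the median filter rounds off the corners of the mask, which is why the Dice score against the exact square falls slightly below 1.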
Low-dose CT has been a key diagnostic imaging modality to reduce the potential risk of radiation overdose to patient health. Despite recent advances, CNN-based approaches typically apply filters in a spatially invariant way and adopt similar pixel-level losses, which treat all regions of the CT image equally and can be inefficient when fine-grained structures coexist with non-uniformly distributed noise. To address this issue, we propose a Structure-preserving Kernel Prediction Network (StructKPN) that combines the kernel prediction network with a structure-aware loss function. The loss utilizes pixel gradient statistics and guides the model towards spatially-variant filters that enhance noise removal, prevent over-smoothing and preserve detailed structures for different regions in CT imaging. Extensive experiments demonstrate that our approach achieves superior performance on both synthetic and non-synthetic datasets, and better preserves structures that are highly desired in clinical screening and low-dose protocol optimization.
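To make the idea of spatially-variant filtering concrete, the sketch below applies a different small kernel at every pixel, which is the generic operation a kernel prediction network's output feeds into. It is an illustration under assumptions: the function name, the `(H, W, k, k)` kernel layout and the reflect padding are hypothetical, and the kernels are supplied by hand rather than predicted by a network.

```python
import numpy as np

def apply_predicted_kernels(image, kernels):
    """Filter `image` (H, W) with a per-pixel kernel; `kernels` has shape (H, W, k, k)."""
    h, w = image.shape
    k = kernels.shape[-1]
    r = k // 2
    padded = np.pad(image, r, mode="reflect")  # padding mode is an assumption
    out = np.zeros((h, w), dtype=float)
    # Accumulate one kernel tap at a time: each tap weights a shifted copy of the image.
    for dy in range(k):
        for dx in range(k):
            out += kernels[:, :, dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

With kernels that are 1 at the centre and 0 elsewhere the image passes through unchanged, while box kernels smooth it; a trained predictor would presumably choose something in between, region by region.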
Food image segmentation is a critical and indispensable task for developing health-related applications such as estimating food calories and nutrients. Existing food image segmentation models underperform for two reasons: (1) there is a lack of high-quality food image datasets with fine-grained ingredient labels and pixel-wise location masks -- the existing datasets either carry coarse ingredient labels or are small in size; and (2) the complex appearance of food makes it difficult to localize and recognize ingredients in food images, e.g., the ingredients may overlap one another in the same image, and the same ingredient may look different in different food images. In this work, we build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images. We annotate these images with 154 ingredient classes, and each image has an average of 6 ingredient labels and pixel-wise masks. In addition, we propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge. In experiments, we use three popular semantic segmentation methods (i.e., Dilated Convolution based, Feature Pyramid based, and Vision Transformer based) as baselines, and evaluate them as well as ReLeM on our new datasets. We believe that FoodSeg103 (and its extension FoodSeg154) and the models pre-trained using ReLeM can serve as a benchmark to facilitate future work on fine-grained food image understanding. We make all these datasets and methods public at \url{https://xiongweiwu.github.io/foodseg103.html}.
Existing pre-trained language models (PLMs) have demonstrated the effectiveness of self-supervised learning for a broad range of natural language processing (NLP) tasks. However, most of them are not explicitly aware of domain-specific knowledge, which is essential for downstream tasks in many domains, such as tasks in e-commerce scenarios. In this paper, we propose K-PLUG, a knowledge-injected pre-trained language model based on the encoder-decoder transformer that can be transferred to both natural language understanding and generation tasks. We verify our method in a diverse range of e-commerce scenarios that require domain-specific knowledge. Specifically, we propose five knowledge-aware self-supervised pre-training objectives to formulate the learning of domain-specific knowledge, including e-commerce domain-specific knowledge-bases, aspects of product entities, categories of product entities, and unique selling propositions of product entities. K-PLUG achieves new state-of-the-art results on a suite of domain-specific NLP tasks, including product knowledge base completion, abstractive product summarization, and multi-turn dialogue, and significantly outperforms baselines across the board. This demonstrates that the proposed method effectively learns a diverse set of domain-specific knowledge for both language understanding and generation tasks.
With the rapid prevalence of mobile devices and the dramatic proliferation of mobile applications (apps), app recommendation has become an emerging task that would benefit both app users and stockholders. Effectively organizing and making full use of the rich side information of users and apps is a key challenge in addressing the sparsity issue of traditional approaches. To meet this challenge, we proposed a novel end-to-end Knowledge Graph Convolutional Embedding Propagation Model (KGEP) for app recommendation. Specifically, we first designed a knowledge graph construction method to model the user and app side information, then adopted KG embedding techniques to capture the factual triplet-focused semantics of the side information related to the first-order structure of the KG, and finally proposed a relation-weighted convolutional embedding propagation model to capture the recommendation-focused semantics related to the high-order structure of the KG. Extensive experiments conducted on a real-world dataset validate the effectiveness of the proposed approach compared to state-of-the-art recommendation approaches.
The estimation of causal effects is a primary goal of behavioral, social, economic and biomedical sciences. Under the unconfounded treatment assignment condition, adjustment for confounders requires estimating the nuisance functions relating outcome and/or treatment to confounders. The conventional approaches rely on either a parametric or a nonparametric modeling strategy to approximate the nuisance functions. Parametric methods can introduce serious bias into causal effect estimation due to possible mis-specification, while nonparametric estimation suffers from the "curse of dimensionality". This paper proposes a new unified approach for efficient estimation of treatment effects using feedforward artificial neural networks when the number of covariates is allowed to increase with the sample size. We consider a general optimization framework that includes the average, quantile and asymmetric least squares treatment effects as special cases. Under this unified setup, we develop a generalized optimization estimator for the treatment effect with the nuisance function estimated by neural networks. We further establish the consistency and asymptotic normality of the proposed estimator and show that it attains the semiparametric efficiency bound. The proposed methods are illustrated via simulation studies and a real data application.
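As an illustration of how average, quantile and asymmetric least squares effects can share one optimization form, the following is a standard textbook-style formulation and not necessarily the paper's exact notation. All symbols are assumptions introduced here: $Y(d)$ is the potential outcome under treatment $d$, $L$ is a loss function, $\tau \in (0,1)$ is the quantile or expectile level, $D$ is the treatment indicator, $X$ the covariates, and $\pi(X)$ the propensity score.

```latex
\[
\beta_d = \arg\min_{b}\, \mathbb{E}\bigl[ L\bigl(Y(d) - b\bigr) \bigr], \qquad d \in \{0, 1\},
\]
\[
L(v) =
\begin{cases}
v^2 & \text{(average)} \\
v\,\bigl(\tau - \mathbf{1}\{v \le 0\}\bigr) & \text{(quantile)} \\
v^2\,\bigl|\tau - \mathbf{1}\{v \le 0\}\bigr| & \text{(asymmetric least squares)}
\end{cases}
\]
\[
\mathbb{E}\bigl[L\bigl(Y(1) - b\bigr)\bigr]
  = \mathbb{E}\!\left[\frac{D}{\pi(X)}\, L(Y - b)\right],
\qquad \pi(X) = \Pr(D = 1 \mid X).
\]
```

In this reading, the treatment effect is $\beta_1 - \beta_0$, the last identity holds under unconfoundedness and overlap, and $\pi(X)$ is an example of a nuisance function that a neural network could estimate.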
Although Generative Adversarial Networks have shown remarkable performance in image generation, challenges remain in image realism and convergence speed. The results of some models show imbalanced quality within a generated image, in which some regions are defective compared with others. Unlike general single global optimization methods, we introduce an adaptive global and local bilevel optimization model (GL-GAN). The model generates high-resolution images in a complementary and mutually promoting way, where global optimization optimizes the whole image and local optimization optimizes only the low-quality areas. With a simple network structure, GL-GAN effectively avoids this imbalance through local bilevel optimization, which first locates low-quality areas and then optimizes them. Moreover, using feature map cues from the discriminator output, we propose the adaptive local and global optimization method (Ada-OP) for specific implementation and find that it boosts the convergence speed. Compared with current GAN methods, our model shows impressive performance on the CelebA, CelebA-HQ and LSUN datasets.
Terminal ductal lobular unit (TDLU) involution is the regression of milk-producing structures in the breast. Women with less TDLU involution are more likely to develop breast cancer. A major bottleneck in studying TDLU involution in large cohort studies is the need for labor-intensive manual assessment of TDLUs. We developed a computational pathology solution to automatically capture TDLU involution measures. Whole slide images (WSIs) of benign breast biopsies were obtained from the Nurses' Health Study (NHS). A first set of 92 WSIs was annotated for TDLUs, acini and adipose tissue to train deep convolutional neural network (CNN) models for detection of acini, and segmentation of TDLUs and adipose tissue. These networks were integrated into a single computational method to capture TDLU involution measures including number of TDLUs per tissue area, median TDLU span and median number of acini per TDLU. We validated our method on 40 additional WSIs by comparing with manually acquired measures. Our CNN models detected acini with an F1 score of 0.73$\pm$0.09, and segmented TDLUs and adipose tissue with Dice scores of 0.86$\pm$0.11 and 0.86$\pm$0.04, respectively. The inter-observer ICC scores for manual assessments on 40 WSIs of number of TDLUs per tissue area, median TDLU span, and median acini count per TDLU were 0.71, 95% CI [0.51, 0.83], 0.81, 95% CI [0.67, 0.90], and 0.73, 95% CI [0.54, 0.85], respectively. Intra-observer reliability was evaluated on 10/40 WSIs with ICC scores of >0.8. Inter-observer ICC scores between automated results and the mean of the two observers were: 0.80, 95% CI [0.63, 0.90] for number of TDLUs per tissue area, 0.57, 95% CI [0.19, 0.77] for median TDLU span, and 0.80, 95% CI [0.62, 0.89] for median acini count per TDLU. TDLU involution measures evaluated by manual and automated assessment were inversely associated with age and menopausal status.
With the rapid development of online advertising and recommendation systems, click-through rate (CTR) prediction is expected to play an increasingly important role. Recently, many DNN-based models that follow a similar Embedding&MLP paradigm have been proposed and have achieved good results in the image, voice and NLP fields. Among these methods, the Wide&Deep model announced by Google plays a key role. Most models first map large-scale sparse input features into low-dimensional vectors, which are transformed into fixed-length vectors and then concatenated before being fed into a multilayer perceptron (MLP) to learn non-linear relations among input features. The number of trainable variables normally grows dramatically as the number of feature fields and the embedding dimension grow. It is a big challenge to obtain state-of-the-art results by training the deep neural network and the embeddings together, since this easily falls into local optima or overfitting. In this paper, we propose an Unstructured Semantic Model (USM) to tackle this challenge by designing an orthogonal base convolution and pooling model that adaptively learns the multi-scale base semantic representation between features, supervised by the click label. The output of USM is then used in the Wide&Deep model for CTR prediction. Experiments on two public datasets as well as a real Weibo production dataset with over 1 billion samples demonstrate the effectiveness of our proposed approach, with superior performance compared to state-of-the-art methods.
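The generic Embedding&MLP paradigm described above can be sketched in a few lines. This is a minimal forward pass with random weights, not the USM or Wide&Deep model; the function names, the single hidden layer, the ReLU activation and all sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(field_sizes, emb_dim, hidden):
    """One embedding table per sparse feature field, followed by a one-hidden-layer MLP."""
    tables = [rng.normal(0.0, 0.01, (n, emb_dim)) for n in field_sizes]
    d_in = len(field_sizes) * emb_dim  # width of the concatenated embedding vector
    return {
        "tables": tables,
        "W1": rng.normal(0.0, 0.1, (d_in, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, hidden),
        "b2": 0.0,
    }

def predict_ctr(model, ids):
    """ids: integer array (batch, n_fields); returns click probabilities of shape (batch,)."""
    # Look up one fixed-length vector per field, then concatenate them.
    embs = [table[ids[:, f]] for f, table in enumerate(model["tables"])]
    x = np.concatenate(embs, axis=1)
    h = np.maximum(x @ model["W1"] + model["b1"], 0.0)  # ReLU hidden layer
    logit = h @ model["w2"] + model["b2"]
    return 1.0 / (1.0 + np.exp(-logit))                  # sigmoid

def n_embedding_params(field_sizes, emb_dim):
    """Embedding parameters scale with total vocabulary size times embedding dimension."""
    return sum(field_sizes) * emb_dim
```

The helper `n_embedding_params` makes the growth noted in the abstract visible: every added feature field or extra embedding dimension multiplies directly into the number of trainable variables.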
Multi-view point cloud registration is a hot topic in the communities of multimedia technology and artificial intelligence (AI). In this paper, we propose a framework that reconstructs 3D models using a multi-view point cloud registration algorithm with an adaptive convergence threshold, and subsequently apply it to 3D model retrieval. The iterative closest point (ICP) algorithm is combined with the motion averaging algorithm for the registration of multi-view point clouds. After the registration process, we design applications for 3D model retrieval. The geometric saliency map is computed based on the vertex curvature. A test facial triangle is then generated from the saliency map and compared with the standard facial triangle, which allows face and non-face models to be discriminated. The experiments and comparisons demonstrate the effectiveness of the proposed framework.
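For readers unfamiliar with ICP, the pairwise building block can be sketched as below. This is a plain textbook variant using brute-force nearest neighbours and the SVD-based (Kabsch) rigid fit, not the proposed framework: the function names are hypothetical, the fixed tolerance `tol` stands in for the adaptive convergence threshold, whose form the abstract does not specify, and the motion averaging step across multiple views is omitted.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that R @ src_i + t ~ dst_i."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)            # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def icp(src, dst, max_iters=50, tol=1e-8):
    """Align `src` (N, 3) to `dst` (M, 3); returns the moved points and the final error."""
    cur = src.copy()
    prev_err = np.inf
    for _ in range(max_iters):
        # Brute-force nearest neighbour in dst for every current point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = d2[np.arange(len(cur)), nn].mean()
        if abs(prev_err - err) < tol:         # fixed threshold (assumption)
            break
        prev_err = err
        R, t = best_rigid_transform(cur, dst[nn])
        cur = cur @ R.T + t
    return cur, err
```

The brute-force distance matrix keeps the sketch short but costs O(NM) memory per iteration; a spatial index would be the usual replacement for larger clouds.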