Xiaoqian Jiang

MULTIPAR: Supervised Irregular Tensor Factorization with Multi-task Learning

Aug 01, 2022
Yifei Ren, Jian Lou, Li Xiong, Joyce C Ho, Xiaoqian Jiang, Sivasubramanium Bhavan

Tensor factorization has received increasing interest due to its intrinsic ability to capture latent factors in multi-dimensional data, with many applications such as recommender systems and Electronic Health Records (EHR) mining. PARAFAC2 and its variants have been proposed to address irregular tensors where one of the tensor modes is not aligned, e.g., different users in recommender systems or patients in EHRs may have records of different lengths. PARAFAC2 has been successfully applied to EHRs for extracting meaningful medical concepts (phenotypes). Despite recent advancements, current models' predictability and interpretability are not satisfactory, which limits their utility for downstream analysis. In this paper, we propose MULTIPAR: a supervised irregular tensor factorization with multi-task learning. MULTIPAR is flexible enough to incorporate both static (e.g., in-hospital mortality prediction) and continuous or dynamic (e.g., the need for ventilation) tasks. By supervising the tensor factorization with downstream prediction tasks and leveraging information from multiple related predictive tasks, MULTIPAR can yield not only more meaningful phenotypes but also better predictive performance for downstream tasks. We conduct extensive experiments on two real-world temporal EHR datasets to demonstrate that MULTIPAR is scalable and achieves better tensor fit with more meaningful subgroups and stronger predictive performance compared to existing state-of-the-art methods.

Scalable Causal Structure Learning: New Opportunities in Biomedicine

Oct 15, 2021
Pulakesh Upadhyaya, Kai Zhang, Can Li, Xiaoqian Jiang, Yejin Kim

This paper gives a practical tutorial on popular causal structure learning models, with examples on real-world data, to help healthcare audiences understand and apply them. We review prominent traditional, score-based, and machine-learning-based schemes for causal structure discovery, evaluate some of them on benchmark datasets, and discuss some of their applications to biomedicine. Given sufficient data, machine-learning-based approaches can be scalable, can include a greater number of variables than traditional approaches, and can potentially be applied in many biomedical applications.

Federated Learning Algorithms for Generalized Mixed-effects Model (GLMM) on Horizontally Partitioned Data from Distributed Sources

Sep 28, 2021
Wentao Li, Jiayi Tong, Md. Monowar Anjum, Noman Mohammed, Yong Chen, Xiaoqian Jiang

Objectives: This paper develops two algorithms to achieve federated generalized linear mixed effect models (GLMM), and compares the developed models' outcomes with each other, as well as with those from the standard R package ('lme4'). Methods: The log-likelihood function of GLMM is approximated by two numerical methods (Laplace approximation and Gauss-Hermite approximation), which supports federated decomposition of GLMM to bring computation to data. Results: Our developed method can handle GLMM to accommodate hierarchical data with multiple non-independent levels of observations in a federated setting. The experiment results demonstrate comparable (Laplace) and superior (Gauss-Hermite) performance on simulated and real-world data. Conclusion: We developed and compared federated GLMMs with different approximations, which can support researchers in analyzing biomedical data to accommodate mixed effects and address non-independence due to hierarchical structures (e.g., institute, region, or country).
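
To make the idea of approximating a GLMM log-likelihood by quadrature more concrete, here is a minimal sketch, not the authors' algorithm. It assumes a random-intercept logistic model with one intercept per site and a fixed 3-point Gauss-Hermite rule; the function names `site_loglik` and `federated_loglik` are hypothetical. Each site evaluates its own marginal log-likelihood locally, and only that scalar would need to be shared and summed.

```python
import math

# 3-point Gauss-Hermite rule for integrals of exp(-x^2) * f(x).
# (Assumption: a real analysis would normally use more quadrature points.)
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (
    math.sqrt(math.pi) / 6,
    2 * math.sqrt(math.pi) / 3,
    math.sqrt(math.pi) / 6,
)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def site_loglik(beta, sigma, rows):
    """Approximate marginal log-likelihood for one site.

    rows is a list of (features, label) pairs with label in {0, 1}.
    The site's random intercept b ~ N(0, sigma^2) is integrated out
    with the substitution b = sqrt(2) * sigma * node.
    """
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        b = math.sqrt(2.0) * sigma * node
        lik = 1.0
        for x, y in rows:
            eta = sum(bj * xj for bj, xj in zip(beta, x)) + b
            p = sigmoid(eta)
            lik *= p if y == 1 else 1.0 - p
        total += weight * lik
    return math.log(total / math.sqrt(math.pi))


def federated_loglik(beta, sigma, sites):
    """Sum of per-site log-likelihoods; only scalars cross site boundaries."""
    return sum(site_loglik(beta, sigma, rows) for rows in sites)
```

With `sigma = 0` the quadrature collapses to the ordinary logistic log-likelihood, which is a convenient sanity check for the weights.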

* 19 pages, 5 figures, submitted to Journal of Biomedical Informatics

Heterogeneous Treatment Effect Estimation using machine learning for Healthcare application: tutorial and benchmark

Sep 27, 2021
Yaobin Ling, Pulakesh Upadhyaya, Luyao Chen, Xiaoqian Jiang, Yejin Kim

Because developing new drugs for target diseases is a time-consuming and expensive task, drug repurposing has become a popular topic in the drug development field. As more health claims data become available, many studies have been conducted on such data. Real-world data is noisy, sparse, and has many confounding factors. In addition, many studies have shown that drug effects are heterogeneous across the population. Many advanced machine learning models for estimating heterogeneous treatment effects (HTE) have emerged in recent years and have been applied in the econometrics and machine learning communities. These studies acknowledge medicine and drug development as the main application area, but there has been limited translational research from the HTE methodology to drug development. We aim to introduce the HTE methodology to the healthcare area and provide feasibility considerations for translating the methodology, using benchmark experiments on healthcare administrative claims data. We also use benchmark experiments to show how to interpret and evaluate the model when it is applied to healthcare research. By introducing recent HTE techniques to a broad readership in the biomedical informatics community, we expect to promote the wide adoption of causal inference using machine learning. We also expect to demonstrate the feasibility of HTE for personalized drug effectiveness.

* 35 pages, 7 figures

Use of the Deep Learning Approach to Measure Alveolar Bone Level

Sep 24, 2021
Chun-Teh Lee, Tanjida Kabir, Jiman Nelson, Sally Sheng, Hsiu-Wan Meng, Thomas E. Van Dyke, Muhammad F. Walji, Xiaoqian Jiang, Shayan Shams

Abstract: Aim: The goal was to use a Deep Convolutional Neural Network to measure the radiographic alveolar bone level to aid periodontal diagnosis. Material and methods: A Deep Learning (DL) model was developed by integrating three segmentation networks (bone area, tooth, cementoenamel junction) and image analysis to measure the radiographic bone level and assign radiographic bone loss (RBL) stages. The percentage of RBL was calculated to determine the stage of RBL for each tooth. A provisional periodontal diagnosis was assigned using the 2018 periodontitis classification. RBL percentage, staging, and presumptive diagnosis were compared to the measurements and diagnoses made by the independent examiners. Results: The average Dice Similarity Coefficient (DSC) for segmentation was over 0.91. There was no significant difference in RBL percentage measurements determined by DL and examiners (p=0.65). The Area Under the Receiver Operating Characteristics Curve of RBL stage assignment for stage I, II and III was 0.89, 0.90 and 0.90, respectively. The accuracy of the case diagnosis was 0.85. Conclusion: The proposed DL model provides reliable RBL measurements and image-based periodontal diagnosis using periapical radiographic images. However, this model has to be further optimized and validated by a larger number of images to facilitate its application.
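
For readers unfamiliar with the Dice Similarity Coefficient reported in the results, the following is a small illustrative sketch of how it is commonly computed for binary segmentation masks. It is not taken from the paper; the function name `dice_coefficient` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks.

    pred and truth are equally sized 2D lists of 0/1 values.
    """
    flat_pred = [v for row in pred for v in row]
    flat_truth = [v for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total
```

A score of 1.0 means the predicted and reference masks overlap exactly, and 0.0 means they share no foreground pixels.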

* Word count: 3485; Number of figures: 4; tables: 2; references: 34

De-identification of Unstructured Clinical Texts from Sequence to Sequence Perspective

Sep 10, 2021
Md Monowar Anjum, Noman Mohammed, Xiaoqian Jiang

In this work, we propose a novel problem formulation for de-identification of unstructured clinical text. We formulate the de-identification problem as a sequence-to-sequence learning problem instead of a token classification problem. Our approach is inspired by the recent state-of-the-art performance of sequence-to-sequence learning models for named entity recognition. Early experiments with our proposed approach achieved a 98.91% recall rate on the i2b2 dataset. This performance is comparable to current state-of-the-art models for unstructured clinical text de-identification.

* Accepted in Poster Track for ACM CCS 2021

A Fast PC Algorithm with Reversed-order Pruning and A Parallelization Strategy

Sep 10, 2021
Kai Zhang, Chao Tian, Kun Zhang, Todd Johnson, Xiaoqian Jiang

The PC algorithm is the state-of-the-art algorithm for causal structure discovery on observational data. It can be computationally expensive in the worst case because the conditional independence tests are performed in an exhaustive-search manner. This makes the algorithm computationally intractable when the task contains several hundred or thousand nodes, particularly when the true underlying causal graph is dense. We make the critical observation that the conditioning set rendering two nodes independent is non-unique, and that including certain redundant nodes does not sacrifice result accuracy. Based on this finding, the innovations of our work are twofold. First, we introduce a reversed-order linkage pruning PC algorithm, which significantly increases the algorithm's efficiency. Second, we propose a parallel computing strategy for statistical independence tests by leveraging tensor computation, which brings further speedup. We also prove that the proposed algorithm does not induce statistical power loss under mild graph and data dimensionality assumptions. Experimental results show that the single-threaded version of the proposed algorithm can achieve a 6-fold speedup compared to the PC algorithm on a dense 95-node graph, and the parallel version can achieve an 825-fold speedup. We also provide proof that the proposed algorithm is consistent under the same set of conditions as the conventional PC algorithm.
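
As background for the conditional independence tests mentioned above, here is a minimal sketch of one test often paired with the PC algorithm for Gaussian data: a partial correlation followed by a Fisher z-transform. The abstract does not say which test the authors use, so this choice is an assumption, and the helper names `partial_corr` and `fisher_z_pvalue` are hypothetical. Only a single conditioning variable is handled.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y given one variable Z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: (partial) correlation is zero.

    r: sample (partial) correlation, assumed strictly between -1 and 1
    n: number of samples
    k: size of the conditioning set (requires n - k - 3 > 0)
    """
    if n - k - 3 <= 0:
        raise ValueError("need n - k - 3 > 0")
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))
    # 2 * (1 - Phi(stat)) written with the complementary error function.
    return math.erfc(stat / math.sqrt(2.0))


# Usage: X and Y look dependent marginally, but the dependence vanishes
# once Z is conditioned on, so a PC-style search would drop the X-Y edge.
p_marginal = fisher_z_pvalue(0.25, n=200, k=0)
p_given_z = fisher_z_pvalue(partial_corr(0.25, 0.5, 0.5), n=200, k=1)
```

In a PC-style search this test is repeated for many variable pairs and conditioning sets, which is the cost the abstract describes.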

* 37 pages

An Empirical Study of UMLS Concept Extraction from Clinical Notes using Boolean Combination Ensembles

Aug 04, 2021
Greg M. Silverman, Raymond L. Finzel, Michael V. Heinz, Jake Vasilakes, Jacob C. Solinsky, Reed McEwan, Benjamin C. Knoll, Christopher J. Tignanelli, Hongfang Liu, Hua Xu, Xiaoqian Jiang, Genevieve B. Melton, Serguei VS Pakhomov

Our objective in this study is to investigate the behavior of Boolean operators on combining annotation output from multiple Natural Language Processing (NLP) systems across multiple corpora and to assess how filtering by aggregation of Unified Medical Language System (UMLS) Metathesaurus concepts affects system performance for Named Entity Recognition (NER) of UMLS concepts. We used three corpora annotated for UMLS concepts: 2010 i2b2 VA challenge set (31,161 annotations), Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ) corpus (17,457 annotations including UMLS concept unique identifiers), and Fairview Health Services corpus (44,530 annotations). Our results showed that for UMLS concept matching, Boolean ensembling of the MiPACQ corpus trended towards higher performance over individual systems. Use of an approximate grid-search can help optimize the precision-recall tradeoff and can provide a set of heuristics for choosing an optimal set of ensembles.
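
The Boolean combination of annotation outputs can be pictured with plain set operations. The sketch below is illustrative only and is not the study's pipeline: it assumes each annotation is an exact-match `(start, end, concept_id)` tuple, and the concept identifiers shown are placeholders rather than real UMLS CUIs.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a set of annotations against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical outputs of two NLP systems on the same note.
system_a = {(0, 5, "CUI_A"), (10, 15, "CUI_B"), (20, 25, "CUI_C")}
system_b = {(0, 5, "CUI_A"), (30, 35, "CUI_D")}
gold = {(0, 5, "CUI_A"), (10, 15, "CUI_B"), (30, 35, "CUI_D")}

# AND keeps only annotations both systems agree on (favors precision);
# OR keeps anything either system found (favors recall).
and_ensemble = system_a & system_b
or_ensemble = system_a | system_b

and_scores = precision_recall(and_ensemble, gold)
or_scores = precision_recall(or_ensemble, gold)
```

Comparing the two score pairs shows the precision-recall tradeoff that a search over Boolean combinations would explore.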

Relational graph convolutional networks for predicting blood-brain barrier penetration of drug molecules

Jul 04, 2021
Yan Ding, Xiaoqian Jiang, Yejin Kim

The evaluation of the blood-brain barrier (BBB) penetrating ability of drug molecules is a critical step in brain drug development. Computational prediction based on machine learning has proved to be an efficient way to conduct the evaluation. However, the performance of established models has been limited by their inability to handle the interactions between drugs and proteins, which play an important role in the mechanism behind BBB penetrating behaviors. To address this issue, we employed the relational graph convolutional network (RGCN) to handle the drug-protein (denoted by the encoding gene) relations as well as the features of each individual drug. In addition, drug-drug similarity was introduced to connect structurally similar drugs in the graph. The RGCN model was initially trained without any drug features as input, and its performance was already promising, demonstrating the significant role of the drug-protein and drug-drug relations in the prediction of BBB permeability. Moreover, molecular embeddings from a pre-trained knowledge graph were used as the drug features to further enhance the predictive ability of the model. Finally, the best performing RGCN model was built with a large number of unlabeled drugs integrated into the graph.
