Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

William Yuan

Construction of extra-large scale screening tools for risks of severe mental illnesses using real world healthcare data

Dec 20, 2022

Dianbo Liu, Karmel W. Choi, Paulo Lizano, William Yuan, Kun-Hsing Yu, Jordan Smoller, Isaac Kohane

Abstract:Importance: The prevalence of severe mental illnesses (SMIs) in the United States is approximately 3% of the whole population. The ability to conduct risk screening of SMIs at large scale could inform early prevention and treatment. Objective: A scalable machine learning based tool was developed to conduct population-level risk screening for SMIs, including schizophrenia, schizoaffective disorders, psychosis, and bipolar disorders,using 1) healthcare insurance claims and 2) electronic health records (EHRs). Design, setting and participants: Data from beneficiaries from a nationwide commercial healthcare insurer with 77.4 million members and data from patients from EHRs from eight academic hospitals based in the U.S. were used. First, the predictive models were constructed and tested using data in case-control cohorts from insurance claims or EHR data. Second, performance of the predictive models across data sources were analyzed. Third, as an illustrative application, the models were further trained to predict risks of SMIs among 18-year old young adults and individuals with substance associated conditions. Main outcomes and measures: Machine learning-based predictive models for SMIs in the general population were built based on insurance claims and EHR.

Via

Access Paper or Ask Questions

Approaching Small Molecule Prioritization as a Cross-Modal Information Retrieval Task through Coordinated Representation Learning

Nov 22, 2019

Samuel G. Finlayson, Matthew B. A. McDermott, Alex V. Pickering, Scott L. Lipnick, William Yuan, Isaac S. Kohane

Figure 1 for Approaching Small Molecule Prioritization as a Cross-Modal Information Retrieval Task through Coordinated Representation Learning

Figure 2 for Approaching Small Molecule Prioritization as a Cross-Modal Information Retrieval Task through Coordinated Representation Learning

Figure 3 for Approaching Small Molecule Prioritization as a Cross-Modal Information Retrieval Task through Coordinated Representation Learning

Figure 4 for Approaching Small Molecule Prioritization as a Cross-Modal Information Retrieval Task through Coordinated Representation Learning

Abstract:Modeling the relationship between chemical structure and molecular activity is a key task in drug development and precision medicine. In this paper, we utilize a novel deep learning architecture to jointly train coordinated embeddings of chemical structures and transcriptional signatures. We do so by training neural networks in a coordinated manner such that learned chemical representations correlate most highly with the encodings of the transcriptional patterns they induce. We then test this approach by using held-out gene expression signatures as queries into embedding space to recover their corresponding compounds. We evaluate these embeddings' utility for small molecule prioritization on this new benchmark task. Our method outperforms a series of baselines, successfully generalizing to unseen transcriptional experiments, but still struggles to generalize to entirely unseen chemical structures.

Via

Access Paper or Ask Questions

Privacy-Preserving Distributed Deep Learning for Clinical Data

Dec 04, 2018

Brett K. Beaulieu-Jones, William Yuan, Samuel G. Finlayson, Zhiwei Steven Wu

Abstract:Deep learning with medical data often requires larger samples sizes than are available at single providers. While data sharing among institutions is desirable to train more accurate and sophisticated models, it can lead to severe privacy concerns due the sensitive nature of the data. This problem has motivated a number of studies on distributed training of neural networks that do not require direct sharing of the training data. However, simple distributed training does not offer provable privacy guarantees to satisfy technical safe standards and may reveal information about the underlying patients. We present a method to train neural networks for clinical data in a distributed fashion under differential privacy. We demonstrate these methods on two datasets that include information from multiple independent sites, the eICU collaborative Research Database and The Cancer Genome Atlas.

* Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Via

Access Paper or Ask Questions