Charles Tapley Hoyt

An Open-Source Knowledge Graph Ecosystem for the Life Sciences

Jul 11, 2023
Tiffany J. Callahan, Ignacio J. Tripodi, Adrianne L. Stefanski, Luca Cappelletti, Sanya B. Taneja, Jordan M. Wyrwa, Elena Casiraghi, Nicolas A. Matentzoglu, Justin Reese, Jonathan C. Silverstein, Charles Tapley Hoyt, Richard D. Boyce, Scott A. Malec, Deepak R. Unni, Marcin P. Joachimiak, Peter N. Robinson, Christopher J. Mungall, Emanuele Cavalleri, Tommaso Fontana, Giorgio Valentini, Marco Mesiti, Lucas A. Gillenwater, Brook Santangelo, Nicole A. Vasilevsky, Robert Hoehndorf, Tellen D. Bennett, Patrick B. Ryan, George Hripcsak, Michael G. Kahn, Michael Bada, William A. Baumgartner Jr, Lawrence E. Hunter

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to automatically construct them. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluate the ecosystem by surveying open-source KG construction methods and analyzing its computational performance when constructing 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

Via arXiv

A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs

Mar 14, 2022
Charles Tapley Hoyt, Max Berrendorf, Mikhail Galkin, Volker Tresp, Benjamin M. Gyori

The link prediction task on knowledge graphs without explicit negative triples in the training data motivates the use of rank-based metrics. Here, we review existing rank-based metrics and propose desiderata for improved metrics that address the lack of interpretability and comparability of existing metrics across datasets of different sizes and properties. We introduce a simple theoretical framework for rank-based metrics, upon which we investigate two avenues for improving existing metrics: alternative aggregation functions and concepts from probability theory. Finally, we propose several new rank-based metrics that are more easily interpreted and compared, accompanied by a demonstration of their use in benchmarking knowledge graph embedding models.

Via arXiv
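
As a rough illustration of the rank-based metrics discussed in the abstract above, the following Python sketch computes a few common aggregations over a list of ranks. The function names are hypothetical and are not taken from the paper or from any library. The adjusted mean rank shown here is one way a probabilistic baseline can enter: it assumes the rank of a true triple is uniformly distributed over its candidates under random scoring, so that the expected rank among n candidates is (n + 1) / 2.

```python
from math import exp, log
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """Arithmetic mean of the ranks (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank, the reciprocal of the harmonic mean rank (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def geometric_mean_rank(ranks: Sequence[int]) -> float:
    """Geometric mean of the ranks, an alternative aggregation (lower is better)."""
    return exp(sum(log(r) for r in ranks) / len(ranks))


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of ranks that are at most k (higher is better)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def adjusted_mean_rank(ranks: Sequence[int], num_candidates: Sequence[int]) -> float:
    """Mean rank divided by its expectation under uniformly random ranking.

    Assumption: with n candidates and random scores, the expected rank is (n + 1) / 2.
    A value of 1.0 then corresponds to random performance, values below 1.0 are better.
    """
    expected = sum((n + 1) / 2 for n in num_candidates) / len(num_candidates)
    return mean_rank(ranks) / expected
```

Because the adjusted value is normalized by the number of candidates, it can be compared across datasets with different numbers of entities, which the raw mean rank cannot.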

An Open Challenge for Inductive Link Prediction on Knowledge Graphs

Mar 03, 2022
Mikhail Galkin, Max Berrendorf, Charles Tapley Hoyt

An emerging trend in representation learning over knowledge graphs (KGs) moves beyond transductive link prediction tasks over a fixed set of known entities in favor of inductive tasks that imply training on one graph and performing inference over a new graph with unseen entities. In inductive setups, node features are often not available and training shallow entity embedding matrices is meaningless as they cannot be used at inference time with unseen entities. Despite the growing interest, there are not enough benchmarks for evaluating inductive representation learning methods. In this work, we introduce ILPC 2022, a novel open challenge on KG inductive link prediction. To this end, we constructed two new datasets based on Wikidata with various sizes of training and inference graphs that are much larger than existing inductive benchmarks. We also provide two strong baselines leveraging recently proposed inductive methods. We hope this challenge helps to streamline community efforts in the inductive graph representation learning area. ILPC 2022 follows best practices on evaluation fairness and reproducibility, and is available at https://github.com/pykeen/ilpc2022.

* 4 pages 
Via arXiv
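
To make the inductive setting in the abstract above concrete, the toy Python sketch below builds a training graph and an inference graph with disjoint entities. It shows that a shallow embedding index learned on the training graph has no entry for any inference entity, while a representation derived from relational context can still be computed. The triples, the `relation_signature` helper, and the count-based representation are all assumptions made for illustration; they are not the ILPC 2022 datasets or baselines.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def entities(triples: Iterable[Triple]) -> Set[str]:
    """Collect every entity that appears as a head or a tail."""
    return {e for h, _, t in triples for e in (h, t)}


def relation_signature(entity: str, triples: Iterable[Triple], relations: List[str]) -> List[int]:
    """For each relation, count how often the entity occurs as its head and as its tail."""
    counts = {r: [0, 0] for r in relations}
    for h, r, t in triples:
        if h == entity:
            counts[r][0] += 1
        if t == entity:
            counts[r][1] += 1
    return [c for r in relations for c in counts[r]]


# Hypothetical toy graphs: same relation vocabulary, disjoint entity sets.
training = [("a", "likes", "b"), ("b", "knows", "c")]
inference = [("x", "likes", "y"), ("y", "knows", "z")]
relations = ["knows", "likes"]

# A shallow embedding table has one row per *training* entity.
shallow_index = {e: i for i, e in enumerate(sorted(entities(training)))}

# None of the inference entities has a row, so the table is unusable at inference time.
unseen = entities(inference) - set(shallow_index)
```

In this toy case, `relation_signature("y", inference, relations)` equals `relation_signature("b", training, relations)`, because both entities play the same relational role. That transfer across graphs is what a lookup table keyed by entity identity cannot provide.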

ChemicalX: A Deep Learning Library for Drug Pair Scoring

Feb 14, 2022
Benedek Rozemberczki, Charles Tapley Hoyt, Anna Gogleva, Piotr Grabowski, Klas Karis, Andrej Lamov, Andriy Nikolov, Sebastian Nilsson, Michael Ughetto, Yu Wang, Tyler Derr, Benjamin M Gyori

In this paper, we introduce ChemicalX, a PyTorch-based deep learning library that provides a range of state-of-the-art models for the drug pair scoring task. The primary objective of the library is to make deep drug pair scoring models accessible to machine learning researchers and practitioners in a streamlined framework. The design of ChemicalX reuses existing high-level model training utilities, geometric deep learning, and deep chemistry layers from the PyTorch ecosystem. Our system provides neural network layers, custom pair scoring architectures, data loaders, and batch iterators for end users. We showcase these features with example code snippets and case studies to highlight the characteristics of ChemicalX. A range of experiments on real-world drug-drug interaction, polypharmacy side effect, and combination synergy prediction tasks demonstrate that the models available in ChemicalX are effective at solving the pair scoring task. Finally, we show that ChemicalX can be used to train and score machine learning models on large drug pair datasets with hundreds of thousands of compounds on commodity hardware.

* https://github.com/AstraZeneca/chemicalx 
Via arXiv

Wavelet-Packet Powered Deepfake Image Detection

Jun 17, 2021
Moritz Wolter, Felix Blanke, Charles Tapley Hoyt, Jochen Garcke

As neural networks become more able to generate realistic artificial images, they have the potential to improve movies, music, and video games, and to make the internet an even more creative and inspiring place. Yet, at the same time, the latest technology potentially enables new digital ways to lie. In response, the need arises for a diverse and reliable toolbox to identify artificial images and other content. Previous work primarily relies on pixel-space CNNs or the Fourier transform. To the best of our knowledge, wavelet-based GAN analysis and detection methods have been absent thus far. This paper aims to fill this gap and describes a wavelet-based approach to GAN-generated image analysis and detection. We evaluate our method on FFHQ, CelebA, and LSUN source identification problems and find improved or competitive performance.

* Source code is available at https://github.com/gan-police/frequency-forensics 
Via arXiv
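
The wavelet-packet idea named in the title above can be sketched in plain Python with the Haar wavelet, the simplest case. One decomposition step splits an image into four half-resolution subbands; a wavelet-packet transform then decomposes every subband again, not only the approximation. The function names, the choice of Haar, and the orthonormal scaling are assumptions for illustration and are not the paper's implementation, which is available in the linked repository.

```python
from typing import List

Image = List[List[float]]


def haar_step(img: Image) -> List[Image]:
    """One 2D Haar step: return [approximation, column-detail, row-detail, diagonal-detail].

    Assumes even height and width. Scaling by 1/2 makes the transform orthonormal.
    """
    rows, cols = len(img), len(img[0])
    bands = [[[0.0] * (cols // 2) for _ in range(rows // 2)] for _ in range(4)]
    for i in range(0, rows, 2):
        for j in range(0, cols, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            coeffs = (
                (a + b + c + d) / 2,  # local average
                (a - b + c - d) / 2,  # difference between the two columns
                (a + b - c - d) / 2,  # difference between the two rows
                (a - b - c + d) / 2,  # diagonal difference
            )
            for band, value in zip(bands, coeffs):
                band[i // 2][j // 2] = value
    return bands


def haar_packets(img: Image, levels: int) -> List[Image]:
    """Wavelet-packet decomposition: apply haar_step to every subband at each level."""
    nodes = [img]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_step(node)]
    return nodes


def energy(img: Image) -> float:
    """Sum of squared values, which an orthonormal transform preserves."""
    return sum(v * v for row in img for v in row)
```

For a 4x4 image and `levels=2`, `haar_packets` returns 16 subbands of size 1x1 whose energies sum to the energy of the input. Per-subband statistics of this kind are the sort of frequency-domain features a detector could be trained on.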

Understanding the Performance of Knowledge Graph Embeddings in Drug Discovery

Jun 07, 2021
Stephen Bonner, Ian P Barrett, Cheng Ye, Rowan Swiers, Ola Engkvist, Charles Tapley Hoyt, William L Hamilton

Knowledge Graphs (KGs) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process that can result in lab-based experiments being performed, or can influence other decisions, incurring significant time and financial costs and, most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding is required not only of performance but also of the various factors that determine it. In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration. Instead, we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed, and different splits of the datasets. Our results highlight that these factors have a significant impact on performance and can even affect the ranking of models. Indeed, these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons of future work, and we argue this is critical for the acceptance, use, and impact of KGEs in a biomedical setting. To aid reproducibility of our own work, we release all experimentation code.

Via arXiv

Leveraging Structured Biological Knowledge for Counterfactual Inference: a Case Study of Viral Pathogenesis

Jan 13, 2021
Jeremy Zucker, Kaushal Paneri, Sara Mohammad-Taheri, Somya Bhargava, Pallavi Kolambkar, Craig Bakker, Jeremy Teuton, Charles Tapley Hoyt, Kristie Oxford, Robert Ness, Olga Vitek

Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in the form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This manuscript proposes a general approach for querying a causal biological knowledge graph and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy, and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm, and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.

* In IEEE Transactions on Big Data 
Via arXiv
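
The counterfactual inference described in the abstract above is commonly carried out in three steps on a structural causal model: abduction (infer the exogenous noise from what was observed), action (set the intervened variable), and prediction (recompute the outcome with the inferred noise). The Python sketch below applies these steps to a hypothetical two-variable linear model, Y := beta * X + U_y. The class, the model, and the numbers are assumptions chosen for illustration and are far simpler than the biological models in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearSCM:
    """Toy structural causal model with one assignment: Y := beta * X + U_y."""

    beta: float

    def outcome(self, x: float, u_y: float) -> float:
        """Functional assignment for Y given X and the exogenous noise U_y."""
        return self.beta * x + u_y

    def abduct(self, x_obs: float, y_obs: float) -> float:
        """Abduction: recover the noise U_y that explains the observed (X, Y)."""
        return y_obs - self.beta * x_obs

    def counterfactual(self, x_obs: float, y_obs: float, x_new: float) -> float:
        """Answer: what would Y have been for this unit, had X been x_new?"""
        u_y = self.abduct(x_obs, y_obs)   # step 1: abduction
        return self.outcome(x_new, u_y)   # steps 2 and 3: action, then prediction
```

For example, with `beta=2.0` and an observed unit with X = 1 and Y = 3, the inferred noise is 1, so the counterfactual outcome under X = 0 is 1. The noise term is held fixed because the question concerns the same unit under a different intervention.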

PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings

Jul 30, 2020
Mehdi Ali, Max Berrendorf, Charles Tapley Hoyt, Laurent Vermue, Sahand Sharifzadeh, Volker Tresp, Jens Lehmann

Recently, knowledge graph embeddings (KGEs) have received significant attention, and several software libraries have been developed for training and evaluating KGEs. While each of them addresses specific needs, we re-designed and re-implemented PyKEEN, one of the first KGE libraries, in a community effort. PyKEEN 1.0 enables users to compose knowledge graph embedding models (KGEMs) based on a wide range of interaction models, training approaches, and loss functions, and permits the explicit modeling of inverse relations. In addition, automatic memory optimization has been implemented to exploit the provided hardware optimally, and extensive hyper-parameter optimization (HPO) functionality is provided through the integration of Optuna.

Via arXiv
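
The explicit modeling of inverse relations mentioned in the abstract above is often realized by adding, for every triple (h, r, t), a second triple (t, r_inverse, h). Predicting a head for r then becomes predicting a tail for its inverse relation. The sketch below shows this augmentation in plain Python; the function name and the `_inverse` suffix are assumptions for illustration, and the code does not use or reproduce the PyKEEN API.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def add_inverse_triples(triples: List[Triple], suffix: str = "_inverse") -> List[Triple]:
    """Return the original triples followed by one inverse triple for each of them.

    The suffix used to name inverse relations is a hypothetical convention.
    """
    augmented = list(triples)
    augmented.extend((t, r + suffix, h) for h, r, t in triples)
    return augmented
```

With this augmentation a model only needs to score tails, at the cost of doubling the number of relations and training triples.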