Real-world knowledge can be represented as a graph consisting of entities and relationships between the entities. The need for efficient and scalable solutions arises when dealing with vast genomic data, like RNA-sequencing. Knowledge graphs offer a powerful approach for various tasks in such large-scale genomic data, such as analysis and inference. In this work, variant-level information extracted from the RNA-sequences of vaccine-na\"ive COVID-19 patients have been represented as a unified, large knowledge graph. Variant call format (VCF) files containing the variant-level information were annotated to include further information for each variant. The data records in the annotated files were then converted to Resource Description Framework (RDF) triples. Each VCF file obtained had an associated CADD scores file that contained the raw and Phred-scaled scores for each variant. An ontology was defined for the VCF and CADD scores files. Using this ontology and the extracted information, a large, scalable knowledge graph was created. Available graph storage was then leveraged to query and create datasets for further downstream tasks. We also present a case study using the knowledge graph and perform a classification task using graph machine learning. We also draw comparisons between different Graph Neural Networks (GNNs) for the case study.
The 2030 Challenge is aimed at making all new buildings and major renovations carbon neutral by 2030. One of the potential solutions to meet this challenge is through innovative sustainable design strategies. For developing such strategies it is important to understand how the various building factors contribute to energy usage of a building, right at design time. The growth of artificial intelligence (AI) in recent years provides an unprecedented opportunity to advance sustainable design by learning complex relationships between building factors from available data. However, rich training datasets are needed for AI-based solutions to achieve good prediction accuracy. Unfortunately, obtaining training datasets are time consuming and expensive in many real-world applications. Motivated by these reasons, we address the problem of accurately predicting the energy usage of new or unknown building types, i.e., those building types that do not have any training data. We propose a novel approach based on zero-shot learning (ZSL) to solve this problem. Our approach uses side information from building energy modeling experts to predict the closest building types for a given new/unknown building type. We then obtain the predicted energy usage for the k-closest building types using the models learned during training and combine the predicted values using a weighted averaging function. We evaluated our approach on a dataset containing five building types generated using BuildSimHub, a popular platform for building energy modeling. Our approach achieved better average accuracy than a regression model (based on XGBoost) trained on the entire dataset of known building types.
Dynamic networks have intrinsic structural, computational, and multidisciplinary advantages. Link prediction estimates the next relationship in dynamic networks. However, in the current link prediction approaches, only bipartite or non-bipartite but homogeneous networks are considered. The use of adjacency matrix to represent dynamically evolving networks limits the ability to analytically learn from heterogeneous, sparse, or forming networks. In the case of a heterogeneous network, modeling all network states using a binary-valued matrix can be difficult. On the other hand, sparse or currently forming networks have many missing edges, which are represented as zeros, thus introducing class imbalance or noise. We propose a time-parameterized matrix (TP-matrix) and empirically demonstrate its effectiveness in non-bipartite, heterogeneous networks. In addition, we propose a predictive influence index as a measure of a node's boosting or diminishing predictive influence using backward and forward-looking maximization over the temporal space of the n-degree neighborhood. We further propose a new method of canonically representing heterogeneous time-evolving activities as a temporally parameterized network model (TPNM). The new method robustly enables activities to be represented as a form of a network, thus potentially inspiring new link prediction applications, including intelligent business process management systems and context-aware workflow engines. We evaluated our model on four datasets of different network systems. We present results that show the proposed model is more effective in capturing and retaining temporal relationships in dynamically evolving networks. We also show that our model performed better than state-of-the-art link prediction benchmark results for networks that are sensitive to temporal evolution.