"Recommendation": models, code, and papers

GAIN: Graph Attention & Interaction Network for Inductive Semi-Supervised Learning over Large-scale Graphs

Nov 03, 2020
Yunpeng Weng, Xu Chen, Liang Chen, Wei Liu

Graph Neural Networks (GNNs) have led to state-of-the-art performance on a variety of machine learning tasks such as recommendation, node classification and link prediction. Graph neural network models generate node embeddings by merging node features with aggregated information from neighboring nodes. Most existing GNN models use a single type of aggregator (e.g., mean-pooling) to aggregate neighboring node information, and then add or concatenate the aggregator output to the current representation vector of the center node. However, a single type of aggregator struggles to capture the different aspects of neighboring information, and simple addition or concatenation limits the expressive capability of GNNs. Moreover, existing supervised or semi-supervised GNN models are trained with a loss function on node labels only, which neglects graph structure information. In this paper, we propose a novel graph neural network architecture, Graph Attention & Interaction Network (GAIN), for inductive learning on graphs. Unlike previous GNN models that use only a single type of aggregation method, we use multiple types of aggregators to gather different aspects of neighboring information and integrate their outputs through an aggregator-level attention mechanism. Furthermore, we design a graph regularized loss to better capture the topological relationships of the nodes in the graph. Additionally, we are the first to present the concept of graph feature interaction, and we propose a vector-wise explicit feature interaction mechanism to update the node embeddings. We conduct comprehensive experiments on two node-classification benchmarks and a real-world financial news dataset. The experiments demonstrate that our GAIN model outperforms current state-of-the-art methods on all the tasks.

* Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE) 
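
The aggregator-level attention described in the abstract can be illustrated with a minimal sketch. It assumes three aggregators (mean, max, sum) and a dot-product score against a hypothetical learned vector `attn_vector`; the paper's actual aggregators, scoring function, and feature interaction step may differ.

```python
import math

def mean_agg(neighbors):
    # Element-wise mean of the neighbor feature vectors.
    return [sum(col) / len(neighbors) for col in zip(*neighbors)]

def max_agg(neighbors):
    # Element-wise maximum of the neighbor feature vectors.
    return [max(col) for col in zip(*neighbors)]

def sum_agg(neighbors):
    # Element-wise sum of the neighbor feature vectors.
    return [sum(col) for col in zip(*neighbors)]

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_over_aggregators(neighbors, attn_vector):
    # attn_vector is a hypothetical parameter (assumption, not from the paper).
    outputs = [agg(neighbors) for agg in (mean_agg, max_agg, sum_agg)]
    scores = [sum(a * o for a, o in zip(attn_vector, out)) for out in outputs]
    weights = softmax(scores)
    # Weighted sum of the aggregator outputs, one weight per aggregator.
    fused = [sum(w * out[i] for w, out in zip(weights, outputs))
             for i in range(len(attn_vector))]
    return fused, weights

neighbors = [[1.0, 0.0], [0.0, 2.0], [2.0, 1.0]]
fused, weights = attend_over_aggregators(neighbors, attn_vector=[0.5, -0.5])
```

In this toy input every aggregator output has equal components, so all three scores are zero and the attention weights come out uniform.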

Where Does Trust Break Down? A Quantitative Trust Analysis of Deep Neural Networks via Trust Matrix and Conditional Trust Densities

Sep 30, 2020
Andrew Hryniowski, Xiao Yu Wang, Alexander Wong

The advances and successes in deep learning in recent years have led to considerable efforts and investments into its widespread adoption for a wide variety of applications, ranging from personal assistants and intelligent navigation to search and product recommendation in e-commerce. With this tremendous rise in deep learning adoption come questions about the trustworthiness of the deep neural networks that power these applications. Motivated to answer such questions, there has been very recent interest in trust quantification. In this work, we introduce the concept of trust matrix, a novel trust quantification strategy that leverages the recently introduced question-answer trust metric by Wong et al. to provide deeper, more detailed insights into where trust breaks down for a given deep neural network given a set of questions. More specifically, a trust matrix defines the expected question-answer trust for a given actor-oracle answer scenario, allowing one to quickly spot areas of low trust that need to be addressed to improve the trustworthiness of a deep neural network. The proposed trust matrix is simple to calculate, humanly interpretable, and to the best of the authors' knowledge is the first to study trust at the actor-oracle answer level. We further extend the concept of trust densities with the notion of conditional trust densities. We experimentally leverage trust matrices to study several well-known deep neural network architectures for image recognition, and further study the trust density and conditional trust densities for an interesting actor-oracle answer scenario. The results illustrate that trust matrices, along with conditional trust densities, can be useful tools in addition to the existing suite of trust quantification metrics for guiding practitioners and regulators in creating and certifying deep learning solutions for trusted operation.

* 5 pages 
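
As a rough illustration of a trust matrix, the sketch below averages a per-question trust score for every (oracle answer, actor answer) pair. The score used here, confidence raised to `alpha` when the actor is right and one minus confidence raised to `beta` when it is wrong, is an assumption about the form of the question-answer trust metric; the abstract does not state it, and the names `trust_matrix` and `records` are hypothetical.

```python
from collections import defaultdict

def question_answer_trust(confidence, correct, alpha=1.0, beta=1.0):
    # Assumed form: reward confident correct answers,
    # penalize confident wrong answers.
    if correct:
        return confidence ** alpha
    return (1.0 - confidence) ** beta

def trust_matrix(records, alpha=1.0, beta=1.0):
    # records: iterable of (oracle_answer, actor_answer, actor_confidence).
    sums = defaultdict(float)
    counts = defaultdict(int)
    for oracle, actor, confidence in records:
        key = (oracle, actor)
        sums[key] += question_answer_trust(confidence, oracle == actor, alpha, beta)
        counts[key] += 1
    # Expected trust per actor-oracle answer scenario.
    return {key: sums[key] / counts[key] for key in sums}

records = [
    ("cat", "cat", 0.9),
    ("cat", "cat", 0.7),
    ("cat", "dog", 0.8),  # wrong and confident, so low trust
    ("dog", "dog", 0.6),
]
matrix = trust_matrix(records)
```

Cells with a low average, such as `("cat", "dog")` here, are the kind of actor-oracle scenario the abstract says a practitioner would want to spot.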

Validity of a clinical decision rule based alert system for drug dose adjustment in patients with renal failure intended to improve pharmacists' analysis of medication orders in hospitals

May 24, 2013
Boussadi Abdelali, Caruba Thibaut, Karras Alexandre, Berdot Sarah, Degoulet Patrice, Durieux Pierre, Sabatier Brigitte

Objective: The main objective of this study was to assess the diagnostic performance of an alert system integrated into the CPOE/EMR system for renally cleared drug dosing control. The generated alerts were compared with the daily routine practice of pharmacists as part of the analysis of medication orders. Materials and Methods: The pharmacists performed their analysis of medication orders as usual and were not aware of the alert system interventions; for the purpose of the study, these were displayed neither to the physician nor to the pharmacist but were kept, with the associated recommendations, in a log file. A senior pharmacist analyzed the results of medication order analysis with and without the alert system. The unit of analysis was the drug prescription line. The primary study endpoints were the detection of drug-dose prescription errors and inter-rater reliability between the alert system and the pharmacists in the detection of drug dose errors. Results: The alert system fired alerts in 8.41% (421/5006) of cases: 5.65% (283/5006) "exceeds max daily dose" alerts and 2.76% (138/5006) "under dose" alerts. The alert system and the pharmacists showed relatively poor concordance: 0.106 (CI 95% [0.068, 0.144]). According to the senior pharmacist review, the alert system fired more appropriate alerts than the pharmacists and made fewer errors than the pharmacists in analyzing drug dose prescriptions: 143 for the alert system and 261 for the pharmacists. Unlike the alert system, most diagnostic errors made by the pharmacists were false negatives. The pharmacists were not able to analyze a significant number (2097; 25.42%) of drug prescription lines because of understaffing. Conclusion: This study strongly suggests that an alert system would be complementary to the pharmacists' activity and contribute to drug prescription safety.

* Word count: body 3753, abstract 280; tables: 5; figures: 1; pages: 26; references: 29. This article is the preprint version of an article submitted to the International Journal of Medical Informatics (IJMI, Elsevier). Funding: this work was supported by the Programme de recherche en qualité hospitalière (PREQHOS-PHRQ 1034 SADPM), The French Ministry of Health, grant number 115189
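
The alert rates quoted in the abstract can be rechecked from the stated counts, and a chance-corrected agreement statistic can be sketched alongside. The abstract does not name the concordance measure; Cohen's kappa is assumed here as one common choice, and the 2x2 counts in the kappa example are made up for illustration only.

```python
def rate_percent(count, total):
    # Percentage rounded to two decimals, as reported in the abstract.
    return round(100.0 * count / total, 2)

def cohen_kappa(both_yes, only_a, only_b, both_no):
    # Agreement between two raters on a yes/no decision,
    # corrected for the agreement expected by chance.
    n = both_yes + only_a + only_b + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a) / n
    b_yes = (both_yes + only_b) / n
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)

# Counts taken from the abstract.
all_alerts = rate_percent(421, 5006)
over_dose = rate_percent(283, 5006)
under_dose = rate_percent(138, 5006)

# Hypothetical counts, NOT from the study.
example_kappa = cohen_kappa(both_yes=20, only_a=5, only_b=10, both_no=15)
```

The two alert types add up to the total (283 + 138 = 421), which is consistent with the reported percentages.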

Network In Graph Neural Network

Nov 23, 2021
Xiang Song, Runjie Ma, Jiahang Li, Muhan Zhang, David Paul Wipf

Graph Neural Networks (GNNs) have shown success in learning from graph structured data containing node/edge feature information, with application to social networks, recommendation, fraud detection and knowledge graph reasoning. In this regard, various strategies have been proposed in the past to improve the expressiveness of GNNs. For example, one straightforward option is to simply increase the parameter size by either expanding the hidden dimension or increasing the number of GNN layers. However, wider hidden layers can easily lead to overfitting, and incrementally adding more GNN layers can potentially result in over-smoothing. In this paper, we present a model-agnostic methodology, namely Network In Graph Neural Network (NGNN), that allows arbitrary GNN models to increase their model capacity by making the model deeper. However, instead of adding or widening GNN layers, NGNN deepens a GNN model by inserting non-linear feedforward neural network layer(s) within each GNN layer. An analysis of NGNN as applied to a GraphSage base GNN on ogbn-products data demonstrates that it can keep the model stable against either node feature or graph structure perturbations. Furthermore, wide-ranging evaluation results on both node classification and link prediction tasks show that NGNN works reliably across diverse GNN architectures. For instance, it improves the test accuracy of GraphSage on ogbn-products by 1.6% and improves the [email protected] score of SEAL on ogbl-ppa by 7.08% and the [email protected] score of GraphSage+Edge-Attr on ogbl-ppi by 6.22%. At the time of this submission, it achieved two first places on the OGB link prediction leaderboard.
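
The idea of inserting a non-linear feedforward layer inside a GNN layer can be sketched in a few lines. This is a toy, dependency-free illustration with a simplified mean-aggregation layer and hypothetical weights `w_gnn` and `w_inner`; it is not the authors' implementation and omits training, biases, and batching.

```python
def matvec(weight, vec):
    # Dense matrix-vector product on plain lists.
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def relu(vec):
    return [max(0.0, v) for v in vec]

def mean_neighbors(features, adjacency, node):
    # Mean of the neighbor feature vectors (each node needs >= 1 neighbor).
    nbrs = adjacency[node]
    dim = len(features[node])
    return [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]

def gnn_layer(features, adjacency, w_gnn, w_inner=None):
    out = {}
    for node in features:
        agg = mean_neighbors(features, adjacency, node)
        combined = [s + a for s, a in zip(features[node], agg)]
        hidden = relu(matvec(w_gnn, combined))
        if w_inner is not None:
            # Extra feedforward layer applied inside the same GNN layer:
            # no additional message passing is performed.
            hidden = relu(matvec(w_inner, hidden))
        out[node] = hidden
    return out

features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adjacency = {0: [1, 2], 1: [0], 2: [0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
plain = gnn_layer(features, adjacency, identity)
deeper = gnn_layer(features, adjacency, identity, w_inner=[[1.0, -1.0], [0.0, 2.0]])
```

Both calls perform one round of neighbor aggregation; the second only adds per-node computation, which is the sense in which the layer becomes deeper without adding GNN layers.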

ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table

May 11, 2021
Huifeng Guo, Wei Guo, Yong Gao, Ruiming Tang, Xiuqiang He, Wenzhi Liu

Because of the superior feature representation ability of deep learning, various deep Click-Through Rate (CTR) models are deployed in commercial systems by industrial companies. To achieve better performance, it is necessary to train deep CTR models efficiently on a huge volume of training data, which makes speeding up the training process an essential problem. Unlike models with dense training data, the training data for CTR models is usually high-dimensional and sparse. To transform the high-dimensional sparse input into low-dimensional dense real-valued vectors, almost all deep CTR models adopt an embedding layer, which easily reaches hundreds of GB or even TB. Since a single GPU cannot accommodate all the embedding parameters, data-parallelism alone is not sufficient for distributed training. Therefore, existing distributed training platforms for recommendation adopt model-parallelism. Specifically, they use the CPU (Host) memory of servers to maintain and update the embedding parameters and use GPU workers to conduct forward and backward computations. Unfortunately, these platforms suffer from two bottlenecks: (1) the latency of pull & push operations between Host and GPU; (2) parameter update and synchronization in the CPU servers. To address these bottlenecks, in this paper, we propose ScaleFreeCTR (SFCTR): a MixCache-based distributed training system for CTR models. Specifically, in SFCTR, we also store the huge embedding table in CPU memory but use the GPU instead of the CPU to conduct embedding synchronization efficiently. To reduce the latency of data transfer for both GPU-Host and GPU-GPU, the MixCache mechanism and the Virtual Sparse Id operation are proposed. Comprehensive experiments and ablation studies are conducted to demonstrate the effectiveness and efficiency of SFCTR.

* 10 pages 

TB-Net: A Tailored, Self-Attention Deep Convolutional Neural Network Design for Detection of Tuberculosis Cases from Chest X-ray Images

Apr 14, 2021
Alexander Wong, James Ren Hou Lee, Hadi Rahmat-Khah, Ali Sabri, Amer Alaref

Tuberculosis (TB) remains a global health problem, and is the leading cause of death from an infectious disease. A crucial step in the treatment of tuberculosis is screening high risk populations and the early detection of the disease, with chest x-ray (CXR) imaging being the most widely-used imaging modality. As such, there has been significant recent interest in artificial intelligence-based TB screening solutions for use in resource-limited scenarios where there is a lack of trained healthcare workers with expertise in CXR interpretation. Motivated by this pressing need and the recent recommendation by the World Health Organization (WHO) for the use of computer-aided diagnosis of TB, we introduce TB-Net, a self-attention deep convolutional neural network tailored for TB case screening. More specifically, we leveraged machine-driven design exploration to build a highly customized deep neural network architecture with attention condensers. We conducted an explainability-driven performance validation process to validate TB-Net's decision-making behaviour. Experiments on CXR data from a multi-national patient cohort showed that the proposed TB-Net is able to achieve accuracy/sensitivity/specificity of 99.86%/100.0%/99.71%. Radiologist validation was conducted on select cases by two board-certified radiologists with over 10 and 19 years of experience, respectively, and showed consistency between radiologist interpretation and critical factors leveraged by TB-Net for TB case detection for the case where radiologists identified anomalies. While not a production-ready solution, we hope that the open-source release of TB-Net as part of the COVID-Net initiative will support researchers, clinicians, and citizen data scientists in advancing this field in the fight against this global public health crisis.

* 10 pages 
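
For readers less familiar with the three screening metrics reported in the abstract, the sketch below computes them from confusion-matrix counts. The counts in the example are hypothetical and are not taken from the TB-Net experiments.

```python
def screening_metrics(tp, fp, tn, fn):
    # tp/fn: TB-positive cases detected/missed;
    # tn/fp: TB-negative cases cleared/flagged.
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true TB cases detected
        "specificity": tn / (tn + fp),  # share of non-TB cases cleared
    }

# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=90, fp=20, tn=180, fn=10)
```

A sensitivity of 100%, as reported for TB-Net, corresponds to `fn == 0` on the evaluated data.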

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Jun 01, 2020
Zhewei Yao, Amir Gholami, Sheng Shen, Kurt Keutzer, Michael W. Mahoney

We introduce AdaHessian, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian. Second order algorithms are among the most powerful optimization algorithms with superior convergence properties as compared to first order methods such as SGD and ADAM. The main disadvantage of traditional second order methods is their heavier per-iteration computation and poor accuracy as compared to first order methods. To address these, we incorporate several novel approaches in AdaHessian, including: (i) a new variance reduction estimate of the Hessian diagonal with low computational overhead; (ii) a root-mean-square exponential moving average to smooth out variations of the Hessian diagonal across different iterations; and (iii) a block diagonal averaging to reduce the variance of Hessian diagonal elements. We show that AdaHessian achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods, including variants of ADAM. In particular, we perform extensive tests on CV, NLP, and recommendation system tasks and find that AdaHessian: (i) achieves 1.80%/1.45% higher accuracy on ResNets20/32 on Cifar10, and 5.55% higher accuracy on ImageNet as compared to ADAM; (ii) outperforms ADAMW for transformers by 0.27/0.33 BLEU score on IWSLT14/WMT14 and 1.8/1.0 PPL on PTB/Wikitext-103; and (iii) achieves 0.032% better score than AdaGrad for DLRM on the Criteo Ad Kaggle dataset. Importantly, we show that the cost per iteration of AdaHessian is comparable to first-order methods, and that it exhibits robustness towards its hyperparameters. The code for AdaHessian is open-sourced and publicly available.
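
One standard way to estimate a Hessian diagonal cheaply is Hutchinson's randomized estimator, which averages `z * (H z)` over random sign vectors `z`. The sketch below applies it to an explicit toy matrix and is offered as an assumption about the kind of estimate the abstract refers to; in practice `H z` would come from automatic differentiation rather than a stored matrix, and the moving average and block averaging steps are omitted.

```python
import random

def hessian_vector_product(hessian, vec):
    # Explicit product for a toy dense matrix; real optimizers
    # obtain H z without ever forming H.
    return [sum(h * v for h, v in zip(row, vec)) for row in hessian]

def hutchinson_diagonal(hessian, num_samples, seed=0):
    rng = random.Random(seed)
    dim = len(hessian)
    estimate = [0.0] * dim
    for _ in range(num_samples):
        # Rademacher vector: each entry is +1 or -1 with equal probability.
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hessian_vector_product(hessian, z)
        for i in range(dim):
            estimate[i] += z[i] * hz[i] / num_samples
    return estimate

exact = hutchinson_diagonal([[2.0, 0.0], [0.0, 5.0]], num_samples=1)
approx = hutchinson_diagonal([[2.0, 1.0], [1.0, 5.0]], num_samples=4000)
```

For a diagonal matrix a single sample is already exact, because `z[i] * z[i]` is always 1; off-diagonal entries only contribute noise that averages out as the number of samples grows.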

Estimating defectiveness of source code: A predictive model using GitHub content

Mar 21, 2018
Ritu Kapur, Balwinder Sodhi

Two key contributions presented in this paper are: i) a method for building a dataset containing source code features extracted from source files taken from Open Source Software (OSS) and associated bug reports, and ii) a predictive model for estimating the defectiveness of a given source code. These artifacts can be useful for building tools and techniques pertaining to several automated software engineering areas such as bug localization, code review, recommendation, and program repair. In order to achieve our goal, we first extract coding style information (e.g., related to programming language constructs used in the source code) for source code files present on GitHub. Then the information available in bug reports (if any) associated with these source code files is extracted. The unstructured or semi-structured information fetched in this way is then transformed into a structured knowledge base. We considered more than 30400 source code files from 20 different GitHub repositories with about 14950 associated bug reports across 4 bug tracking portals. The source code files considered are written in four programming languages (viz., C, C++, Java, and Python) and belong to different types of applications. A machine learning (ML) model for estimating the defectiveness of a given input source code is then trained using the knowledge base. In order to pick the best ML model, we evaluated 8 different ML algorithms, such as Random Forest, K Nearest Neighbour and SVM, with around 50 parameter configurations to compare their performance on our tasks. One of our findings shows that the best K-fold (with k=5) cross-validation results are obtained with the NuSVM technique, which gives a mean F1 score of 0.914.

* Submitted to ACM ESEC/FSE 2018. Keywords: Maintaining software; Source code mining; Software defect identification; Automated software engineering; AI in software engineering 
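
The evaluation protocol mentioned in the abstract, a mean F1 score over 5-fold cross-validation, can be sketched without any ML library. The helper names and the `fit_predict` callable are hypothetical, the fold assignment is a simple unshuffled round-robin split, and the toy "model" below just predicts the defective class for every file.

```python
def f1_score(y_true, y_pred):
    # Binary F1 with 1 as the positive (defective) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def k_fold_indices(n_samples, k=5):
    # Round-robin folds: sample i goes to fold i % k.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        yield train, held_out

def cross_validated_f1(features, labels, fit_predict, k=5):
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        preds = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        scores.append(f1_score([labels[i] for i in test_idx], preds))
    return sum(scores) / len(scores)

labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
features = [[float(i)] for i in range(len(labels))]
always_defective = lambda x_train, y_train, x_test: [1] * len(x_test)
mean_f1 = cross_validated_f1(features, labels, always_defective, k=5)
```

Any classifier can be plugged in through `fit_predict`, as long as it trains on the first two arguments and returns one prediction per test row.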

Contextual Bandits with Latent Confounders: An NMF Approach

Oct 27, 2016
Rajat Sen, Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sanjay Shakkottai

Motivated by online recommendation and advertising systems, we consider a causal model for stochastic contextual bandits with a latent low-dimensional confounder. In our model, there are $L$ observed contexts and $K$ arms of the bandit. The observed context influences the reward obtained through a latent confounder variable with cardinality $m$ ($m \ll L,K$). The arm choice and the latent confounder causally determine the reward while the observed context is correlated with the confounder. Under this model, the $L \times K$ mean reward matrix $\mathbf{U}$ (for each context in $[L]$ and each arm in $[K]$) factorizes into non-negative factors $\mathbf{A}$ ($L \times m$) and $\mathbf{W}$ ($m \times K$). This insight enables us to propose an $\epsilon$-greedy NMF-Bandit algorithm that designs a sequence of interventions (selecting specific arms) that achieves a balance between learning this low-dimensional structure and selecting the best arm to minimize regret. Our algorithm achieves a regret of $\mathcal{O}\left(L\mathrm{poly}(m, \log K) \log T \right)$ at time $T$, as compared to $\mathcal{O}(LK\log T)$ for conventional contextual bandits, assuming a constant gap between the best arm and the rest for each context. These guarantees are obtained under mild sufficiency conditions on the factors that are weaker versions of the well-known Statistical RIP condition. We further propose a class of generative models that satisfy our sufficient conditions, and derive a lower bound of $\mathcal{O}\left(Km\log T\right)$. These are the first regret guarantees for online matrix completion with bandit feedback, when the rank is greater than one. We further compare the performance of our algorithm with the state of the art, on synthetic and real world data-sets.

* 37 pages, 2 figures 
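
The factorization of the mean reward matrix can be made concrete with a toy example using $L = 3$ contexts, $m = 2$ confounder values, and $K = 4$ arms. The numbers are invented, and the arm-selection helper is a plain $\epsilon$-greedy rule, not the NMF-Bandit algorithm from the paper, whose intervention design is not described in the abstract.

```python
import random

def mean_reward_matrix(A, W):
    # U = A W, where A is L x m and W is m x K.
    return [[sum(a * w for a, w in zip(row, col)) for col in zip(*W)]
            for row in A]

def epsilon_greedy_arm(estimates, epsilon, rng):
    # Explore a uniformly random arm with probability epsilon,
    # otherwise exploit the arm with the highest estimated reward.
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=estimates.__getitem__)

# Hypothetical factors: rows of A weight the confounder values per context,
# rows of W hold the mean reward of each arm per confounder value.
A = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W = [[0.2, 0.8, 0.4, 0.6], [0.9, 0.1, 0.5, 0.3]]
U = mean_reward_matrix(A, W)

rng = random.Random(0)
greedy_arms = [epsilon_greedy_arm(row, epsilon=0.0, rng=rng) for row in U]
```

Here `U` has 12 entries but is determined by the 14 entries of `A` and `W`; the saving becomes substantial only when $m \ll L, K$, which is the regime the abstract assumes.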
