Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Recommendation": models, code, and papers

Gaussian Embedding of Large-scale Attributed Graphs

Dec 02, 2019
Bhagya Hettige, Yuan-Fang Li, Weiqing Wang, Wray Buntine

Graph embedding methods transform high-dimensional and complex graph contents into low-dimensional representations. They are useful for a wide range of graph analysis tasks including link prediction, node classification, recommendation and visualization. Most existing approaches represent graph nodes as point vectors in a low-dimensional embedding space, ignoring the uncertainty present in the real-world graphs. Furthermore, many real-world graphs are large-scale and rich in content (e.g. node attributes). In this work, we propose GLACE, a novel, scalable graph embedding method that preserves both graph structure and node attributes effectively and efficiently in an end-to-end manner. GLACE effectively models uncertainty through Gaussian embeddings, and supports inductive inference of new nodes based on their attributes. In our comprehensive experiments, we evaluate GLACE on real-world graphs, and the results demonstrate that GLACE significantly outperforms state-of-the-art embedding methods on multiple graph analysis tasks.

  Access Paper or Ask Questions

LNDb: A Lung Nodule Database on Computed Tomography

Nov 25, 2019
João Pedrosa, Guilherme Aresta, Carlos Ferreira, Márcio Rodrigues, Patrícia Leitão, André Silva Carvalho, João Rebelo, Eduardo Negrão, Isabel Ramos, António Cunha, Aurélio Campilho

Lung cancer is the deadliest type of cancer worldwide and late detection is the major factor for the low survival rate of patients. Low dose computed tomography has been suggested as a potential screening tool but manual screening is costly, time-consuming and prone to variability. This has fueled the development of automatic methods for the detection, segmentation and characterisation of pulmonary nodules but its application to clinical routine is challenging. In this study, a new database for the development and testing of pulmonary nodule computer-aided strategies is presented which intends to complement current databases by giving additional focus to radiologist variability and local clinical reality. State-of-the-art nodule detection, segmentation and characterization methods are tested and compared to manual annotations as well as collaborative strategies combining multiple radiologists and radiologists and computer-aided systems. It is shown that state-of-the-art methodologies can determine a patient's follow-up recommendation as accurately as a radiologist, though the nodule detection method used shows decreased performance in this database.

  Access Paper or Ask Questions

Doubly-Robust Lasso Bandit

Jul 26, 2019
Gi-Soo Kim, Myunghee Cho Paik

Contextual multi-armed bandit algorithms are widely used in sequential decision tasks such as news article recommendation systems, web page ad placement algorithms, and mobile health. Most of the existing algorithms have regret proportional to a polynomial function of the context dimension, $d$. In many applications however, it is often the case that contexts are high-dimensional with only a sparse subset of size $s_0 (\ll d)$ being correlated with the reward. We propose a novel algorithm, namely the Doubly-Robust Lasso Bandit algorithm, which exploits the sparse structure as in Lasso, while blending the doubly-robust technique used in missing data literature. The high-probability upper bound of the regret incurred by the proposed algorithm does not depend on the number of arms, has better dependency on $s_0$ than previous works, and scales with $\mathrm{log}(d)$ instead of a polynomial function of $d$. The proposed algorithm shows good performance when contexts of different arms are correlated and requires less tuning parameters than existing methods.

* 10 pages, 1 figure 

  Access Paper or Ask Questions

Privacy Preserving Link Prediction with Latent Geometric Network Models

Jul 20, 2019
Abir De, Soumen Chakrabarti

Link prediction is an important task in social network analysis, with a wide variety of applications ranging from graph search to recommendation. The usual paradigm is to propose to each node a ranked list of nodes that are currently non-neighbors, as the most likely candidates for future linkage. Owing to increasing concerns about privacy, users (nodes) may prefer to keep some or all their connections private. Most link prediction heuristics, such as common neighbor, Jaccard coefficient, and Adamic-Adar, can leak private link information in making predictions. We present D P L P , a generic framework to protect differential privacy for these popular heuristics under the ranking objective. Under a recently-introduced latent node embedding model, we also analyze the trade-off between privacy and link prediction utility. Extensive experiments with eight diverse real-life graphs and several link prediction heuristics show that D P L P can trade off between privacy and predictive performance more effectively than several alternatives.

  Access Paper or Ask Questions

Multi-Level Deep Cascade Trees for Conversion Rate Prediction

Aug 29, 2018
Hong Wen, Jing Zhang, Quan Lin, Keping Yang, Taiwei Jin, Fuyu Lv, Xiaofeng Pan, Pipei Huang, Zheng-Jun Zha

Developing effective and efficient recommendation methods is very challenging for modern e-commerce platforms (e.g., Taobao). In this paper, we tackle this problem by proposing multi-Level Deep Cascade Trees (ldcTree), which is a novel decision tree ensemble approach. It leverages deep cascade structures by stacking Gradient Boosting Decision Trees (GBDT) to effectively learn feature representation. In addition, we propose to utilize the cross-entropy in each tree of the preceding GBDT as the input feature representation for next level GBDT, which has a clear explanation, i.e., a traversal from root to leaf nodes in the next level GBDT corresponds to the combination of certain traversals in the preceding GBDT. The deep cascade structure and the combination rule enable the proposed ldcTree to have a stronger distributed feature representation ability. Moreover, we propose an ensemble ldcTree to take full use of weak and strong correlation features. Experimental results on off-line dataset and online deployment demonstrate the effectiveness of the proposed methods.

* 9 pages, 8 figures, Submitted to CIKM'2018 

  Access Paper or Ask Questions

Topic Diffusion Discovery based on Sparseness-constrained Non-negative Matrix Factorization

Jul 12, 2018
Yihuang Kang, Keng-Pei Lin, I-Ling Cheng

Due to recent explosion of text data, researchers have been overwhelmed by ever-increasing volume of articles produced by different research communities. Various scholarly search websites, citation recommendation engines, and research databases have been created to simplify the text search tasks. However, it is still difficult for researchers to be able to identify potential research topics without doing intensive reviews on a tremendous number of articles published by journals, conferences, meetings, and workshops. In this paper, we consider a novel topic diffusion discovery technique that incorporates sparseness-constrained Non-negative Matrix Factorization with generalized Jensen-Shannon divergence to help understand term-topic evolutions and identify topic diffusions. Our experimental result shows that this approach can extract more prominent topics from large article databases, visualize relationships between terms of interest and abstract topics, and further help researchers understand whether given terms/topics have been widely explored or whether new topics are emerging from literature.

  Access Paper or Ask Questions

A Scalable Machine Learning Approach for Inferring Probabilistic US-LI-RADS Categorization

Jun 15, 2018
Imon Banerjee, Hailey H. Choi, Terry Desser, Daniel L. Rubin

We propose a scalable computerized approach for large-scale inference of Liver Imaging Reporting and Data System (LI-RADS) final assessment categories in narrative ultrasound (US) reports. Although our model was trained on reports created using a LI-RADS template, it was also able to infer LI-RADS scoring for unstructured reports that were created before the LI-RADS guidelines were established. No human-labelled data was required in any step of this study; for training, LI-RADS scores were automatically extracted from those reports that contained structured LI-RADS scores, and it translated the derived knowledge to reasoning on unstructured radiology reports. By providing automated LI-RADS categorization, our approach may enable standardizing screening recommendations and treatment planning of patients at risk for hepatocellular carcinoma, and it may facilitate AI-based healthcare research with US images by offering large scale text mining and data gathering opportunities from standard hospital clinical data repositories.

* AMIA Annual Symposium 2018 (accepted) 

  Access Paper or Ask Questions

Large-scale Validation of Counterfactual Learning Methods: A Test-Bed

Jun 25, 2017
Damien Lefortier, Adith Swaminathan, Xiaotao Gu, Thorsten Joachims, Maarten de Rijke

The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news. Recent approaches for off-policy evaluation and learning in these settings appear promising. With this paper, we provide real-world data and a standardized test-bed to systematically investigate these algorithms using data from display advertising. In particular, we consider the problem of filling a banner ad with an aggregate of multiple products the user may want to purchase. This paper presents our test-bed, the sanity checks we ran to ensure its validity, and shows results comparing state-of-the-art off-policy learning methods like doubly robust optimization, POEM, and reductions to supervised learning using regression baselines. Our results show experimental evidence that recent off-policy learning methods can improve upon state-of-the-art supervised learning techniques on a large-scale real-world data set.

* 10 pages, What If workshop NIPS 2016 

  Access Paper or Ask Questions

Zonotope hit-and-run for efficient sampling from projection DPPs

Jun 15, 2017
Guillaume Gautier, Rémi Bardenet, Michal Valko

Determinantal point processes (DPPs) are distributions over sets of items that model diversity using kernels. Their applications in machine learning include summary extraction and recommendation systems. Yet, the cost of sampling from a DPP is prohibitive in large-scale applications, which has triggered an effort towards efficient approximate samplers. We build a novel MCMC sampler that combines ideas from combinatorial geometry, linear programming, and Monte Carlo methods to sample from DPPs with a fixed sample cardinality, also called projection DPPs. Our sampler leverages the ability of the hit-and-run MCMC kernel to efficiently move across convex bodies. Previous theoretical results yield a fast mixing time of our chain when targeting a distribution that is close to a projection DPP, but not a DPP in general. Our empirical results demonstrate that this extends to sampling projection DPPs, i.e., our sampler is more sample-efficient than previous approaches which in turn translates to faster convergence when dealing with costly-to-evaluate functions, such as summary extraction in our experiments.

* 12 pages, 12 figures, 2 columns, accepted to ICML 2017 

  Access Paper or Ask Questions