Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Recommendation": models, code, and papers

DeepCare: A Deep Dynamic Memory Model for Predictive Medicine

Apr 10, 2017
Trang Pham, Truyen Tran, Dinh Phung, Svetha Venkatesh

Personalized predictive medicine necessitates the modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural network that reads medical records, stores previous illness history, infers current illness states and predicts future medical outcomes. At the data level, DeepCare represents care episodes as vectors in space, models patient health state trajectories through explicit memory of historical records. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timed events by moderating the forgetting and consolidation of memory cells. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling, intervention recommendation, and future risk prediction. On two important cohorts with heavy social and economic burden -- diabetes and mental health -- the results show improved modeling and risk prediction accuracy.

* Accepted at JBI under the new name: "Predicting healthcare trajectories from medical records: A deep learning approach" 

  Access Paper or Ask Questions

Hybrid Clustering based on Content and Connection Structure using Joint Nonnegative Matrix Factorization

Mar 28, 2017
Rundong Du, Barry Drake, Haesun Park

We present a hybrid method for latent information discovery on the data sets containing both text content and connection structure based on constrained low rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for the joint NMF objective function, based on a block coordinate descent (BCD) framework. The proposed hybrid method discovers content associations via latent connections found using SymNMF. The method can also be applied with a natural conversion of the problem when a hypergraph formulation is used or the content is associated with hypergraph edges. Experimental results show that by simultaneously utilizing both content and connection structure, our hybrid method produces higher quality clustering results compared to the other NMF clustering methods that uses content alone (standard NMF) or connection structure alone (SymNMF). We also present some interesting applications to several types of real world data such as citation recommendations of papers. The hybrid method proposed in this paper can also be applied to general data expressed with both feature space vectors and pairwise similarities and can be extended to the case with multiple feature spaces or multiple similarity measures.

* 9 pages, Submitted to a conference, Feb. 2017 

  Access Paper or Ask Questions

Guess what? You can boost Federated Learning for free

Oct 21, 2021
Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires

Federated Learning (FL) exploits the computation power of edge devices, typically mobile phones, while addressing privacy by letting data stay where it is produced. FL has been used by major service providers to improve item recommendations, virtual keyboards and text auto-completion services. While appealing, FL performance is hampered by multiple factors: i) differing capabilities of participating clients (e.g., computing power, memory and network connectivity); ii) strict training constraints where devices must be idle, plugged-in and connected to an unmetered WiFi; and iii) data heterogeneity (a.k.a non-IIDness). Together, these lead to uneven participation, straggling, dropout and consequently slow down convergence, challenging the practicality of FL for many applications. In this paper, we present GeL, the Guess and Learn algorithm, that significantly speeds up convergence by guessing model updates for each client. The power of GeL is to effectively perform ''free'' learning steps without any additional gradient computations. GeL provides these guesses through clever use of moments in the Adam optimizer in combination with the last computed gradient on clients. Our extensive experimental study involving five standard FL benchmarks shows that GeL speeds up the convergence up to 1.64x in heterogeneous systems in the presence of data non-IIDness, saving tens of thousands of gradient computations.

* 11 pages, 6 figures 

  Access Paper or Ask Questions

AttentionFlow: Visualising Influence in Networks of Time Series

Feb 03, 2021
Minjeong Shin, Alasdair Tran, Siqi Wu, Alexander Mathews, Rong Wang, Georgiana Lyall, Lexing Xie

The collective attention on online items such as web pages, search terms, and videos reflects trends that are of social, cultural, and economic interest. Moreover, attention trends of different items exhibit mutual influence via mechanisms such as hyperlinks or recommendations. Many visualisation tools exist for time series, network evolution, or network influence; however, few systems connect all three. In this work, we present AttentionFlow, a new system to visualise networks of time series and the dynamic influence they have on one another. Centred around an ego node, our system simultaneously presents the time series on each node using two visual encodings: a tree ring for an overview and a line chart for details. AttentionFlow supports interactions such as overlaying time series of influence and filtering neighbours by time or flux. We demonstrate AttentionFlow using two real-world datasets, VevoMusic and WikiTraffic. We show that attention spikes in songs can be explained by external events such as major awards, or changes in the network such as the release of a new song. Separate case studies also demonstrate how an artist's influence changes over their career, and that correlated Wikipedia traffic is driven by cultural interests. More broadly, AttentionFlow can be generalised to visualise networks of time series on physical infrastructures such as road networks, or natural phenomena such as weather and geological measurements.

* The Proceedings of the Fourteenth ACM International Conference on Web Search and Data Mining (WSDM), 2021 
* Published in WSDM 2021. The demo is available at and code is available at 

  Access Paper or Ask Questions

EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications

Nov 23, 2020
Minghui Qiu, Peng Li, Hanjie Pan, Chengyu Wang, Ang Wang, Cen Chen, Yaliang Li, Dehong Gao, Jun Huang, Yong Li, Jun Yang, Deng Cai, Wei Lin

The literature has witnessed the success of applying deep Transfer Learning (TL) algorithms to many NLP applications, yet it is not easy to build a simple and scalable TL toolkit for this purpose. To bridge this gap, the EasyTransfer platform is designed to make it easy to develop deep TL algorithms for NLP applications. It is built with rich API abstractions, a scalable architecture and comprehensive deep TL algorithms, to make the development of NLP applications easier. To be specific, the build-in data and model parallelism strategy shows to be 4x faster than the default distribution strategy of Tensorflow. EasyTransfer supports the mainstream pre-trained ModelZoo, including Pre-trained Language Models (PLMs) and multi-modality models. It also integrates various SOTA models for mainstream NLP applications in AppZoo, and supports mainstream TL algorithms as well. The toolkit is convenient for users to quickly start model training, evaluation, offline prediction, and online deployment. This system is currently deployed at Alibaba to support a variety of business scenarios, including item recommendation, personalized search, and conversational question answering. Extensive experiments on real-world datasets show that EasyTransfer is suitable for online production with cutting-edge performance. The source code of EasyTransfer is released at Github (

* 4 pages 

  Access Paper or Ask Questions

Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification

Apr 13, 2020
Ming Jiang, Jennifer D'Souza, Sören Auer, J. Stephen Downie

With the rapidly growing number of research publications, there is a vast amount of scholarly information that needs to be organized in digital libraries. To deal with this challenge, digital libraries use semantic techniques to build knowledge-base structures for organizing scientific information. Identifying relations between scientific terms can help with the construction of a representative knowledge-based structure. While advanced automated techniques have been developed for relation extraction, many of these techniques were evaluated under different scenarios, which limits their comparability. To this end, this study presents a thorough empirical evaluation of eight Bert-based classification models by exploring two factors: 1) Bert model variants, and 2) classification strategies. To simulate real-world settings, we conduct our sentence-level assessment using the abstracts of scholarly publications in three corpora, two of which are distinct corpora and the third of which is the union of the first two. Our findings show that SciBert models perform better than Bert-BASE models. The strategy of classifying a single relation each time is preferred in the corpus consisting of abundant scientific relations, while the strategy of identifying multiple relations at one time is beneficial to the corpus with sparse relations. Our results offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build a structured knowledge-based system for the ease of scholarly information organization.

  Access Paper or Ask Questions

EEV Dataset: Predicting Expressions Evoked by Diverse Videos

Jan 15, 2020
Jennifer J. Sun, Ting Liu, Alan S. Cowen, Florian Schroff, Hartwig Adam, Gautam Prasad

When we watch videos, the visual and auditory information we experience can evoke a range of affective responses. The ability to automatically predict evoked affect from videos can help recommendation systems and social machines better interact with their users. Here, we introduce the Evoked Expressions in Videos (EEV) dataset, a large-scale dataset for studying viewer responses to videos based on their facial expressions. The dataset consists of a total of 4.8 million annotations of viewer facial reactions to 18,541 videos. We use a publicly available video corpus to obtain a diverse set of video content. The training split is fully machine-annotated, while the validation and test splits have both human and machine annotations. We verify the performance of our machine annotations with human raters to have an average precision of 73.3%. We establish baseline performance on the EEV dataset using an existing multimodal recurrent model. Our results show that affective information can be learned from EEV, but with a MAP of 20.32%, there is potential for improvement. This gap motivates the need for new approaches for understanding affective content. Our transfer learning experiments show an improvement in performance on the LIRIS-ACCEDE video dataset when pre-trained on EEV. We hope that the size and diversity of the EEV dataset will encourage further explorations in video understanding and affective computing.

  Access Paper or Ask Questions

Learning Nonsymmetric Determinantal Point Processes

May 30, 2019
Mike Gartrell, Victor-Emmanuel Brunel, Elvis Dohmatob, Syrine Krichene

Determinantal point processes (DPPs) have attracted substantial attention as an elegant probabilistic model that captures the balance between quality and diversity within sets. DPPs are conventionally parameterized by a positive semi-definite kernel matrix, and this symmetric kernel encodes only repulsive interactions between items. These so-called symmetric DPPs have significant expressive power, and have been successfully applied to a variety of machine learning tasks, including recommendation systems, information retrieval, and automatic summarization, among many others. Efficient algorithms for learning symmetric DPPs and sampling from these models have been reasonably well studied. However, relatively little attention has been given to nonsymmetric DPPs, which relax the symmetric constraint on the kernel. Nonsymmetric DPPs allow for both repulsive and attractive item interactions, which can significantly improve modeling power, resulting in a model that may better fit for some applications. We present a method that enables a tractable algorithm, based on maximum likelihood estimation, for learning nonsymmetric DPPs from data composed of observed subsets. Our method imposes a particular decomposition of the nonsymmetric kernel that enables such tractable learning algorithms, which we analyze both theoretically and experimentally. We evaluate our model on synthetic and real-world datasets, demonstrating improved predictive performance compared to symmetric DPPs, which have previously shown strong performance on modeling tasks associated with these datasets.

  Access Paper or Ask Questions

Deep Sentiment Analysis using a Graph-based Text Representation

Feb 23, 2019
Kayvan Bijari, Hadi Zare, Hadi Veisi, Emad Kebriaei

Social media brings about new ways of communication among people and is influencing trading strategies in the market. The popularity of social networks produces a large collection of unstructured data such as text and image in a variety of disciplines like business and health. The main element of social media arises as text which provokes a set of challenges for traditional information retrieval and natural language processing tools. Informal language, spelling errors, abbreviations, and special characters are typical in social media posts. These features lead to a prohibitively large vocabulary size for text mining methods. Another problem with traditional social text mining techniques is that they fail to take semantic relations into account, which is essential in a domain of applications such as event detection, opinion mining, and news recommendation. This paper set out to employ a network-based viewpoint on text documents and investigate the usefulness of graph representation to exploit word relations and semantics of the textual data. Moreover, the proposed approach makes use of a random walker to extract deep features of a graph to facilitate the task of document classification. The experimental results indicate that the proposed approach defeats the earlier sentiment analysis methods based on several benchmark datasets, and it generalizes well on different datasets without dependency for pre-trained word embeddings.

  Access Paper or Ask Questions

Adversarial Sampling and Training for Semi-Supervised Information Retrieval

Nov 09, 2018
Dae Hoon Park, Yi Chang

Modern ad-hoc retrieval models learned with implicit feedback have two problems in general. First, there are usually much more non-clicked documents than clicked documents, and many of the non-clicked documents are not informational. Second, modern ad-hoc retrieval models are vulnerable to adversarial examples due to the linear nature in the models. To solve the problems at the same time, we propose adversarial training methods that can overcome those weaknesses. Our key idea is to combine adversarial training with adversarial sampling in order to obtain very difficult examples, which are informational and can attack the linear nature of the models. Specifically, we adversarially sample difficult training examples, and based on them, we further generate adversarial examples that are even more difficult. To make the models robust, the generated adversarial examples as well as the original training examples are then given to the models for joint optimization. Experiments are performed on benchmark data sets for common ad-hoc retrieval tasks such as Web search, item recommendation, and question answering. The proposed methods are closely compared with IRGAN, which is a recent relevant approach that employs adversarial training. Experiment results indicate that the proposed methods significantly outperform strong baselines especially for high-ranked documents, and they outperform IRGAN in [email protected] using only 5% of labeled data for the Web search task.

  Access Paper or Ask Questions