"Recommendation": models, code, and papers

Online learning with Corrupted context: Corrupted Contextual Bandits

Jun 26, 2020
Djallel Bouneffouf

We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the context used at each decision may be corrupted ("useless context"). This new problem is motivated by certain online settings, including clinical trials and ad recommendation applications. To address the corrupted-context setting, we propose to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism. Unlike standard contextual bandit methods, we are able to learn from every iteration, even those with corrupted context, by improving the computation of the expectation for each arm. Promising empirical results are obtained on several real-life datasets.



Time series classification for varying length series

Oct 10, 2019
Chang Wei Tan, Francois Petitjean, Eamonn Keogh, Geoffrey I. Webb

Research into time series classification has tended to focus on the case of series of uniform length. However, it is common for real-world time series data to have unequal lengths. Differing time series lengths may arise from a number of fundamentally different mechanisms. In this work, we identify and evaluate two classes of such mechanisms -- variations in sampling rate relative to the relevant signal and variations between the start and end points of one time series relative to one another. We investigate how time series generated by each of these classes of mechanism are best addressed for time series classification. We perform extensive experiments and provide practical recommendations on how variations in length should be handled in time series classification.

* 23 pages 


hyperdoc2vec: Distributed Representations of Hypertext Documents

May 10, 2018
Jialong Han, Yan Song, Wayne Xin Zhao, Shuming Shi, Haisong Zhang

Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life. Although effective on plain documents, conventional text embedding methods suffer from information loss if directly adapted to hyper-documents. In this paper, we propose a general embedding approach for hyper-documents, namely, hyperdoc2vec, along with four criteria characterizing necessary information that hyper-document embedding models should preserve. Systematic comparisons are conducted between hyperdoc2vec and several competitors on two tasks, i.e., paper classification and citation recommendation, in the academic paper domain. Analyses and experiments both validate the superiority of hyperdoc2vec to other models w.r.t. the four criteria.

* Accepted to ACL 2018 


Continuous Features Discretization for Anomaly Intrusion Detectors Generation

Mar 07, 2014
Amira Sayed A. Aziz, Ahmad Taher Azar, Aboul Ella Hassanien, Sanaa Al-Ola Hanafy

Network security is a growing issue, with the evolution of computer systems and expansion of attacks. Biological systems have inspired scientists and designs for new adaptive solutions, such as genetic algorithms. In this paper, we present an approach that uses the genetic algorithm to generate anomaly network intrusion detectors. The proposed algorithm applies a discretization method to the continuous features selected for intrusion detection, to create some homogeneity between values that have different data types. Then, the intrusion detection system is tested against the NSL-KDD data set using different distance methods. The results are compared, showing that the proposed approach performs well, and recommendations are given for future experiments.



Modular Domain Adaptation

Apr 26, 2022
Junshen K. Chen, Dallas Card, Dan Jurafsky

Off-the-shelf models are widely used by computational social science researchers to measure properties of text, such as sentiment. However, without access to source data it is difficult to account for domain shift, which represents a threat to validity. Here, we treat domain adaptation as a modular process that involves separate model producers and model consumers, and show how they can independently cooperate to facilitate more accurate measurements of text. We introduce two lightweight techniques for this scenario, and demonstrate that they reliably increase out-of-domain accuracy on four multi-domain text classification datasets when used with linear and contextual embedding models. We conclude with recommendations for model producers and consumers, and release models and replication code to accompany this paper.

* Findings of ACL (2022) 


A Data-Centric Behavioral Machine Learning Platform to Reduce Health Inequalities

Nov 17, 2021
Dexian Tang, Guillem Francès, África Periáñez

Providing front-line health workers in low- and middle-income countries with recommendations and predictions to improve health outcomes can have a tremendous impact on reducing healthcare inequalities, for instance by helping to prevent the thousands of maternal and newborn deaths that occur every day. To that end, we are developing a data-centric machine learning platform that leverages the behavioral logs from a wide range of mobile health applications running in those countries. Here we describe the platform architecture, focusing on the details that help us to maximize the quality and organization of the data throughout the whole process, from the data ingestion with a data-science purposed software development kit to the data pipelines, feature engineering and model management.

* Short paper, accepted for publication at the 2021 NeurIPS Data-Centric AI Workshop 


Empirical Comparison of Graph Embeddings for Trust-Based Collaborative Filtering

Mar 30, 2020
Tomislav Duricic, Hussain Hussain, Emanuel Lacic, Dominik Kowald, Denis Helic, Elisabeth Lex

In this work, we study the utility of graph embeddings to generate latent user representations for trust-based collaborative filtering. In a cold-start setting, on three publicly available datasets, we evaluate approaches from four method families: (i) factorization-based, (ii) random walk-based, (iii) deep learning-based, and (iv) the Large-scale Information Network Embedding (LINE) approach. We find that across the four families, random-walk-based approaches consistently achieve the best accuracy. Besides, they result in highly novel and diverse recommendations. Furthermore, our results show that the use of graph embeddings in trust-based collaborative filtering significantly improves user coverage.

* 10 pages, accepted as a full paper at the 25th International Symposium on Methodologies for Intelligent Systems (ISMIS'20) 


REflex: Flexible Framework for Relation Extraction in Multiple Domains

Jul 20, 2019
Geeticka Chauhan, Matthew B. A. McDermott, Peter Szolovits

Systematic comparison of methods for relation extraction (RE) is difficult because many experiments in the field are not described precisely enough to be completely reproducible and many papers fail to report ablation studies that would highlight the relative contributions of their various combined techniques. In this work, we build a unifying framework for RE, applying it to three widely used datasets (from the general, biomedical and clinical domains), with the ability to extend to new datasets. By performing a systematic exploration of modeling, pre-processing and training methodologies, we find that choices of pre-processing are a large contributor to performance and that omission of such information can further hinder fair comparison. Other insights from our exploration allow us to provide recommendations for future research in this area.

* Accepted by BioNLP 2019 at the Association for Computational Linguistics 2019 


Reflex: Flexible Framework for Relation Extraction in Multiple Domains

Jun 19, 2019
Geeticka Chauhan, Matthew B. A. McDermott, Peter Szolovits

Systematic comparison of methods for relation extraction (RE) is difficult because many experiments in the field are not described precisely enough to be completely reproducible and many papers fail to report ablation studies that would highlight the relative contributions of their various combined techniques. In this work, we build a unifying framework for RE, applying it to three widely used datasets (from the general, biomedical and clinical domains), with the ability to extend to new datasets. By performing a systematic exploration of modeling, pre-processing and training methodologies, we find that choices of pre-processing are a large contributor to performance and that omission of such information can further hinder fair comparison. Other insights from our exploration allow us to provide recommendations for future research in this area.

* Accepted by BioNLP 2019 at the Association for Computational Linguistics 2019 

