"Recommendation": models, code, and papers

Predictive Maintenance for Edge-Based Sensor Networks: A Deep Reinforcement Learning Approach

Jul 07, 2020
Kevin Shen Hoong Ong, Dusit Niyato, Chau Yuen

Failure of mission-critical equipment interrupts production and results in monetary loss. The risk of unplanned equipment downtime can be minimized through Predictive Maintenance of revenue-generating assets to ensure optimal performance and safe operation of equipment. However, the increased sensorization of equipment generates a data deluge, and existing machine-learning-based predictive models alone become inadequate for timely equipment condition predictions. In this paper, a model-free Deep Reinforcement Learning algorithm is proposed for predictive equipment maintenance in an equipment-based sensor network context. Within each piece of equipment, a sensor device aggregates raw sensor data, and the equipment health status is analyzed for anomalous events. Unlike traditional black-box regression models, the proposed algorithm self-learns an optimal maintenance policy and provides actionable recommendations for each piece of equipment. Our experimental results demonstrate the potential for a broader range of equipment maintenance applications as an automatic learning framework.

* 6 pages, 5 figures, accepted in IEEE WF-IoT 2020 

Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language

May 22, 2020
Philipp Scharpf, Moritz Schubotz, Abdou Youssef, Felix Hamborg, Norman Meuschke, Bela Gipp

In this paper, we show how selecting and combining encodings of natural and mathematical language affect classification and clustering of documents with mathematical content. We demonstrate this by using sets of documents, sections, and abstracts from the arXiv preprint server that are labeled by their subject class (mathematics, computer science, physics, etc.) to compare different encodings of text and formulae and evaluate the performance and runtimes of selected classification and clustering algorithms. Our encodings achieve classification accuracies up to $82.8\%$ and cluster purities up to $69.4\%$ (number of clusters equals number of classes), and $99.9\%$ (unspecified number of clusters) respectively. We observe a relatively low correlation between text and math similarity, which indicates the independence of text and formulae and motivates treating them as separate features of a document. The classification and clustering can be employed, e.g., for document search and recommendation. Furthermore, we show that the computer outperforms a human expert when classifying documents. Finally, we evaluate and discuss multi-label classification and formula semantification.

* Proceedings of the ACM/IEEE Joint Conference on Digital Libraries JCDL 2020 
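
The cluster purity figures quoted in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition of purity (the share of items that belong to the majority class of their cluster); the paper may compute it differently, and the function name `cluster_purity` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, class_labels):
    """Fraction of items whose class is the majority class of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Group the true class labels by assigned cluster.
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)

    # For each cluster, count the items in its most frequent class.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(class_labels)


# Toy data (hypothetical): two clusters over six documents.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["math", "math", "cs", "cs", "cs", "physics"]
purity = cluster_purity(clusters, classes)  # 4 of 6 items match their cluster's majority class
print(purity)
```

Under this definition purity reaches 1.0 whenever every item sits in its own cluster, which may be one reason the reported purity is much higher when the number of clusters is left unspecified.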

Tensor denoising and completion based on ordinal observations

Feb 16, 2020
Chanwoo Lee, Miaoyan Wang

Higher-order tensors arise frequently in applications such as neuroimaging, recommendation systems, social network analysis, and psychological studies. We consider the problem of low-rank tensor estimation from possibly incomplete, ordinal-valued observations. Two related problems are studied, one on tensor denoising and another on tensor completion. We propose a multi-linear cumulative link model, develop a rank-constrained M-estimator, and obtain theoretical accuracy guarantees. Our mean squared error bound enjoys a faster convergence rate than previous results, and we show that the proposed estimator is minimax optimal under the class of low-rank models. Furthermore, the procedure developed serves as an efficient completion method which guarantees consistent recovery of an order-$K$ $(d,\ldots,d)$-dimensional low-rank tensor using only $\tilde{\mathcal{O}}(Kd)$ noisy, quantized observations. We demonstrate that our approach outperforms previous methods on the tasks of clustering and collaborative filtering.

* 35 pages, 6 figures 

Measuring the Reliability of Reinforcement Learning Algorithms

Dec 10, 2019
Stephanie C. Y. Chan, Sam Fishman, John Canny, Anoop Korattikara, Sergio Guadarrama

Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve it have grown substantially. To aid RL researchers and production users with the evaluation and improvement of reliability, we propose a set of metrics that quantitatively measure different aspects of reliability. In this work, we focus on variability and risk, both during training and after learning (on a fixed policy). We designed these metrics to be general-purpose, and we also designed complementary statistical tests to enable rigorous comparisons on these metrics. In this paper, we first describe the desired properties of the metrics and their design, the aspects of reliability that they measure, and their applicability to different scenarios. We then describe the statistical tests and make additional practical recommendations for reporting results. The metrics and accompanying statistical tools have been made available as an open-source library. We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results.

* Accepted at the Workshop on Deep Reinforcement Learning at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada 

Algorithms and Statistical Models for Scientific Discovery in the Petabyte Era

Nov 05, 2019
Brian Nord, Andrew J. Connolly, Jamie Kinney, Jeremy Kubica, Gautaum Narayan, Joshua E. G. Peek, Chad Schafer, Erik J. Tollerud, Camille Avestruz, G. Jogesh Babu, Simon Birrer, Douglas Burke, João Caldeira, Douglas A. Caldwell, Joleen K. Carlberg, Yen-Chi Chen, Chuanfei Dong, Eric D. Feigelson, V. Zach Golkhou, Vinay Kashyap, T. S. Li, Thomas Loredo, Luisa Lucie-Smith, Kaisey S. Mandel, J. R. Martínez-Galarza, Adam A. Miller, Priyamvada Natarajan, Michelle Ntampaka, Andy Ptak, David Rapetti, Lior Shamir, Aneta Siemiginowska, Brigitta M. Sipőcz, Arfon M. Smith, Nhan Tran, Ricardo Vilalta, Lucianne M. Walkowicz, John ZuHone

The field of astronomy has arrived at a turning point in terms of size and complexity of both datasets and scientific collaboration. Commensurately, algorithms and statistical models have begun to adapt --- e.g., via the onset of artificial intelligence --- which itself presents new challenges and opportunities for growth. This white paper aims to offer guidance and ideas for how we can evolve our technical and collaborative frameworks to promote efficient algorithmic development and take advantage of opportunities for scientific discovery in the petabyte era. We discuss challenges for discovery in large and complex data sets; challenges and requirements for the next stage of development of statistical methodologies and algorithmic tool sets; how we might change our paradigms of collaboration and education; and the ethical implications of scientists' contributions to widely applicable algorithms and computational modeling. We start with six distinct recommendations that are supported by the commentary following them. This white paper is related to a larger corpus of effort that has taken place within and around the Petabytes to Science Workshops.

* arXiv admin note: substantial text overlap with arXiv:1905.05116 

Deep Set-to-Set Matching and Learning

Oct 22, 2019
Yuki Saito, Takuma Nakamura, Hirotaka Hachiya, Kenji Fukumizu

Matching two sets of items, called the set-to-set matching problem, has recently attracted attention. The difficulties of set-to-set matching over ordinary data matching lie in the exchangeability in 1) set-feature extraction and 2) set-matching score; the pair of sets and the items in each set should be exchangeable. In this paper, we propose a deep learning architecture for set-to-set matching that overcomes the above difficulties, including two novel modules: 1) a cross-set transformation and 2) a cross-similarity function. The former provides the exchangeable set-feature through interactions between two sets in intermediate layers, and the latter provides the exchangeable set matching by calculating the cross-feature similarity of items between two sets. We evaluate the methods through experiments with two industrial applications: fashion set recommendation and group re-identification. Through these experiments, we show that the proposed methods perform better than a baseline given by an extension of the Set Transformer, the state-of-the-art set-input function.

MFA is a Waste of Time! Understanding Negative Connotation Towards MFA Applications via User Generated Content

Aug 16, 2019
Sanchari Das, Bingxing Wang, L. Jean Camp

Traditional single-factor authentication has several critical security vulnerabilities because it is a single point of failure. Multi-factor authentication (MFA) is intended to enhance security by providing additional verification steps. However, in practical deployment, users often experience dissatisfaction while using MFA, which leads to non-adoption. In order to understand the current design and usability issues with MFA, we analyze aggregated user-generated comments (N = 12,500) about application-based MFA tools from major distributors, such as Amazon, Google Play, Apple App Store, and others. While some users acknowledge the security benefits of MFA, the majority of them still faced problems with initial configuration, system design understanding, limited device compatibility, and risk trade-offs, leading to non-adoption of MFA. Based on these results, we provide actionable recommendations in technological design, initial training, and risk communication to improve the adoption and user experience of MFA.

* Proceedings of the Thirteenth International Symposium on Human Aspects of Information Security & Assurance (HAISA 2019) 

Orometric Methods in Bounded Metric Data

Jul 22, 2019
Maximilian Stubbemann, Tom Hanika, Gerd Stumme

A large amount of data accommodated in knowledge graphs (KG) is actually metric. For example, the Wikidata KG contains a plenitude of metric facts about geographic entities like cities, chemical compounds or celestial objects. In this paper, we propose a novel approach that transfers orometric (topographic) measures to bounded metric spaces. While these methods were originally designed to identify relevant mountain peaks on the surface of the earth, we demonstrate a notion to use them for metric data sets in general, notably metric sets of items enclosed in knowledge graphs. Based on this, we present a method for identifying outstanding items using the transferred valuation functions 'isolation' and 'prominence'. Building on this, we envision an item recommendation process. To demonstrate the relevance of the novel valuations for such processes, we use item sets from the Wikidata knowledge graph. We then evaluate the usefulness of 'isolation' and 'prominence' empirically in a supervised machine learning setting. In particular, we find structurally relevant items in the geographic population distributions of Germany and France.

* 8 Pages, 1 figure 
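
As a rough illustration of the 'isolation' valuation mentioned in the abstract above, the sketch below uses the usual topographic definition: the distance from an item to the nearest item of strictly greater height. This is an assumption; the paper's definitions for bounded metric spaces may differ, in particular for the highest item, which is given infinite isolation here. The function name and the toy data are hypothetical.

```python
import math


def isolation(points, heights, index):
    """Distance from points[index] to the nearest point with strictly greater height.

    Returns math.inf for the highest point (an assumption of this sketch).
    """
    own_height = heights[index]
    distances_to_higher = [
        math.dist(points[index], other)  # Euclidean distance, Python 3.8+
        for other, height in zip(points, heights)
        if height > own_height
    ]
    return min(distances_to_higher) if distances_to_higher else math.inf


# Toy data (hypothetical): planar coordinates and a 'height' such as population.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
heights = [100, 50, 80]

for i in range(len(points)):
    print(i, isolation(points, heights, i))
```

An item with a large isolation has no "higher" item nearby, which is the sense in which it stands out in its neighbourhood.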

Addition of Code Mixed Features to Enhance the Sentiment Prediction of Song Lyrics

Jun 11, 2018
Gangula Rama Rohit Reddy, Radhika Mamidi

Sentiment analysis, also called opinion mining, is the field of study that analyzes people's opinions, sentiments, attitudes and emotions. Songs are important to sentiment analysis since songs and mood are mutually dependent. Based on the selected song, it becomes easy to find the mood of the listener, which in the future can be used for recommendation. Song lyrics are a rich source of words that are helpful in the analysis and classification of the sentiments they generate. Nowadays we observe a lot of inter-sentential and intra-sentential code-mixing in songs, which has a varying impact on the audience. To study this impact, we created a Telugu songs dataset that contains both Telugu-English code-mixed and pure Telugu songs. In this paper, we classify the songs based on their arousal as exciting or non-exciting. We develop a language identification tool and introduce code-mixing features obtained from it as additional features. Our system with these additional features attains 4-5% higher accuracy than traditional approaches on our dataset.

MPST: A Corpus of Movie Plot Synopses with Tags

Feb 23, 2018
Sudipta Kar, Suraj Maharjan, A. Pastor López-Monroy, Thamar Solorio

Social tagging of movies reveals a wide range of heterogeneous information about movies, like the genre, plot structure, soundtracks, metadata, visual and emotional experiences. Such information can be valuable in building automatic systems to create tags for movies. Automatic tagging systems can help recommendation engines to improve the retrieval of similar movies as well as help viewers to know what to expect from a movie in advance. In this paper, we set out to collect a corpus of movie plot synopses and tags. We describe a methodology that enabled us to build a fine-grained set of around 70 tags exposing heterogeneous characteristics of movie plots and the multi-label associations of these tags with some 14K movie plot synopses. We investigate how these tags correlate with movies and the flow of emotions throughout different types of movies. Finally, we use this corpus to explore the feasibility of inferring tags from plot synopses. We expect the corpus will be useful in other tasks where analysis of narratives is relevant.

* Accepted at LREC 2018 
