Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Learning the Information Divergence

Jun 05, 2014
Onur Dikmen, Zhirong Yang, Erkki Oja

Information divergence that measures the difference between two nonnegative matrices or tensors has found its use in a variety of machine learning problems. Examples are Nonnegative Matrix/Tensor Factorization, Stochastic Neighbor Embedding, topic models, and Bayesian network optimization. The success of such a learning task depends heavily on a suitable divergence. A large variety of divergences have been suggested and analyzed, but very few results are available for an objective choice of the optimal divergence for a given task. Here we present a framework that facilitates automatic selection of the best divergence among a given family, based on standard maximum likelihood estimation. We first propose an approximated Tweedie distribution for the beta-divergence family. Selecting the best beta then becomes a machine learning problem solved by maximum likelihood. Next, we reformulate alpha-divergence in terms of beta-divergence, which enables automatic selection of alpha by maximum likelihood with reuse of the learning principle for beta-divergence. Furthermore, we show the connections between gamma and beta-divergences as well as R\'enyi and alpha-divergences, such that our automatic selection framework is extended to non-separable divergences. Experiments on both synthetic and real-world data demonstrate that our method can quite accurately select information divergence across different learning problems and various divergence families.

* 12 pages, 7 figures 

  Access Paper or Ask Questions

Empirical Analysis of Predictive Algorithms for Collaborative Filtering

Jan 30, 2013
John S. Breese, David Heckerman, Carl Kadie

Collaborative filtering or recommender systems use a database about user preferences to predict additional topics or products a new user might like. In this paper we describe several algorithms designed for this task, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods. We compare the predictive accuracy of the various methods in a set of representative problem domains. We use two basic classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions in terms of average absolute deviation. The second estimates the utility of a ranked list of suggested items. This metric uses an estimate of the probability that a user will see a recommendation in an ordered list. Experiments were run for datasets associated with 3 application areas, 4 experimental protocols, and the 2 evaluation metrics for the various algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity methods. Between correlation and Bayesian networks, the preferred method depends on the nature of the dataset, nature of the application (ranked versus one-by-one presentation), and the availability of votes with which to make predictions. Other considerations include the size of database, speed of predictions, and learning time.

* Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI1998) 

  Access Paper or Ask Questions

Quick Summary

Oct 12, 2012
Robert Wahlstedt

Quick Summary is an innovate implementation of an automatic document summarizer that inputs a document in the English language and evaluates each sentence. The scanner or evaluator determines criteria based on its grammatical structure and place in the paragraph. The program then asks the user to specify the number of sentences the person wishes to highlight. For example should the user ask to have three of the most important sentences, it would highlight the first and most important sentence in green. Commonly this is the sentence containing the conclusion. Then Quick Summary finds the second most important sentence usually called a satellite and highlights it in yellow. This is usually the topic sentence. Then the program finds the third most important sentence and highlights it in red. The implementations of this technology are useful in a society of information overload when a person typically receives 42 emails a day (Microsoft). The paper also is a candid look at difficulty that machine learning has in textural translating. However, it speaks on how to overcome the obstacles that historically prevented progress. This paper proposes mathematical meta-data criteria that justify the place of importance of a sentence. Just as tools for the study of relational symmetry in bio-informatics, this tool seeks to classify words with greater clarity. "Survey Finds Workers Average Only Three Productive Days per Week." Microsoft News Center. Microsoft. Web. 31 Mar. 2012.

  Access Paper or Ask Questions

A survey of statistical network models

Dec 29, 2009
Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, Edoardo M Airoldi

Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.

* Foundations and Trends in Machine Learning, 2(2):1-117, 2009 
* 96 pages, 14 figures, 333 references 

  Access Paper or Ask Questions

RAVIR: A Dataset and Methodology for the Semantic Segmentation and Quantitative Analysis of Retinal Arteries and Veins in Infrared Reflectance Imaging

Mar 28, 2022
Ali Hatamizadeh, Hamid Hosseini, Niraj Patel, Jinseo Choi, Cameron C. Pole, Cory M. Hoeferlin, Steven D. Schwartz, Demetri Terzopoulos

The retinal vasculature provides important clues in the diagnosis and monitoring of systemic diseases including hypertension and diabetes. The microvascular system is of primary involvement in such conditions, and the retina is the only anatomical site where the microvasculature can be directly observed. The objective assessment of retinal vessels has long been considered a surrogate biomarker for systemic vascular diseases, and with recent advancements in retinal imaging and computer vision technologies, this topic has become the subject of renewed attention. In this paper, we present a novel dataset, dubbed RAVIR, for the semantic segmentation of Retinal Arteries and Veins in Infrared Reflectance (IR) imaging. It enables the creation of deep learning-based models that distinguish extracted vessel type without extensive post-processing. We propose a novel deep learning-based methodology, denoted as SegRAVIR, for the semantic segmentation of retinal arteries and veins and the quantitative measurement of the widths of segmented vessels. Our extensive experiments validate the effectiveness of SegRAVIR and demonstrate its superior performance in comparison to state-of-the-art models. Additionally, we propose a knowledge distillation framework for the domain adaptation of RAVIR pretrained networks on color images. We demonstrate that our pretraining procedure yields new state-of-the-art benchmarks on the DRIVE, STARE, and CHASE_DB1 datasets. Dataset link:

* Paper accepted to IEEE Journal of Biomedical Health Informatics (JBHI) 

  Access Paper or Ask Questions

PrintsGAN: Synthetic Fingerprint Generator

Jan 20, 2022
Joshua J. Engelsma, Steven A. Grosz, Anil K. Jain

A major impediment to researchers working in the area of fingerprint recognition is the lack of publicly available, large-scale, fingerprint datasets. The publicly available datasets that do exist contain very few identities and impressions per finger. This limits research on a number of topics, including e.g., using deep networks to learn fixed length fingerprint embeddings. Therefore, we propose PrintsGAN, a synthetic fingerprint generator capable of generating unique fingerprints along with multiple impressions for a given fingerprint. Using PrintsGAN, we synthesize a database of 525k fingerprints (35K distinct fingers, each with 15 impressions). Next, we show the utility of the PrintsGAN generated dataset by training a deep network to extract a fixed-length embedding from a fingerprint. In particular, an embedding model trained on our synthetic fingerprints and fine-tuned on a small number of publicly available real fingerprints (25K prints from NIST SD302) obtains a TAR of 87.03% @ FAR=0.01% on the NIST SD4 database (a boost from TAR=73.37% when only trained on NIST SD302). Prevailing synthetic fingerprint generation methods do not enable such performance gains due to i) lack of realism or ii) inability to generate multiple impressions per finger. We plan to release our database of synthetic fingerprints to the public.

  Access Paper or Ask Questions

Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

Dec 16, 2021
Kyle Richardson, Ashish Sabharwal

Investigating the reasoning abilities of transformer models, and discovering new challenging tasks for them, has been a topic of much interest. Recent studies have found these models to be surprisingly strong at performing deductive reasoning over formal logical theories expressed in natural language. A shortcoming of these studies, however, is that they do not take into account that logical theories, when sampled uniformly at random, do not necessarily lead to hard instances. We propose a new methodology for creating challenging algorithmic reasoning datasets that focus on natural language satisfiability (NLSat) problems. The key idea is to draw insights from empirical sampling of hard propositional SAT problems and from complexity-theoretic studies of language. This methodology allows us to distinguish easy from hard instances, and to systematically increase the complexity of existing reasoning benchmarks such as RuleTaker. We find that current transformers, given sufficient training data, are surprisingly robust at solving the resulting NLSat problems of substantially increased difficulty. They also exhibit some degree of scale-invariance - the ability to generalize to problems of larger size and scope. Our results, however, reveal important limitations too: a careful sampling of training data is crucial for building models that generalize to larger problems, and transformer models' limited scale-invariance suggests they are far from learning robust deductive reasoning algorithms.

* Accepted to AAAI-2022, AAAI preprint 

  Access Paper or Ask Questions

On Recurrent Neural Networks for learning-based control: recent results and ideas for future developments

Nov 26, 2021
Fabio Bonassi, Marcello Farina, Jing Xie, Riccardo Scattolini

This paper aims to discuss and analyze the potentialities of Recurrent Neural Networks (RNN) in control design applications. The main families of RNN are considered, namely Neural Nonlinear AutoRegressive eXogenous, (NNARX), Echo State Networks (ESN), Long Short Term Memory (LSTM), and Gated Recurrent Units (GRU). The goal is twofold. Firstly, to survey recent results concerning the training of RNN that enjoy Input-to-State Stability (ISS) and Incremental Input-to-State Stability ({\delta}ISS) guarantees. Secondly, to discuss the issues that still hinder the widespread use of RNN for control, namely their robustness, verifiability, and interpretability. The former properties are related to the so-called generalization capabilities of the networks, i.e. their consistency with the underlying real plants, even in presence of unseen or perturbed input trajectories. The latter is instead related to the possibility of providing a clear formal connection between the RNN model and the plant. In this context, we illustrate how ISS and {\delta}ISS represent a significant step towards the robustness and verifiability of the RNN models, while the requirement of interpretability paves the way to the use of physics-based networks. The design of model predictive controllers with RNN as plant's model is also briefly discussed. Lastly, some of the main topics of the paper are illustrated on a simulated chemical system.

* This work has been submitted to Elsevier Journal of Process Control for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 

  Access Paper or Ask Questions

TsmoBN: Interventional Generalization for Unseen Clients in Federated Learning

Oct 19, 2021
Meirui Jiang, Xiaofei Zhang, Michael Kamp, Xiaoxiao Li, Qi Dou

Generalizing federated learning (FL) models to unseen clients with non-iid data is a crucial topic, yet unsolved so far. In this work, we propose to tackle this problem from a novel causal perspective. Specifically, we form a training structural causal model (SCM) to explain the challenges of model generalization in a distributed learning paradigm. Based on this, we present a simple yet effective method using test-specific and momentum tracked batch normalization (TsmoBN) to generalize FL models to testing clients. We give a causal analysis by formulating another testing SCM and demonstrate that the key factor in TsmoBN is the test-specific statistics (i.e., mean and variance) of features. Such statistics can be seen as a surrogate variable for causal intervention. In addition, by considering generalization bounds in FL, we show that our TsmoBN method can reduce divergence between training and testing feature distributions, which achieves a lower generalization gap than standard model testing. Our extensive experimental evaluations demonstrate significant improvements for unseen client generalization on three datasets with various types of feature distributions and numbers of clients. It is worth noting that our proposed approach can be flexibly applied to different state-of-the-art federated learning algorithms and is orthogonal to existing domain generalization methods.

  Access Paper or Ask Questions