Richard Nock

Rademacher Observations, Private Data, and Boosting

Apr 02, 2015
Richard Nock, Giorgio Patrini, Arik Friedman

The minimization of the logistic loss is a popular approach to batch supervised learning. Our paper starts from the surprising observation that, when fitting linear (or kernelized) classifiers, the minimization of the logistic loss is equivalent to the minimization of an exponential rado-loss computed (i) over transformed data that we call Rademacher observations (rados), and (ii) over the same classifier as the logistic loss. Thus, a classifier learnt from rados can be directly used to classify observations. We provide a learning algorithm over rados with boosting-compliant convergence rates on the logistic loss (computed over examples). Experiments on domains with up to millions of examples, backed up by theoretical arguments, show that learning over a small set of random rados can challenge state-of-the-art methods that learn over the complete set of examples. We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework. We give several algebraic, geometric and computational hardness results on reconstructing examples from rados. We also show how to craft, and efficiently learn from, rados in a differential privacy framework. Tests reveal that learning from differentially private rados can compete with learning from random rados, and hence with batch learning from examples, achieving non-trivial privacy vs. accuracy tradeoffs.
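
As a sanity check on the equivalence stated above, the following Python sketch computes rados by brute force on a tiny sample and compares the two losses. It assumes labels $y_i \in \{-1, +1\}$ and the rado definition $\pi_\sigma = \frac{1}{2}\sum_i (\sigma_i + y_i) x_i$ for $\sigma \in \{-1, +1\}^m$ (our reading of the construction; check it against the paper). Under that definition the sum over all $2^m$ sign vectors factorizes, since example $i$ contributes either $e^{-y_i \theta^\top x_i}$ (when $\sigma_i = y_i$) or $1$ (when $\sigma_i = -y_i$), which gives $\frac{1}{m}\sum_i \log(1 + e^{-y_i \theta^\top x_i}) = \log 2 + \frac{1}{m}\log\big(\frac{1}{2^m}\sum_\sigma e^{-\theta^\top \pi_\sigma}\big)$. The function names are hypothetical, and enumerating every $\sigma$ is only feasible for very small $m$; the learning algorithm in the abstract works from a small random subset of rados instead, which is not shown here.

```python
import itertools
import math

import numpy as np


def rado(sigma, X, y):
    # Rademacher observation pi_sigma = 1/2 * sum_i (sigma_i + y_i) * x_i.
    # Example i contributes y_i * x_i when sigma_i == y_i, and nothing otherwise.
    return 0.5 * ((sigma + y)[:, None] * X).sum(axis=0)


def logistic_loss(theta, X, y):
    # Mean logistic loss over examples: (1/m) sum_i log(1 + exp(-y_i <theta, x_i>)).
    margins = y * (X @ theta)
    return float(np.mean(np.log1p(np.exp(-margins))))


def exp_rado_loss(theta, X, y):
    # Exponential rado-loss averaged over all 2^m sign vectors (brute force).
    m = len(y)
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=m):
        total += math.exp(-float(theta @ rado(np.array(signs), X, y)))
    return total / 2 ** m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))
    y = rng.choice([-1.0, 1.0], size=8)
    theta = rng.normal(size=3)
    lhs = logistic_loss(theta, X, y)
    rhs = math.log(2) + math.log(exp_rado_loss(theta, X, y)) / len(y)
    print(abs(lhs - rhs) < 1e-9)  # True: the identity is exact up to rounding
```

The same classifier `theta` appears on both sides, which is the point the abstract makes: a model fitted on rados can be applied directly to observations.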

Further heuristics for $k$-means: The merge-and-split heuristic and the $(k,l)$-means

Jun 23, 2014
Frank Nielsen, Richard Nock

Finding the optimal $k$-means clustering is NP-hard in general, and many heuristics have been designed to monotonically minimize the $k$-means objective. We first show how to extend Lloyd's batched relocation heuristic and Hartigan's single-point relocation heuristic to take into account empty-cluster and single-point cluster events, respectively. These events occur increasingly often as $k$ or $d$ increases, or when performing several restarts. We show that these special events are a blessing, because they allow us to partially re-seed some cluster centers while further minimizing the $k$-means objective function. Second, we describe a novel heuristic, merge-and-split $k$-means, which merges two clusters and splits the merged cluster again with two new centers, provided this improves the $k$-means objective. This heuristic can improve Hartigan's $k$-means when it has converged to a local minimum. We show empirically that merge-and-split $k$-means improves over Hartigan's heuristic, which is the de facto method of choice. Finally, we propose the $(k,l)$-means objective, which generalizes the $k$-means objective by associating each data point with its $l$ closest cluster centers, and show how to either directly convert or iteratively relax the $(k,l)$-means into a $k$-means in order to reach better local minima.

* 14 pages
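
A minimal Python sketch of one merge-and-split move as we read it from the abstract: merge two clusters, re-split the merged points with two new centers, and keep the result only if the $k$-means objective strictly decreases. The splitting step here is a plain Lloyd 2-means seeded with the two extreme points along the principal axis, which is our own choice; the paper's splitting procedure may differ. The sketch also includes the $(k,l)$-means objective under our reading of "associating the data points to their $l$ closest cluster centers" (each point pays its squared distance to each of those $l$ centers). All function names are hypothetical.

```python
import numpy as np


def kmeans_cost(X, labels, centers):
    # k-means objective: squared Euclidean distance of each point to its center.
    return float(((X - centers[labels]) ** 2).sum())


def two_means(P, iters=100):
    # Lloyd 2-means on the merged points; seeding with the extreme points along
    # the principal axis is an illustrative choice, not the paper's.
    Z = P - P.mean(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    proj = Z @ vt[0]
    c = np.stack([P[proj.argmin()], P[proj.argmax()]]).astype(float)
    lab = None
    for _ in range(iters):
        d = ((P[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        new_lab = d.argmin(axis=1)
        if lab is not None and np.array_equal(new_lab, lab):
            break  # assignments are stable: Lloyd has converged
        lab = new_lab
        for j in (0, 1):
            if np.any(lab == j):
                c[j] = P[lab == j].mean(axis=0)
    return c, lab


def merge_and_split(X, labels, centers, a, b):
    # Only points of clusters a and b move, so comparing their local cost is
    # enough to decide whether the global k-means objective decreases.
    mask = (labels == a) | (labels == b)
    P = X[mask]
    old = float(((P - centers[labels[mask]]) ** 2).sum())
    c2, lab2 = two_means(P)
    new = float(((P - c2[lab2]) ** 2).sum())
    if new < old - 1e-12:
        labels, centers = labels.copy(), centers.copy()
        labels[mask] = np.where(lab2 == 0, a, b)
        centers[a], centers[b] = c2[0], c2[1]
        return labels, centers, True
    return labels, centers, False


def kl_means_cost(X, centers, l):
    # (k,l)-means objective: sum of squared distances from each point to its
    # l closest centers; l = 1 recovers the k-means objective.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(np.sort(d, axis=1)[:, :l].sum())
```

For example, with points at 0, 0.1, 10 and 10.1 wrongly labelled `[0, 1, 0, 1]`, a single `merge_and_split(..., 0, 1)` call regroups them into {0, 0.1} and {10, 10.1}.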

Optimal interval clustering: Application to Bregman clustering and statistical mixture learning

May 26, 2014
Frank Nielsen, Richard Nock

We present a generic dynamic programming method to compute the optimal clustering of $n$ scalar elements into $k$ pairwise disjoint intervals. This setting includes 1D Euclidean $k$-means, $k$-medoids, $k$-medians, $k$-centers, etc. We extend the method to incorporate cluster size constraints and show how to choose the appropriate $k$ by model selection. Finally, we illustrate and refine the method on two case studies: Bregman clustering and statistical mixture learning maximizing the complete likelihood.

* 10 pages, 3 figures
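
For the 1D Euclidean $k$-means case, the interval dynamic program can be sketched in Python as follows. This is an illustrative $O(k n^2)$ version with per-interval costs (sums of squared deviations) obtained in $O(1)$ from prefix sums; it is not the authors' code, and the function name is hypothetical. We understand the other objectives listed above to correspond to swapping `cost` for a different per-interval cost.

```python
import math


def optimal_1d_kmeans(xs, k):
    """Split sorted scalars into k contiguous intervals minimizing the
    within-interval sum of squared deviations (1D k-means).

    Returns (cost, intervals), intervals as half-open (start, end) index
    pairs into the sorted data.
    """
    xs = sorted(xs)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    s1 = [0.0] * (n + 1)  # prefix sums of x
    s2 = [0.0] * (n + 1)  # prefix sums of x^2
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        # Sum of squared deviations of xs[i:j] from their mean.
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    # D[c][j]: best cost of splitting the first j points into c intervals.
    D = [[math.inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    D[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last interval is xs[i:j]
                val = D[c - 1][i] + cost(i, j)
                if val < D[c][j]:
                    D[c][j], back[c][j] = val, i
    # Walk the back-pointers to recover the interval boundaries.
    intervals, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        intervals.append((i, j))
        j = i
    return D[k][n], intervals[::-1]


if __name__ == "__main__":
    cost, intervals = optimal_1d_kmeans([12, 1, 20, 3, 11, 2, 10], k=3)
    print(cost, intervals)  # 4.0 [(0, 3), (3, 6), (6, 7)]
```

Sorting first matters: for 1D squared-error clustering an optimal partition consists of contiguous runs of the sorted data, which is what makes the interval formulation exact.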

Combining Feature and Prototype Pruning by Uncertainty Minimization

Jan 16, 2013
Marc Sebban, Richard Nock

In this paper, we focus on dataset reduction techniques for k-nearest neighbor classification. In this context, feature selection and prototype selection have always been treated independently by standard storage reduction algorithms. While this independent treatment is theoretically justified by the fact that each subproblem is NP-hard, we assume in this paper that a joint storage reduction is in fact more intuitive and can in practice provide better results than two independent processes. Moreover, it avoids many distance calculations by progressively removing useless instances during feature pruning. While standard selection algorithms often optimize accuracy to discriminate among candidate solutions, we use a criterion based on an uncertainty measure within a nearest-neighbor graph. This choice is motivated by recent results showing that accuracy is not always the most suitable criterion to optimize. In our approach, a feature or an instance is removed if its deletion improves the information carried by the graph. Numerous experiments are presented, and a statistical analysis shows the relevance of our approach and its tolerance to noise.

* Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

On Rényi and Tsallis entropies and divergences for exponential families

May 17, 2011
Frank Nielsen, Richard Nock

Many common probability distributions in statistics, like the Gaussian, multinomial, Beta or Gamma distributions, can be studied under the unified framework of exponential families. In this paper, we prove that both Rényi and Tsallis divergences of distributions belonging to the same exponential family admit a generic closed-form expression. Furthermore, we show that Rényi and Tsallis entropies can also be calculated in closed form for sub-families including the Gaussian or exponential distributions, among others.

* Journal of Physics A: Mathematical and Theoretical, Volume 45 Number 3, 2012
* 7 pages
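
To make the closed form concrete: for densities $p_\theta(x) = \exp(\langle \theta, t(x) \rangle - F(\theta) + k(x))$ of one exponential family, a standard computation gives $\int p_{\theta_1}^\alpha p_{\theta_2}^{1-\alpha}\,dx = \exp(-J_{F,\alpha}(\theta_1:\theta_2))$ with $J_{F,\alpha}(\theta_1:\theta_2) = \alpha F(\theta_1) + (1-\alpha) F(\theta_2) - F(\alpha\theta_1 + (1-\alpha)\theta_2)$, whenever the mixed parameter stays in the natural parameter space. Hence $D_\alpha = J_{F,\alpha}/(1-\alpha)$ (Rényi) and $T_\alpha = (e^{-J_{F,\alpha}} - 1)/(\alpha - 1)$ (Tsallis). We present this as our own derivation rather than a transcription of the paper's theorem. The Python sketch below instantiates it for the exponential distribution ($\theta = -\lambda$, $F(\theta) = -\log(-\theta)$) and checks it against numerical integration; all function names are hypothetical.

```python
import math


def F_exponential(theta):
    # Log-normalizer of lam * exp(-lam * x), written as exp(theta * x - F(theta))
    # with natural parameter theta = -lam < 0.
    return -math.log(-theta)


def skew_jensen(F, t1, t2, alpha):
    # J_{F,alpha}(t1 : t2) = alpha F(t1) + (1 - alpha) F(t2) - F(alpha t1 + (1 - alpha) t2).
    return alpha * F(t1) + (1 - alpha) * F(t2) - F(alpha * t1 + (1 - alpha) * t2)


def renyi(F, t1, t2, alpha):
    # Closed-form Renyi divergence of order alpha between p_{t1} and p_{t2}.
    return skew_jensen(F, t1, t2, alpha) / (1 - alpha)


def tsallis(F, t1, t2, alpha):
    # Closed-form Tsallis divergence: (integral of p^alpha q^(1-alpha) - 1) / (alpha - 1).
    return (math.exp(-skew_jensen(F, t1, t2, alpha)) - 1) / (alpha - 1)


def renyi_numeric_exponential(lam1, lam2, alpha, upper=60.0, steps=200_000):
    # Brute-force check: midpoint-rule integral of p^alpha q^(1-alpha) on [0, upper].
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        p = lam1 * math.exp(-lam1 * x)
        q = lam2 * math.exp(-lam2 * x)
        total += p ** alpha * q ** (1 - alpha)
    return math.log(total * h) / (alpha - 1)


if __name__ == "__main__":
    lam1, lam2, alpha = 1.0, 2.0, 0.5
    print(renyi(F_exponential, -lam1, -lam2, alpha))
    print(renyi_numeric_exponential(lam1, lam2, alpha))
    print(tsallis(F_exponential, -lam1, -lam2, alpha))
```

As $\alpha \to 1$ both divergences tend to the Kullback-Leibler divergence, which gives a second easy check.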

Boosting k-NN for categorization of natural scenes

Jan 08, 2010
Paolo Piro, Richard Nock, Frank Nielsen, Michel Barlaud

The k-nearest neighbors (k-NN) classification rule has proven extremely successful in countless computer vision applications. For example, image categorization often relies on uniform voting among the nearest prototypes in the space of descriptors. In spite of its good properties, the classic k-NN rule suffers from high variance when dealing with sparse prototype datasets in high dimensions. A few techniques have been proposed to improve k-NN classification; they rely on either deforming the nearest-neighborhood relationship or modifying the input space. In this paper, we propose a novel boosting algorithm, called UNN (Universal Nearest Neighbors), which induces leveraged k-NN, thus generalizing the classic k-NN rule. We redefine the voting rule as a strong classifier that linearly combines predictions from the k closest prototypes. Weak classifiers are learned by UNN so as to minimize a surrogate risk. A major feature of UNN is its ability to learn which prototypes are the most relevant for a given class, thus allowing effective data reduction. Experimental results on Ripley's synthetic two-class dataset show that such a filtering strategy is able to reject "noisy" prototypes. We carried out image categorization experiments on a database containing eight classes of natural scenes. We show that our method significantly outperforms the classic k-NN classification, while significantly reducing the computational cost by means of data filtering.

* under revision for IJCV
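
The leveraged voting rule described above can be sketched, for the two-class case, as a weighted vote $h(x) = \sum_{j \in NN_k(x)} \alpha_j y_j$ over the $k$ nearest prototypes. This is our reading of "linearly combines predictions from the k closest prototypes"; the leveraging coefficients $\alpha_j$ are taken as given, and the boosting updates UNN uses to learn them (and its multiclass handling) are not shown. The function name is hypothetical.

```python
import numpy as np


def leveraged_knn_score(x, prototypes, labels, alphas, k):
    """Leveraged k-NN score h(x) = sum over the k nearest prototypes of alpha_j * y_j.

    labels: prototype classes in {-1, +1}.
    alphas: per-prototype leveraging coefficients, assumed already learned.
    The predicted class is the sign of the returned score.
    """
    dists = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]  # indices of the k closest prototypes
    return float(np.sum(alphas[nearest] * labels[nearest]))


if __name__ == "__main__":
    P = np.array([[0.0], [1.0], [2.0], [10.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    x = np.array([0.4])
    # Uniform alphas reproduce the classic k-NN majority vote: 1 + 1 - 1 = 1.
    print(leveraged_knn_score(x, P, y, np.ones(4), k=3))
    # Non-uniform alphas can overturn the majority: 0.1 + 0.1 - 2.0 = -1.8.
    print(leveraged_knn_score(x, P, y, np.array([0.1, 0.1, 2.0, 1.0]), k=3))
```

Setting every $\alpha_j = 1$ recovers the uniform vote of the classic k-NN rule, which is the sense in which the leveraged rule generalizes it.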

Soft Uncoupling of Markov Chains for Permeable Language Distinction: A New Algorithm

Oct 07, 2008
Richard Nock, Pascal Vaillant, Frank Nielsen, Claudia Henry

Without prior knowledge, distinguishing different languages may be a hard task, especially when their borders are permeable. We develop an extension of spectral clustering (a powerful unsupervised classification toolbox) that is shown to accurately solve the task of soft language distinction. At the heart of our approach, we replace the usual hard membership assignment of spectral clustering by a soft, probabilistic assignment, which also has the advantage of bypassing a well-known complexity bottleneck of the method. Furthermore, our approach relies on a novel, convenient construction of a Markov chain out of a corpus. Extensive experiments with a readily available system clearly show the potential of the method, which yields a visually appealing soft distinction of the languages that may together make up a whole corpus.

* ECAI 2006: 17th European Conference on Artificial Intelligence. Riva del Garda, Italy, 29 August - 1st September 2006
* 6 pages, 7 embedded figures, LaTeX 2e using the ecai2006.cls document class and the algorithm2e.sty style file (+ standard packages like epsfig, amsmath, amssymb, amsfonts...). Extends the short version contained in the ECAI 2006 proceedings

Analyse spectrale des textes: détection automatique des frontières de langue et de discours (Spectral analysis of texts: automatic detection of language and discourse boundaries)

Oct 07, 2008
Pascal Vaillant, Richard Nock, Claudia Henry

We propose a theoretical framework within which information on the vocabulary of a given corpus can be inferred from statistical information gathered on that corpus. Inferences can be made on the categories of the words in the vocabulary, and on their syntactic properties within particular languages. Based on the same statistical data, it is possible to build matrices of syntagmatic similarity (bigram transition matrices) or paradigmatic similarity (probability for any pair of words to share common contexts). When clustered with respect to their syntagmatic similarity, words tend to group into sublanguage vocabularies; when clustered with respect to their paradigmatic similarity, they group into syntactic or semantic classes. Experiments have explored the first of these two possibilities. Their results are interpreted in the frame of a Markov chain model of the corpus' generative process(es): we show that the results of a spectral analysis of the transition matrix can be interpreted as probability distributions of words within clusters. This method yields a soft clustering of the vocabulary into sublanguages that contribute to the generation of heterogeneous corpora. As an application, we show how multilingual texts can be visually segmented into linguistically homogeneous segments. Our method is especially useful for related languages that happen to be mixed in corpora.

* Verbum ex machina: Actes de la 13ème conférence annuelle sur le Traitement Automatique des Langues Naturelles (TALN 2006) (Proceedings of the 13th annual conference on Natural Language Processing), pp. 619-629. Louvain (Leuven), Belgium, 10-13 April 2006
* In French. 10 pages, 5 figures, LaTeX 2e using EPSF and custom package taln2006.sty (designed by Pierre Zweigenbaum, ATALA). Proceedings of the 13th annual French-speaking conference on Natural Language Processing: "Traitement Automatique des Langues Naturelles" (TALN 2006), Louvain (Leuven), Belgium, 10-13 April 2006
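
As a concrete illustration of the syntagmatic similarity mentioned in the abstract, here is a minimal Python sketch that builds a bigram transition matrix from a token list and lists its eigenvalues by modulus. It assumes whitespace tokenization and unsmoothed maximum-likelihood bigram estimates; the function names are hypothetical, and the paper's own Markov-chain construction and spectral soft clustering are not reproduced here.

```python
import numpy as np


def bigram_transition_matrix(tokens):
    """Row-stochastic bigram matrix P[a, b] = Pr(next word = b | current word = a).

    Maximum-likelihood estimate from consecutive token pairs, no smoothing.
    A word never followed by another (e.g. one seen only at the very end) gets
    a uniform row so that P stays stochastic; that fallback is our choice.
    Returns (vocab, P), where vocab[i] labels row/column i.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for a, b in zip(tokens, tokens[1:]):
        counts[index[a], index[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    uniform = 1.0 / len(vocab)
    P = np.where(row_sums > 0, counts / np.maximum(row_sums, 1.0), uniform)
    return vocab, P


def spectrum_by_modulus(P):
    # Eigenvalues of P, largest modulus first. The top one has modulus 1 for any
    # stochastic matrix; further eigenvalues close to 1 typically signal nearly
    # uncoupled blocks of states (here, candidate sublanguage vocabularies).
    vals = np.linalg.eigvals(P)
    return vals[np.argsort(-np.abs(vals))]


if __name__ == "__main__":
    text = "le chat dort the cat sleeps le chat mange the cat eats"
    vocab, P = bigram_transition_matrix(text.split())
    print(vocab)
    print(np.round(np.abs(spectrum_by_modulus(P)), 3))
```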

Staring at Economic Aggregators through Information Lenses

Jan 02, 2008
Richard Nock, Nicolas Sanz, Fred Celimene, Frank Nielsen

It is hard to exaggerate the role of economic aggregators (functions that summarize numerous and/or heterogeneous data) in economic models since the early 20th century. In many cases, as witnessed by the pioneering works of Cobb and Douglas, these functions were information quantities tailored to economic theories, i.e., they were built to fit economic phenomena. In this paper, we look at these functions from the complementary side: information. We use a recent toolbox built on top of a vast class of distortions coined by Bregman, whose range of applications rivals that of metrics in various subfields of mathematics. This toolbox makes it possible to assess the quality of an aggregator (for consumption, prices, labor, capital, wages, etc.) from the standpoint of the information it carries. We prove a rather striking result: from the informational standpoint, well-known economic aggregators do belong to the optimal set. As common economic assumptions enter the analysis, this large set shrinks, and it essentially ends up exactly fitting either CES, or Cobb-Douglas, or both. To summarize, in the relevant economic contexts, one could not have crafted a better aggregator from the information standpoint. We also discuss global economic behaviors of optimal information aggregators in general, and present a brief panorama of the links between economic and information aggregators. Keywords: Economic Aggregators, CES, Cobb-Douglas, Bregman divergences

* 18 pages, 2 tables, 3 figures
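
For reference, the two aggregator families named above can be written in their standard normalized form: CES as $(\sum_i w_i x_i^\rho)^{1/\rho}$ and Cobb-Douglas as $\prod_i x_i^{w_i}$, with positive inputs, weights summing to one, and scale constants omitted. In that form CES is a weighted power mean and tends to Cobb-Douglas (the weighted geometric mean) as $\rho \to 0$. The Python sketch below is textbook material meant only to make these objects concrete; it does not implement the paper's Bregman-based analysis, and the function names are hypothetical.

```python
import math


def ces(x, w, rho):
    # CES aggregator (sum_i w_i * x_i**rho) ** (1 / rho), for rho != 0 and
    # positive inputs; with weights summing to 1 this is a weighted power mean.
    return sum(wi * xi ** rho for wi, xi in zip(w, x)) ** (1.0 / rho)


def cobb_douglas(x, w):
    # Cobb-Douglas aggregator prod_i x_i**w_i; with weights summing to 1 this is
    # the weighted geometric mean.
    return math.prod(xi ** wi for wi, xi in zip(w, x))


if __name__ == "__main__":
    x, w = [1.0, 4.0], [0.5, 0.5]
    print(cobb_douglas(x, w))  # geometric mean of 1 and 4, i.e. 2
    for rho in (1.0, 0.1, 1e-3, 1e-6):
        print(rho, ces(x, w, rho))  # approaches 2 as rho -> 0
```

With $\rho = 1$ the CES aggregator is the weighted arithmetic mean and with $\rho = -1$ the weighted harmonic mean, so the single parameter $\rho$ sweeps through the classical means.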
