Aditya Krishna Menon

The cost of fairness in classification

May 25, 2017
Aditya Krishna Menon, Robert C. Williamson

We study the problem of learning classifiers with a fairness constraint, with three main contributions towards the goal of quantifying the problem's inherent tradeoffs. First, we relate two existing fairness measures to cost-sensitive risks. Second, we show that for cost-sensitive classification and fairness measures, the optimal classifier is an instance-dependent thresholding of the class-probability function. Third, we show how the tradeoff between accuracy and fairness is determined by the alignment between the class-probabilities for the target and sensitive features. Underpinning our analysis is a general framework that casts the problem of learning with a fairness requirement as one of minimising the difference of two statistical risks.
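
The thresholding idea in the second contribution can be illustrated with a small sketch. The first function is the standard cost-sensitive rule (predict positive when the class-probability exceeds the cost `c`). The second shows one hypothetical way a threshold could depend on the instance through the class-probability of the sensitive feature; the specific form `c + lam * (eta_sens - c_sens)` and all parameter names are assumptions of this sketch, not a formula taken from the abstract.

```python
def cost_sensitive_predict(eta, c):
    """Standard cost-sensitive rule: threshold the class-probability eta at cost c."""
    return 1 if eta > c else 0


def instance_dependent_predict(eta, eta_sens, c, c_sens, lam):
    """Illustrative instance-dependent threshold (assumed form, not from the paper).

    eta      -- estimated P(target = 1 | x)
    eta_sens -- estimated P(sensitive = 1 | x)
    c, c_sens -- cost parameters for the target and sensitive features
    lam      -- hypothetical tradeoff weight; lam = 0 recovers the plain rule
    """
    threshold = c + lam * (eta_sens - c_sens)
    return 1 if eta > threshold else 0


# An instance that the plain rule accepts can be rejected once the
# threshold is raised by a high sensitive-feature probability.
plain = cost_sensitive_predict(0.6, 0.5)
adjusted = instance_dependent_predict(0.6, 0.9, 0.5, 0.5, 0.5)
```

With `lam = 0` the two rules coincide, which is one way to see the tradeoff weight as interpolating between an unconstrained and a fairness-aware classifier.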

A scaled Bregman theorem with applications

Jul 01, 2016
Richard Nock, Aditya Krishna Menon, Cheng Soon Ong

Bregman divergences play a central role in the design and analysis of a range of machine learning algorithms. This paper explores the use of Bregman divergences to establish reductions between such algorithms and their analyses. We present a new scaled isodistortion theorem involving Bregman divergences (scaled Bregman theorem for short) which shows that certain "Bregman distortions" (employing a potentially non-convex generator) may be exactly re-written as a scaled Bregman divergence computed over transformed data. Admissible distortions include geodesic distances on curved manifolds and projections or gauge-normalisation, while admissible data include scalars, vectors and matrices. Our theorem allows one to leverage the wealth and convenience of Bregman divergences when analysing algorithms relying on the aforementioned Bregman distortions. We illustrate this with three novel applications of our theorem: a reduction from multi-class density ratio to class-probability estimation, a new adaptive projection-free yet norm-enforcing dual norm mirror descent algorithm, and a reduction from clustering on flat manifolds to clustering on curved manifolds. Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.

Learning from Binary Labels with Instance-Dependent Corruption

May 04, 2016
Aditya Krishna Menon, Brendan van Rooyen, Nagarajan Natarajan

Suppose we have a sample of instances paired with binary labels corrupted by arbitrary instance- and label-dependent noise. With sufficiently many such samples, can we optimally classify and rank instances with respect to the noise-free distribution? We provide a theoretical analysis of this question, with three main contributions. First, we prove that for instance-dependent noise, any algorithm that is consistent for classification on the noisy distribution is also consistent on the clean distribution. Second, we prove that for a broad class of instance- and label-dependent noise, a similar consistency result holds for the area under the ROC curve. Third, for the latter noise model, when the noise-free class-probability function belongs to the generalised linear model family, we show that the Isotron can efficiently and provably learn from the corrupted sample.

An Average Classification Algorithm

Dec 15, 2015
Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

Many classification algorithms produce a classifier that is a weighted average of kernel evaluations. When working with a high- or infinite-dimensional kernel, it is imperative for speed of evaluation and storage issues that as few training samples as possible are used in the kernel expansion. Popular existing approaches focus on altering standard learning algorithms, such as the Support Vector Machine, to induce sparsity, as well as post-hoc procedures for sparse approximations. Here we adopt the latter approach. We begin with a very simple classifier, given by the kernel mean $$ f(x) = \frac{1}{n} \sum\limits_{i=1}^{n} y_i K(x_i,x). $$ We then find a sparse approximation to this kernel mean via herding. The result is an accurate, easily parallelized algorithm for learning classifiers.
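
As a rough illustration, the kernel mean classifier can be written in a few lines. The sketch below assumes labels in {-1, +1} and a Gaussian (RBF) kernel with a hypothetical bandwidth parameter `gamma`; the abstract does not fix a kernel, and the herding step that sparsifies the expansion is not shown.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_mean_score(x, xs, ys, kernel=rbf_kernel):
    """f(x) = (1/n) * sum_i y_i * K(x_i, x) over the full training sample."""
    n = len(xs)
    return sum(y * kernel(xi, x) for xi, y in zip(xs, ys)) / n


def kernel_mean_predict(x, xs, ys, kernel=rbf_kernel):
    """Classify by the sign of the kernel mean score (ties go to +1)."""
    return 1 if kernel_mean_score(x, xs, ys, kernel) >= 0 else -1


# Toy data: two negatives near the origin, two positives near (2, 2).
xs = [(0.0, 0.0), (0.2, 0.1), (2.0, 2.0), (2.1, 1.9)]
ys = [-1, -1, 1, 1]
label_near_origin = kernel_mean_predict((0.1, 0.0), xs, ys)
label_near_two = kernel_mean_predict((2.0, 2.1), xs, ys)
```

Each prediction touches every training point, which is the evaluation and storage cost that motivates a sparse approximation.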

Learning with Symmetric Label Noise: The Importance of Being Unhinged

May 28, 2015
Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

Convex potential minimisation is the de facto approach to binary classification. However, Long and Servedio [2010] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing. This ostensibly shows that convex losses are not SLN-robust. In this paper, we propose a convex, classification-calibrated loss and prove that it is SLN-robust. The loss avoids the Long and Servedio [2010] result by virtue of being negatively unbounded. The loss is a modification of the hinge loss, where one does not clamp at zero; hence, we call it the unhinged loss. We show that the optimal unhinged solution is equivalent to that of a strongly regularised SVM, and is the limiting solution for any convex potential; this implies that strong l2 regularisation makes most standard learners SLN-robust. Experiments confirm the SLN-robustness of the unhinged loss.
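
A minimal sketch of the loss described above: the hinge loss clamps at zero, while the unhinged loss drops the clamp and so can go negative. The closed-form linear solution shown here follows from minimising the average unhinged loss plus an assumed penalty `(lam / 2) * ||w||^2`; that particular form of the regulariser, and the function names, are assumptions of this sketch.

```python
def hinge_loss(y, v):
    """Hinge loss for label y in {-1, +1} and score v: max(0, 1 - y*v)."""
    return max(0.0, 1.0 - y * v)


def unhinged_loss(y, v):
    """Unhinged loss: the hinge loss without clamping at zero."""
    return 1.0 - y * v


def unhinged_linear_solution(xs, ys, lam):
    """Minimiser of (1/n) * sum_i (1 - y_i <w, x_i>) + (lam/2) * ||w||^2.

    Setting the gradient to zero gives w = (1/lam) * (1/n) * sum_i y_i x_i,
    i.e. a scaled mean of the label-weighted instances.
    """
    n = len(xs)
    d = len(xs[0])
    return [sum(y * x[j] for x, y in zip(xs, ys)) / (n * lam) for j in range(d)]


# A confidently correct prediction: the hinge loss is zero,
# while the unhinged loss is negative.
h = hinge_loss(1, 3.0)
u = unhinged_loss(1, 3.0)
w = unhinged_linear_solution([(1.0, 0.0), (-1.0, 0.0)], [1, -1], lam=0.5)
```

Because the solution is just a scaled average of `y_i * x_i`, no iterative optimisation is needed in this linear, l2-regularised setting.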

Textual Features for Programming by Example

Sep 17, 2012
Aditya Krishna Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, Adam Tauman Kalai

In Programming by Example, a system attempts to infer a program from input and output examples, generally by searching for a composition of certain base functions. Performing a naive brute force search is infeasible for even mildly involved tasks. We note that the examples themselves often present clues as to which functions to compose, and how to rank the resulting programs. In text processing, which is our domain of interest, clues arise from simple textual features: for example, if parts of the input and output strings are permutations of one another, this suggests that sorting may be useful. We describe a system that learns the reliability of such clues, allowing for faster search and a principled ranking over programs. Experiments on a prototype of this system show that this learning scheme facilitates efficient inference on a range of text processing tasks.
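
The permutation clue mentioned above can be sketched as a simple feature on an input/output pair. The function names and the whitespace tokenisation are hypothetical choices for illustration, not the features used by the described system.

```python
def is_permutation_clue(input_text, output_text):
    """True when the output tokens are a reordering of the input tokens.

    A firing clue only suggests that a reordering function such as sorting
    may be worth trying early in the search; it does not prove it.
    """
    in_tokens = input_text.split()
    out_tokens = output_text.split()
    return in_tokens != out_tokens and sorted(in_tokens) == sorted(out_tokens)


def is_sorted_output(output_text):
    """True when the output tokens are already in ascending order."""
    tokens = output_text.split()
    return tokens == sorted(tokens)


example_in = "pear apple fig"
example_out = "apple fig pear"
suggests_sort = is_permutation_clue(example_in, example_out) and is_sorted_output(example_out)
```

A learned system would then weight such clues by how reliably they predict the functions that appear in correct programs.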

Dyadic Prediction Using a Latent Feature Log-Linear Model

Jun 10, 2010
Aditya Krishna Menon, Charles Elkan

In dyadic prediction, labels must be predicted for pairs (dyads) whose members possess unique identifiers and, sometimes, additional features called side-information. Special cases of this problem include collaborative filtering and link prediction. We present the first model for dyadic prediction that satisfies several important desiderata: (i) labels may be ordinal or nominal, (ii) side-information can be easily exploited if present, (iii) with or without side-information, latent features are inferred for dyad members, (iv) it is resistant to sample-selection bias, (v) it can learn well-calibrated probabilities, and (vi) it can scale to very large datasets. To our knowledge, no existing method satisfies all the above criteria. In particular, many methods assume that the labels are ordinal and ignore side-information when it is present. Experimental results show that the new method is competitive with state-of-the-art methods for the special cases of collaborative filtering and link prediction, and that it makes accurate predictions on nominal data.
