John Langford

Yahoo Labs

Contextual Bandit Algorithms with Supervised Learning Guarantees

Oct 27, 2011
Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert E. Schapire

We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of $N$ experts with probability $1-\delta$ while incurring regret at most $O(\sqrt{KT\ln(N/\delta)})$ over $T$ time steps. The new algorithm is tested empirically on a large-scale, real-world dataset. Second, we give a new algorithm called VE that competes with a possibly infinite set of policies of VC-dimension $d$ while incurring regret at most $O(\sqrt{T(d\ln(T) + \ln (1/\delta))})$ with probability $1-\delta$. These guarantees improve on those of all previous algorithms, whether in a stochastic or adversarial environment, and bring us closer to providing supervised learning type guarantees for the contextual bandit setting.

* 10 pages
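
A minimal Python sketch of an Exp4.P-style loop may help make the first result concrete. It is written from the published description of the algorithm as best recalled, so details such as the default exploration floor `p_min` and the confidence bonus should be checked against the paper. The names `exp4p`, `advice_fn`, and `reward_fn` are hypothetical, and log-weights are an implementation convenience that is not part of the abstract.

```python
import math
import random


def exp4p(advice_fn, reward_fn, n_experts, n_actions, horizon, delta=0.05, seed=0):
    """Run an Exp4.P-style loop; returns (total_reward, final_log_weights)."""
    rng = random.Random(seed)
    # Exploration floor (assumed default), capped at 1/K so probs stay valid.
    p_min = min(math.sqrt(math.log(n_experts) / (n_actions * horizon)),
                1.0 / n_actions)
    # Confidence term that turns the expected-regret bound into a high-probability one.
    bonus = math.sqrt(math.log(n_experts / delta) / (n_actions * horizon))
    log_w = [0.0] * n_experts  # log-weights avoid overflow
    total = 0.0
    for t in range(horizon):
        advice = advice_fn(t)  # advice[i][j]: probability expert i puts on action j
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        w_sum = sum(w)
        # Mix the experts' advice by weight, then add the exploration floor.
        probs = [
            (1 - n_actions * p_min)
            * sum(w[i] * advice[i][j] for i in range(n_experts)) / w_sum
            + p_min
            for j in range(n_actions)
        ]
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = reward_fn(t, action)  # in [0, 1]; only this action's reward is seen
        total += reward
        for i in range(n_experts):
            # Importance-weighted reward estimate for expert i.
            y_hat = advice[i][action] * reward / probs[action]
            # Variance proxy for expert i's estimate.
            v_hat = sum(advice[i][j] / probs[j] for j in range(n_actions))
            log_w[i] += (p_min / 2) * (y_hat + v_hat * bonus)
    return total, log_w
```

With two deterministic experts, one recommending a rewarding action and one recommending an unrewarding action, the first expert's weight should come to dominate, while the bonus term keeps the second expert from being starved of probability entirely.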

Online Importance Weight Aware Updates

Jun 18, 2011
Nikos Karampatziakis, John Langford

An importance weight quantifies the relative importance of one example over another, coming up in applications of boosting, asymmetric classification costs, reductions, and active learning. The standard approach for dealing with importance weights in gradient descent is via multiplication of the gradient. We first demonstrate the problems of this approach when importance weights are large, and argue in favor of more sophisticated ways for dealing with them. We then develop an approach which enjoys an invariance property: that updating twice with importance weight $h$ is equivalent to updating once with importance weight $2h$. For many important losses this has a closed form update which satisfies standard regret guarantees when all examples have $h=1$. We also briefly discuss two other reasonable approaches for handling large importance weights. Empirically, these approaches yield substantially superior prediction with similar computational performance while reducing the sensitivity of the algorithm to the exact setting of the learning rate. We apply these to online active learning yielding an extraordinarily fast active learning algorithm that works even in the presence of adversarial noise.
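
The invariance property can be illustrated for squared loss. The sketch below assumes the loss $\frac{1}{2}(w \cdot x - y)^2$ and derives the update as the limit of infinitely many infinitesimal gradient steps with total weight $h$; the paper's exact parameterization may differ, and the function names are hypothetical.

```python
import math


def naive_update(w, x, y, h, eta):
    """Standard approach: multiply the gradient of 0.5*(p - y)^2 by h."""
    p = sum(wi * xi for wi, xi in zip(w, x))
    return [wi - eta * h * (p - y) * xi for wi, xi in zip(w, x)]


def invariant_update(w, x, y, h, eta):
    """Closed-form limit of many tiny gradient steps with total weight h.

    Along the update the residual p - y decays as exp(-eta * h * x.x),
    so the prediction approaches y but never crosses it.
    """
    xx = sum(xi * xi for xi in x)
    if xx == 0.0:
        return list(w)
    p = sum(wi * xi for wi, xi in zip(w, x))
    scale = (p - y) * (1.0 - math.exp(-eta * h * xx)) / xx
    return [wi - scale * xi for wi, xi in zip(w, x)]


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))
```

With a large weight such as $h = 50$, `naive_update` can push the prediction far past the label, whereas `invariant_update` moves it toward the label without overshooting, and applying it twice with weight $h$ gives the same parameters as applying it once with weight $2h$.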

Efficient Optimal Learning for Contextual Bandits

Jun 13, 2011
Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses a cost sensitive classification learner as an oracle and has a running time $\mathrm{polylog}(N)$, where $N$ is the number of classification rules among which the oracle might choose. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm with regret that is additive rather than multiplicative in feedback delay as in all previous work.

Doubly Robust Policy Evaluation and Learning

May 06, 2011
Miroslav Dudik, John Langford, Lihong Li

We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as contextual bandits, encompasses a wide variety of applications including health-care policy and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and received rewards. The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy. Previous approaches rely either on models of rewards or models of the past policy. The former are plagued by a large bias whereas the latter have a large variance. In this work, we leverage the strength and overcome the weaknesses of the two approaches by applying the doubly robust technique to the problems of policy evaluation and optimization. We prove that this approach yields accurate value estimates when we have either a good (but not necessarily consistent) model of rewards or a good (but not necessarily consistent) model of past policy. Extensive empirical comparison demonstrates that the doubly robust approach uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies. As such, we expect the doubly robust approach to become common practice.

* Published at ICML 2011, 8 pages, 6 figures
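
The three estimators contrasted in this abstract can be sketched in a few lines of Python. The sketch assumes a deterministic target policy and logged tuples `(context, action, reward, propensity)`, where the propensity is the probability with which the logging policy chose the logged action; all function names and the record layout are hypothetical.

```python
def dm_value(logs, policy, reward_model):
    """Direct method: trust the reward model entirely (can be badly biased)."""
    return sum(reward_model(x, policy(x)) for x, _, _, _ in logs) / len(logs)


def ips_value(logs, policy):
    """Inverse propensity scoring: unbiased with correct propensities, high variance."""
    total = 0.0
    for x, a, r, p in logs:
        if policy(x) == a:
            total += r / p
    return total / len(logs)


def dr_value(logs, policy, reward_model):
    """Doubly robust: model prediction plus an importance-weighted correction."""
    total = 0.0
    for x, a, r, p in logs:
        target = policy(x)
        estimate = reward_model(x, target)
        if target == a:
            # Correct the model's error on the action actually observed.
            estimate += (r - reward_model(x, a)) / p
        total += estimate
    return total / len(logs)
```

If the reward model is exact, the correction term vanishes and the estimate has the low variance of the direct method; if the model is wrong but the propensities are right, the correction removes the model's bias in expectation.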

Parallel Online Learning

Mar 22, 2011
Daniel Hsu, Nikos Karampatziakis, John Langford, Alex Smola

In this work we study parallelization of online learning, a core primitive in machine learning. In a parallel environment all known approaches for parallel online learning lead to delayed updates, where the model is updated using out-of-date information. In the worst case, or when examples are temporally correlated, delay can have a very adverse effect on the learning algorithm. Here, we analyze and present preliminary empirical results on a set of learning architectures based on a feature sharding approach that present various tradeoffs between delay, degree of parallelism, representation power and empirical performance.

Learning from Logged Implicit Exploration Data

Jun 14, 2010
Alex Strehl, John Langford, Sham Kakade, Lihong Li

We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy, in which "offline" data is logged, is not explicitly known. Prior solutions here require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from Yahoo!.
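
One way to picture the approach is a two-step estimator: first estimate how often the logging process chose each action in each context, then evaluate a new policy with importance weights whose denominators are clipped from below. The Python sketch below assumes discrete contexts, so that frequencies can simply be counted, and a clipping threshold `tau`; the function names and the counting estimator are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter


def estimate_propensities(logs):
    """Estimate P(action | context) by counting over (context, action, reward) logs."""
    pair_counts = Counter((x, a) for x, a, _ in logs)
    context_counts = Counter(x for x, _, _ in logs)

    def propensity(x, a):
        if context_counts[x] == 0:
            return 0.0
        return pair_counts[(x, a)] / context_counts[x]

    return propensity


def clipped_value(logs, policy, propensity, tau=0.05):
    """Importance-weighted value estimate with propensities floored at tau."""
    total = 0.0
    for x, a, r in logs:
        if policy(x) == a:
            # The floor bounds the weight by 1/tau, trading some bias for variance.
            total += r / max(propensity(x, a), tau)
    return total / len(logs)
```

A larger `tau` caps the influence of rarely logged actions, which lowers variance but biases the estimate downward for policies that favor those actions.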

Agnostic Active Learning Without Constraints

Jun 14, 2010
Alina Beygelzimer, Daniel Hsu, John Langford, Tong Zhang

We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.

Feature Hashing for Large Scale Multitask Learning

Feb 27, 2010
Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, Alex Smola

Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case -- multitask learning with hundreds of thousands of tasks.

* Fixed broken theorem
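
A small Python sketch of signed feature hashing, with a multitask variant that adds a shared copy and a task-specific copy of each feature into the same vector. The use of `md5`, the key prefixes, and the function names are illustrative assumptions; any stable hash would serve.

```python
import hashlib


def _h(text):
    """Stable 64-bit integer hash of a string (md5 is used only for determinism)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


def hash_features(tokens, dim, prefix=""):
    """Map a bag of tokens to a dim-sized vector using an index hash and a sign hash."""
    vec = [0.0] * dim
    for tok in tokens:
        key = prefix + tok
        index = _h("index|" + key) % dim
        # The random sign makes collisions cancel in expectation.
        sign = 1.0 if _h("sign|" + key) % 2 == 0 else -1.0
        vec[index] += sign
    return vec


def multitask_features(tokens, task_id, dim):
    """Shared features plus task-specific features, hashed into one space."""
    shared = hash_features(tokens, dim)
    personal = hash_features(tokens, dim, prefix=task_id + "|")
    return [s + p for s, p in zip(shared, personal)]


if __name__ == "__main__":
    print(multitask_features(["free", "offer", "meeting"], "user42", 16))
```

Because every task writes into the same fixed-size vector, memory does not grow with the number of tasks; the cost is occasional collisions between features of different tasks.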

Error-Correcting Tournaments

Feb 03, 2010
Alina Beygelzimer, John Langford, Pradeep Ravikumar

We present a family of pairwise tournaments reducing $k$-class classification to binary classification. These reductions are provably robust against a constant fraction of binary errors. The results improve on the PECOC construction \cite{SECOC} with an exponential improvement in computation, from $O(k)$ to $O(\log_2 k)$, and the removal of a square root in the regret dependence, matching the best possible computation and regret up to a constant.

* Minor wording improvements

Slow Learners are Fast

Nov 03, 2009
John Langford, Alexander Smola, Martin Zinkevich

Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning.

* Extended version of conference paper - NIPS 2009
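
The delayed-update setting can be sketched as gradient descent in which each gradient is applied a fixed number of steps after it was computed. The Python toy below uses a one-dimensional quadratic objective and a constant step size purely for illustration; the function name, the fixed delay, and the objective are assumptions and do not reproduce the paper's analysis.

```python
from collections import deque


def delayed_sgd(grad, w0, eta, delay, steps):
    """Gradient descent where each gradient is applied `delay` steps late."""
    w = w0
    pending = deque()
    for _ in range(steps):
        # Computed on the current parameters, but queued rather than applied.
        pending.append(grad(w))
        if len(pending) > delay:
            # Apply the oldest outstanding gradient, which is by now stale.
            w = w - eta * pending.popleft()
    return w


if __name__ == "__main__":
    grad = lambda w: w - 3.0  # gradient of 0.5 * (w - 3)^2
    print(delayed_sgd(grad, w0=0.0, eta=0.05, delay=5, steps=2000))
```

Setting `delay=0` recovers ordinary gradient descent. On this toy problem a small step size still converges despite stale gradients, while larger step sizes that are stable without delay can diverge once delay is introduced.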
