Ambuj Tewari

University of Texas

Online Multiclass Boosting

Feb 25, 2018

Young Hun Jung, Jack Goetz, Ambuj Tewari

Abstract: Recent work has extended the theoretical analysis of boosting algorithms to multiclass problems and to online settings. However, the multiclass extension is in the batch setting and the online extensions only consider binary classification. We fill this gap in the literature by defining, and justifying, a weak learning condition for online multiclass boosting. This condition leads to an optimal boosting algorithm that requires the minimal number of weak learners to achieve a certain accuracy. Additionally, we propose an adaptive algorithm which is near optimal and enjoys excellent performance on real data due to its adaptive property.

* 28 pages, 2 figures

Online Learning via Differential Privacy

Feb 10, 2018

Jacob Abernethy, Chansoo Lee, Audra McMillan, Ambuj Tewari

Abstract: We explore the use of tools from differential privacy in the design and analysis of online learning algorithms. We develop a simple and powerful analysis technique for Follow-The-Leader type algorithms under privacy-preserving perturbations. This leads to the minimax optimal algorithm for k-sparse online PCA and the best-known perturbation based algorithm for dense online PCA. We also show that differential privacy is the core notion of algorithm stability in various online learning problems.

Lasso Guarantees for Time Series Estimation Under Subgaussian Tails and $\beta$-Mixing

Feb 05, 2018

Kam Chung Wong, Zifan Li, Ambuj Tewari

Abstract: Many theoretical results on estimation of high dimensional time series require specifying an underlying data generating model (DGM). Instead, following in the footsteps of \cite{wong2017lasso}, this paper relies only on (strict) stationarity and a $\beta$-mixing condition to establish consistency of the lasso when data comes from a $\beta$-mixing process with marginals having subgaussian tails. Because of the general assumptions, the data can come from DGMs different from standard time series models such as VAR or ARCH. When the true DGM is not VAR, the lasso estimates correspond to those of the best linear predictors using the past observations. We establish non-asymptotic inequalities for estimation and prediction errors of the lasso estimates. Together with \cite{wong2017lasso}, we provide lasso guarantees that cover the full spectrum of the parameters in specifications of $\beta$-mixing subgaussian time series. Applications of these results potentially extend to non-Gaussian, non-Markovian and non-linear time series models, as the examples we provide demonstrate. In order to prove our results, we derive a novel Hanson-Wright type concentration inequality for $\beta$-mixing subgaussian random vectors that may be of independent interest.
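
To make the phrase "best linear predictors using the past observations" concrete, here is a minimal sketch of a lasso fit on lagged values of a series. It is an illustration only and is not taken from the paper: the function names (`lagged_design`, `lasso_coordinate_descent`), the univariate setting, and the plain coordinate-descent solver are all assumptions chosen for brevity, whereas the paper concerns high dimensional multivariate series.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lagged_design(series, lag=1):
    """Build (rows, targets): each row holds the `lag` most recent past values."""
    rows = [series[t - lag:t][::-1] for t in range(lag, len(series))]
    targets = [series[t] for t in range(lag, len(series))]
    return rows, targets


def lasso_coordinate_descent(rows, targets, lam, sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n = len(rows)
    p = len(rows[0])
    coef = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for row, y in zip(rows, targets):
                pred_without_j = sum(row[k] * coef[k] for k in range(p) if k != j)
                rho += row[j] * (y - pred_without_j)
                z += row[j] ** 2
            rho /= n
            z /= n
            coef[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return coef


# Hypothetical usage: a noiseless series with x[t+1] = 0.5 * x[t].
series = [0.5 ** t for t in range(20)]
rows, targets = lagged_design(series, lag=1)
coef_unpenalized = lasso_coordinate_descent(rows, targets, lam=0.0)
coef_heavily_penalized = lasso_coordinate_descent(rows, targets, lam=10.0)
```

With `lam=0.0` the fit reduces to least squares on the lagged value, and a large `lam` shrinks the coefficient to zero. The penalty levels for which the paper's guarantees hold are specified in the paper and are not reproduced here.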

Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits

Jan 05, 2018

Zifan Li, Ambuj Tewari

Abstract: Recent work on follow the perturbed leader (FTPL) algorithms for the adversarial multi-armed bandit problem has highlighted the role of the hazard rate of the distribution generating the perturbations. Assuming that the hazard rate is bounded, it is possible to provide regret analyses for a variety of FTPL algorithms for the multi-armed bandit problem. This paper pushes the inquiry into regret bounds for FTPL algorithms beyond the bounded hazard rate condition. There are good reasons to do so: natural distributions such as the uniform and Gaussian violate the condition. We give regret bounds for both bounded support and unbounded support distributions without assuming the hazard rate condition. We also disprove a conjecture that the Gaussian distribution cannot lead to a low-regret algorithm. In fact, it turns out that it leads to near optimal regret, up to logarithmic factors. A key ingredient in our approach is the introduction of a new notion called the generalized hazard rate.
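
The claim that the uniform and Gaussian distributions violate a bounded hazard rate condition can be checked numerically. The sketch below assumes the textbook definition of the hazard rate, $h(x) = f(x) / (1 - F(x))$, where $f$ is the density and $F$ the distribution function. The function names are illustrative and the paper's exact formulation may differ in details.

```python
import math


def gaussian_hazard(x):
    """Hazard rate of the standard normal: pdf(x) / P(X > x)."""
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))
    return pdf / survival


def uniform_hazard(x):
    """Hazard rate of Uniform[0, 1] for 0 <= x < 1: 1 / (1 - x)."""
    return 1.0 / (1.0 - x)


def exponential_hazard(x, rate=1.0):
    """Hazard rate of Exponential(rate): constant, so it is bounded."""
    return rate


# The Gaussian hazard keeps growing with x, and the uniform hazard blows up
# near the right end of the support, while the exponential hazard stays flat.
for x in (1.0, 2.0, 4.0, 8.0):
    print(f"x={x:4.1f}  gaussian={gaussian_hazard(x):8.4f}  exponential={exponential_hazard(x):.1f}")
for x in (0.5, 0.9, 0.99):
    print(f"x={x:4.2f}  uniform={uniform_hazard(x):8.2f}")
```

For $x > 0$ the standard normal hazard rate exceeds $x$ (a consequence of the Mills ratio bound), so it grows without bound. The exponential distribution, whose hazard rate is constant, is the standard example that satisfies the condition.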

Markov Decision Processes with Continuous Side Information

Nov 15, 2017

Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari

Abstract: We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode the agent has access to some side-information or context that determines the dynamics of the MDP for that episode. Our setting is motivated by applications in healthcare where baseline measurements of a patient at the start of a treatment episode form the context that may provide information about how the patient might respond to treatment decisions. We propose algorithms for learning in such Contextual Markov Decision Processes (CMDPs) under an assumption that the unobserved MDP parameters vary smoothly with the observed context. We also give lower and upper PAC bounds under the smoothness assumption. Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs. For the linear setting, we give a PAC learning algorithm based on KWIK learning techniques.

An Actor-Critic Contextual Bandit Algorithm for Personalized Mobile Health Interventions

Jun 28, 2017

Huitian Lei, Ambuj Tewari, Susan A. Murphy

Abstract: Increasing technological sophistication and widespread use of smartphones and wearable devices provide opportunities for innovative and highly personalized health interventions. A Just-In-Time Adaptive Intervention (JITAI) uses the real-time data collection and communication capabilities of modern mobile devices to deliver interventions in real time that are adapted to the in-the-moment needs of the user. The lack of methodological guidance in constructing data-based JITAIs remains a hurdle in advancing JITAI research despite the increasing popularity of JITAIs among clinical scientists. In this article, we make a first attempt to bridge this methodological gap by formulating the task of tailoring interventions in real time as a contextual bandit problem. Interpretability requirements in the domain of mobile health lead us to formulate the problem differently from existing formulations intended for web applications such as ad or news article placement. Under the assumption of a linear reward function, we choose the reward function (the "critic") parameterization separately from a lower dimensional parameterization of stochastic policies (the "actor"). We provide an online actor-critic algorithm that guides the construction and refinement of a JITAI. Asymptotic properties of the actor-critic algorithm are developed and backed up by numerical experiments. Additional numerical experiments are conducted to test the robustness of the algorithm when idealized assumptions used in the analysis of the contextual bandit algorithm are violated.

Sampled Fictitious Play is Hannan Consistent

Apr 11, 2017

Zifan Li, Ambuj Tewari

Abstract: Fictitious play is a simple and widely studied adaptive heuristic for playing repeated games. It is well known that fictitious play fails to be Hannan consistent. Several variants of fictitious play, including regret matching, generalized regret matching and smooth fictitious play, are known to be Hannan consistent. In this note, we consider sampled fictitious play: at each round, the player samples past times and plays the best response to previous moves of other players at the sampled time points. We show that sampled fictitious play, using Bernoulli sampling, is Hannan consistent. Unlike several existing Hannan consistency proofs that rely on concentration of measure results, ours instead uses anti-concentration results from Littlewood-Offord theory.
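
The procedure described in the abstract (sample past rounds, then best-respond to the opponent's moves in the sampled rounds) can be sketched as follows. This is a hedged illustration, not the authors' code: the function name, the two-player payoff-matrix representation, the sampling probability of 0.5, the tie-breaking rule, and the fallback when no round is sampled are all assumptions.

```python
import random


def sampled_best_response(payoff, opponent_history, rng, sample_prob=0.5):
    """Choose an action by best-responding to a Bernoulli sample of past rounds.

    payoff[a][b] is the player's payoff for action a against opponent move b.
    Each past round is kept independently with probability sample_prob
    (an assumed value; the paper specifies its own sampling scheme).
    """
    num_actions = len(payoff)
    sampled_moves = [b for b in opponent_history if rng.random() < sample_prob]
    if not sampled_moves:
        # Assumed fallback: play uniformly at random when nothing is sampled.
        return rng.randrange(num_actions)
    totals = [sum(payoff[a][b] for b in sampled_moves) for a in range(num_actions)]
    # Ties are broken in favor of the lowest action index (an assumption).
    return max(range(num_actions), key=totals.__getitem__)


# Hypothetical usage on matching pennies, where the player wins by matching.
matching_pennies = [[1, -1], [-1, 1]]
rng = random.Random(0)
action = sampled_best_response(matching_pennies, [0, 0, 0], rng, sample_prob=1.0)
```

Setting `sample_prob=1.0` keeps every past round, which recovers ordinary fictitious play. The randomness introduced by a sampling probability below one is what distinguishes the sampled variant.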

On Lipschitz Continuity and Smoothness of Loss Functions in Learning to Rank

Sep 13, 2016

Ambuj Tewari, Sougata Chaudhuri

Abstract: In binary classification and regression problems, it is well understood that Lipschitz continuity and smoothness of the loss function play key roles in governing generalization error bounds for empirical risk minimization algorithms. In this paper, we show how these two properties affect generalization error bounds in the learning to rank problem. The learning to rank problem involves vector valued predictions and therefore the choice of the norm with respect to which Lipschitz continuity and smoothness are defined becomes crucial. Choosing the $\ell_\infty$ norm in our definition of Lipschitz continuity allows us to improve existing bounds. Furthermore, under smoothness assumptions, our choice enables us to prove rates that interpolate between $1/\sqrt{n}$ and $1/n$ rates. Application of our results to ListNet, a popular learning to rank method, gives state-of-the-art performance guarantees.

* This paper has been withdrawn as it was superseded by an ICML 2015 paper "Generalization error bounds for learning to rank: Does the length of document lists matter?" available as arXiv:1603.01860

Online Learning to Rank with Top-k Feedback

Aug 23, 2016

Sougata Chaudhuri, Ambuj Tewari

Abstract: We consider two settings of online learning to rank where feedback is restricted to top ranked items. The problem is cast as an online game between a learner and a sequence of users, over $T$ rounds. In both settings, the learner's objective is to present a ranked list of items to the users. The learner's performance is judged on the entire ranked list and the true relevances of the items. However, the learner receives highly restricted feedback at the end of each round, in the form of relevances of only the top $k$ ranked items, where $k \ll m$. The first setting is non-contextual, where the list of items to be ranked is fixed. The second setting is contextual, where lists of items vary, in the form of traditional query-document lists. No stochastic assumption is made on the generation process of relevances of items and contexts. We provide efficient ranking strategies for both settings. The strategies achieve $O(T^{2/3})$ regret, where regret is based on popular ranking measures in the first setting and ranking surrogates in the second setting. We also provide impossibility results for certain ranking measures and a certain class of surrogates, when feedback is restricted to the top ranked item, i.e. $k=1$. We empirically demonstrate the performance of our algorithms on simulated and real-world datasets.

* Under review in JMLR

Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games

Aug 23, 2016

Sougata Chaudhuri, Ambuj Tewari

Abstract: Partial monitoring games are repeated games where the learner receives feedback that might be different from the adversary's move or even the reward gained by the learner. Recently, a general model of combinatorial partial monitoring (CPM) games was proposed \cite{lincombinatorial2014}, where the learner's action space can be exponentially large and the adversary samples its moves from a bounded, continuous space, according to a fixed distribution. The paper gave a confidence bound based algorithm (GCB) that achieves $O(T^{2/3}\log T)$ distribution independent and $O(\log T)$ distribution dependent regret bounds. The implementation of their algorithm depends on two separate offline oracles and the distribution dependent regret additionally requires the existence of a unique optimal action for the learner. Adopting their CPM model, our first contribution is a Phased Exploration with Greedy Exploitation (PEGE) algorithmic framework for the problem. Different algorithms within the framework achieve $O(T^{2/3}\sqrt{\log T})$ distribution independent and $O(\log^2 T)$ distribution dependent regret respectively. Crucially, our framework needs only the simpler "argmax" oracle from GCB and the distribution dependent regret does not require the existence of a unique optimal action. Our second contribution is another algorithm, PEGE2, which combines gap estimation with a PEGE algorithm, to achieve an $O(\log T)$ regret bound, matching the GCB guarantee but removing the dependence on the size of the learner's action space. However, like GCB, PEGE2 requires access to both offline oracles and the existence of a unique optimal action. Finally, we discuss how our algorithm can be efficiently applied to a CPM problem of practical interest: namely, online ranking with feedback at the top.

* Appearing in NIPS 2016
