Get our free extension to see links to code for papers anywhere online!Free extension: code links for papers anywhere!Free add-on: See code for papers anywhere!

Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, Haoyang Zeng

We address the issue of limit cycling behavior in training Generative Adversarial Networks and propose the use of Optimistic Mirror Decent (OMD) for training Wasserstein GANs. Recent theoretical results have shown that optimistic mirror decent (OMD) can enjoy faster regret rates in the context of zero-sum games. WGANs is exactly a context of solving a zero-sum game with simultaneous no-regret dynamics. Moreover, we show that optimistic mirror decent addresses the limit cycling problem in training WGANs. We formally show that in the case of bi-linear zero-sum games the last iterate of OMD dynamics converges to an equilibrium, in contrast to GD dynamics which are bound to cycle. We also portray the huge qualitative difference between GD and OMD dynamics with toy examples, even when GD is modified with many adaptations proposed in the recent literature, such as gradient penalty or momentum. We apply OMD WGAN training to a bioinformatics problem of generating DNA sequences. We observe that models trained with OMD achieve consistently smaller KL divergence with respect to the true underlying distribution, than models trained with GD variants. Finally, we introduce a new algorithm, Optimistic Adam, which is an optimistic variant of Adam. We apply it to WGAN training on CIFAR10 and observe improved performance in terms of inception score as compared to Adam.

Via

Jonas Mueller, Vasilis Syrgkanis, Matt Taddy

We consider high dimensional dynamic multi-product pricing with an evolving but low-dimensional linear demand model. Assuming the temporal variation in cross-elasticities exhibits low-rank structure based on fixed (latent) features of the products, we show that the revenue maximization problem reduces to an online bandit convex optimization with side information given by the observed demands. We design dynamic pricing algorithms whose revenue approaches that of the best fixed price vector in hindsight, at a rate that only depends on the intrinsic rank of the demand model and not the number of products. Our approach applies a bandit convex optimization algorithm in a projected low-dimensional space spanned by the latent product features, while simultaneously learning this span via online singular value decomposition of a carefully-crafted matrix containing the observed demands.

Via

Robert Chen, Brendan Lucier, Yaron Singer, Vasilis Syrgkanis

We consider robust optimization problems, where the goal is to optimize in the worst case over a class of objective functions. We develop a reduction from robust improper optimization to Bayesian optimization: given an oracle that returns $\alpha$-approximate solutions for distributions over objectives, we compute a distribution over solutions that is $\alpha$-approximate in the worst case. We show that de-randomizing this solution is NP-hard in general, but can be done for a broad class of statistical learning tasks. We apply our results to robust neural network training and submodular optimization. We evaluate our approach experimentally on corrupted character classification, and robust influence maximization in networks.

Via

Vasilis Syrgkanis

We consider two stage estimation with a non-parametric first stage and a generalized method of moments second stage, in a simpler setting than (Chernozhukov et al. 2016). We give an alternative proof of the theorem given in (Chernozhukov et al. 2016) that orthogonal second stage moments, sample splitting and $n^{1/4}$-consistency of the first stage, imply $\sqrt{n}$-consistency and asymptotic normality of second stage estimates. Our proof is for a variant of their estimator, which is based on the empirical version of the moment condition (Z-estimator), rather than a minimization of a norm of the empirical vector of moments (M-estimator). This note is meant primarily for expository purposes, rather than as a new technical contribution.

Via

Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, Jennifer Wortman Vaughan

We consider the design of computationally efficient online learning algorithms in an adversarial setting in which the learner has access to an offline optimization oracle. We present an algorithm called Generalized Follow-the-Perturbed-Leader and provide conditions under which it is oracle-efficient while achieving vanishing regret. Our results make significant progress on an open problem raised by Hazan and Koren, who showed that oracle-efficient algorithms do not exist in general and asked whether one can identify properties under which oracle-efficient online learning may be possible. Our auction-design framework considers an auctioneer learning an optimal auction for a sequence of adversarially selected valuations with the goal of achieving revenue that is almost as good as the optimal auction in hindsight, among a class of auctions. We give oracle-efficient learning results for: (1) VCG auctions with bidder-specific reserves in single-parameter settings, (2) envy-free item pricing in multi-item auctions, and (3) s-level auctions of Morgenstern and Roughgarden for single-item settings. The last result leads to an approximation of the overall optimal Myerson auction when bidders' valuations are drawn according to a fast-mixing Markov process, extending prior work that only gave such guarantees for the i.i.d. setting. Finally, we derive various extensions, including: (1) oracle-efficient algorithms for the contextual learning setting in which the learner has access to side information (such as bidder demographics), (2) learning with approximate oracles such as those based on Maximal-in-Range algorithms, and (3) no-regret bidding in simultaneous auctions, resolving an open problem of Daskalakis and Syrgkanis.

Via

Vasilis Syrgkanis

We introduce a new sample complexity measure, which we refer to as split-sample growth rate. For any hypothesis $H$ and for any sample $S$ of size $m$, the split-sample growth rate $\hat{\tau}_H(m)$ counts how many different hypotheses can empirical risk minimization output on any sub-sample of $S$ of size $m/2$. We show that the expected generalization error is upper bounded by $O\left(\sqrt{\frac{\log(\hat{\tau}_H(2m))}{m}}\right)$. Our result is enabled by a strengthening of the Rademacher complexity analysis of the expected generalization error. We show that this sample complexity measure, greatly simplifies the analysis of the sample complexity of optimal auction design, for many auction classes studied in the literature. Their sample complexity can be derived solely by noticing that in these auction classes, ERM on any sample or sub-sample will pick parameters that are equal to one of the points in the sample.

Via

Tim Roughgarden, Vasilis Syrgkanis, Eva Tardos

This survey outlines a general and modular theory for proving approximation guarantees for equilibria of auctions in complex settings. This theory complements traditional economic techniques, which generally focus on exact and optimal solutions and are accordingly limited to relatively stylized settings. We highlight three user-friendly analytical tools: smoothness-type inequalities, which immediately yield approximation guarantees for many auction formats of interest in the special case of complete information and deterministic strategies; extension theorems, which extend such guarantees to randomized strategies, no-regret learning outcomes, and incomplete-information settings; and composition theorems, which extend such guarantees from simpler to more complex auctions. Combining these tools yields tight worst-case approximation guarantees for the equilibria of many widely-used auction formats.

Via

Vasilis Syrgkanis, Haipeng Luo, Akshay Krishnamurthy, Robert E. Schapire

We give an oracle-based algorithm for the adversarial contextual bandit problem, where either contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially. Our algorithm is computationally efficient, assuming access to an offline optimization oracle, and enjoys a regret of order $O((KT)^{\frac{2}{3}}(\log N)^{\frac{1}{3}})$, where $K$ is the number of actions, $T$ is the number of iterations and $N$ is the number of baseline policies. Our result is the first to break the $O(T^{\frac{3}{4}})$ barrier that is achieved by recently introduced algorithms. Breaking this barrier was left as a major open problem. Our analysis is based on the recent relaxation based approach of (Rakhlin and Sridharan, 2016).

Via

Constantinos Daskalakis, Vasilis Syrgkanis

A line of recent work provides welfare guarantees of simple combinatorial auction formats, such as selling m items via simultaneous second price auctions (SiSPAs) (Christodoulou et al. 2008, Bhawalkar and Roughgarden 2011, Feldman et al. 2013). These guarantees hold even when the auctions are repeatedly executed and players use no-regret learning algorithms. Unfortunately, off-the-shelf no-regret algorithms for these auctions are computationally inefficient as the number of actions is exponential. We show that this obstacle is insurmountable: there are no polynomial-time no-regret algorithms for SiSPAs, unless RP$\supseteq$ NP, even when the bidders are unit-demand. Our lower bound raises the question of how good outcomes polynomially-bounded bidders may discover in such auctions. To answer this question, we propose a novel concept of learning in auctions, termed "no-envy learning." This notion is founded upon Walrasian equilibrium, and we show that it is both efficiently implementable and results in approximately optimal welfare, even when the bidders have fractionally subadditive (XOS) valuations (assuming demand oracles) or coverage valuations (without demand oracles). No-envy learning outcomes are a relaxation of no-regret outcomes, which maintain their approximate welfare optimality while endowing them with computational tractability. Our results extend to other auction formats that have been studied in the literature via the smoothness paradigm. Our results for XOS valuations are enabled by a novel Follow-The-Perturbed-Leader algorithm for settings where the number of experts is infinite, and the payoff function of the learner is non-linear. This algorithm has applications outside of auction settings, such as in security games. Our result for coverage valuations is based on a novel use of convex rounding schemes and a reduction to online convex optimization.

Via

Vasilis Syrgkanis, Akshay Krishnamurthy, Robert E. Schapire

We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner knows the set of contexts a priori, ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently in one of the contexts in the set. Our algorithms fall into the follow the perturbed leader family \cite{Kalai2005} and achieve regret $O(T^{3/4}\sqrt{K\log(N)})$ in the transductive setting and $O(T^{2/3} d^{3/4} K\sqrt{\log(N)})$ in the separator setting, where $K$ is the number of actions, $N$ is the number of baseline policies, and $d$ is the size of the separator. We actually solve the more general adversarial contextual semi-bandit linear optimization problem, whilst in the full information setting we address the even more general contextual combinatorial optimization. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.

Via