Tomer Koren

School of Computer Science, Tel Aviv University, Google Research, Tel Aviv

Optimal Rates for Random Order Online Optimization

Jun 29, 2021

Uri Sherman, Tomer Koren, Yishay Mansour

Abstract:We study online convex optimization in the random order model, recently proposed by \citet{garber2020online}, where the loss functions may be chosen by an adversary, but are then presented to the online algorithm in a uniformly random order. Focusing on the scenario where the cumulative loss function is (strongly) convex, yet individual loss functions are smooth but might be non-convex, we give algorithms that achieve the optimal bounds and significantly outperform the results of \citet{garber2020online}, completely removing the dimension dependence and improving their scaling with respect to the strong convexity parameter. Our analysis relies on novel connections between algorithmic stability and generalization for sampling without replacement, analogous to those studied in the with-replacement i.i.d.~setting, as well as on a refined average stability analysis of stochastic gradient descent.

Asynchronous Stochastic Optimization Robust to Arbitrary Delays

Jun 22, 2021

Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

Abstract:We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for some arbitrary delay $d_t$. This setting abstracts asynchronous distributed optimization where a central server receives gradient updates computed by worker machines. These machines can experience computation and communication loads that might vary significantly over time. In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires $O( \sigma^2/\epsilon^4 + \tau/\epsilon^2 )$ steps for finding an $\epsilon$-stationary point $x$, where $\tau$ is the \emph{average} delay $\smash{\frac{1}{T}\sum_{t=1}^T d_t}$ and $\sigma^2$ is the variance of the stochastic gradients. This improves over previous work, which showed that stochastic gradient descent achieves the same rate but with respect to the \emph{maximal} delay $\max_{t} d_t$, which can be significantly larger than the average delay, especially in heterogeneous distributed systems. Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
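
The delayed-gradient setting described in this abstract can be illustrated with a small simulation. The sketch below is plain SGD applied with stale gradients, not the algorithm proposed in the paper; the function names (`delayed_sgd`, `delay_stats`, `skewed_delays`), the toy objective, and all parameter values are assumptions chosen for illustration. It also shows how far apart the average and maximal delay can be under a skewed delay pattern.

```python
import numpy as np


def delayed_sgd(grad, x0, delays, step_size, noise_std=0.0, seed=0):
    """Plain SGD where step t uses a gradient evaluated at the stale iterate x_{t - d_t}.

    This is a baseline simulation of the setting only (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    iterates = [np.asarray(x0, dtype=float)]
    for t, d in enumerate(delays):
        # Clamp the index so that early steps fall back to the initial point x_0.
        stale_x = iterates[max(t - d, 0)]
        noise = noise_std * rng.standard_normal(stale_x.shape)
        g = grad(stale_x) + noise  # stochastic gradient at the stale point
        iterates.append(iterates[-1] - step_size * g)
    return iterates


def delay_stats(delays):
    """Return (average delay, maximal delay) for a delay sequence."""
    return float(np.mean(delays)), int(np.max(delays))


def skewed_delays(num_steps, period=20, short=1, long=50):
    """Mostly short delays with an occasional long one (assumed toy pattern)."""
    return [long if t % period == period - 1 else short for t in range(num_steps)]


# Toy run on f(x) = 0.5 * ||x||^2, whose gradient is x.
delays = skewed_delays(200)
avg_delay, max_delay = delay_stats(delays)
iterates = delayed_sgd(lambda x: x, [1.0, 1.0], delays, step_size=0.05)
print(avg_delay, max_delay, np.linalg.norm(iterates[-1]))
```

In this toy pattern the average delay is 3.45 while the maximal delay is 50, which is the kind of gap between $\tau$ and $\max_t d_t$ that the abstract refers to.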

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

Jun 04, 2021

Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

Abstract:We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards, and the reward-independent delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings, with an additional additive dependence on the quantiles of the delay distribution. Our results do not make any assumptions on the delay distributions: in particular, we do not assume they come from any parametric family of distributions and allow for unbounded support and expectation; we further allow for infinite delays where the algorithm might occasionally not observe any feedback.

* 33 pages, 5 figures, ICML 2021

Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry

Mar 02, 2021

Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Abstract:Stochastic convex optimization over an $\ell_1$-bounded domain is ubiquitous in machine learning applications such as LASSO but remains poorly understood when learning with differential privacy. We show that, up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/\varepsilon n.$ The upper bound is based on a new algorithm that combines the iterative localization approach of~\citet{FeldmanKoTa20} with a new analysis of private regularized mirror descent. It applies to $\ell_p$-bounded domains for $p\in [1,2]$ and queries at most $n^{3/2}$ gradients, improving over the best previously known algorithm for the $\ell_2$ case, which needs $n^2$ gradients. Further, we show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/\varepsilon n)^{2/3}.$ This bound is achieved by a new variance-reduced version of the Frank-Wolfe algorithm that requires just a single pass over the data. We also show that the lower bound in this case is the minimum of the two rates mentioned above.

Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret

Feb 25, 2021

Asaf Cassel, Tomer Koren

Abstract:We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.

Multiplicative Reweighting for Robust Neural Network Optimization

Feb 24, 2021

Noga Bar, Tomer Koren, Raja Giryes

Abstract:Deep neural networks are widespread due to their powerful performance. Yet, they suffer from degraded performance in the presence of noisy labels at train time or adversarial examples during inference. Inspired by the setting of learning with expert advice, where multiplicative weights (MW) updates were recently shown to be robust to moderate adversarial corruptions, we propose to use MW for reweighting examples during neural network optimization. We establish the convergence of our method when used with gradient descent and demonstrate its advantage in two simple examples. We then empirically validate our findings by showing that MW improves the network's accuracy in the presence of label noise on CIFAR-10, CIFAR-100 and Clothing1M, and that it leads to better robustness to adversarial attacks.

* Our code is publicly available at https://github.com/NogaBar/mr_robust_optim
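
The idea of reweighting training examples with multiplicative weights can be sketched on a toy problem. The code below is a generic illustration under stated assumptions, not the authors' implementation (see their repository above for that): it uses a one-dimensional least-squares model instead of a neural network, and the function name `mw_reweighted_gd` and the parameters `lr` and `mw_eta` are hypothetical. Each example's weight is multiplied by `exp(-mw_eta * loss_i)` at every step, so examples with persistently large loss, such as those with corrupted labels, contribute less to the gradient.

```python
import numpy as np


def mw_reweighted_gd(X, y, steps=200, lr=0.1, mw_eta=0.5):
    """Gradient descent on least squares with multiplicative-weights example reweighting.

    Generic sketch only: weights follow w_i <- w_i * exp(-mw_eta * loss_i),
    kept in log space for numerical stability.
    """
    n, d = X.shape
    theta = np.zeros(d)
    log_w = np.zeros(n)          # uniform initial weights
    p = np.full(n, 1.0 / n)      # normalized weights (a distribution over examples)
    for _ in range(steps):
        residual = X @ theta - y
        losses = 0.5 * residual ** 2
        log_w -= mw_eta * losses  # multiplicative update in log space
        log_w -= log_w.max()      # shift so the largest log-weight is 0
        p = np.exp(log_w)
        p /= p.sum()
        grad = X.T @ (p * residual)  # gradient of the p-weighted loss
        theta -= lr * grad
    return theta, p


# Toy data: y = 2x, with three labels corrupted to -5 (assumed noise model).
X = np.linspace(1.0, 2.0, 20).reshape(-1, 1)
y = 2.0 * X[:, 0]
corrupted = [5, 10, 15]
y[corrupted] = -5.0

theta, p = mw_reweighted_gd(X, y)
print(theta, p[corrupted].sum())
```

On this toy data the corrupted examples incur a large loss at every step, so their combined weight shrinks toward zero and the fitted slope is driven by the clean examples.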

Lazy OCO: Online Convex Optimization on a Switching Budget

Feb 07, 2021

Uri Sherman, Tomer Koren

Abstract:We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds. Similar problems have been addressed in prior work for the discrete decision set setting, and more recently in the continuous setting but only with an adaptive adversary. In this work, we aim to fill the gap and present computationally efficient algorithms in the more prevalent oblivious setting, establishing a regret bound of $O(T/S)$ for general convex losses and $\widetilde O(T/S^2)$ for strongly convex losses. In addition, for stochastic i.i.d.~losses, we present a simple algorithm that performs $\log T$ switches with only a multiplicative $\log T$ factor overhead in its regret in both the general and strongly convex settings. Finally, we complement our algorithms with lower bounds that match our upper bounds in some of the cases we consider.

The Instability of Accelerated Gradient Descent

Feb 03, 2021

Amit Attia, Tomer Koren

Abstract:We study the algorithmic stability of Nesterov's accelerated gradient method. For convex quadratic objectives, \citet{chen2018stability} proved that the uniform stability of the method grows quadratically with the number of optimization steps, and conjectured that the same is true for the general convex and smooth case. We disprove this conjecture and show, for two notions of stability, that the stability of Nesterov's accelerated method in fact deteriorates \emph{exponentially fast} with the number of gradient steps. This stands in sharp contrast to the bounds in the quadratic case, but also to known results for non-accelerated gradient methods where stability typically grows linearly with the number of steps.

* 33 pages

SGD Generalizes Better Than GD (And Regularization Doesn't Help)

Feb 01, 2021

Idan Amir, Tomer Koren, Roi Livni

Abstract:We give a new separation result between the generalization performance of stochastic gradient descent (SGD) and of full-batch gradient descent (GD) in the fundamental stochastic convex optimization model. While for SGD it is well-known that $O(1/\epsilon^2)$ iterations suffice for obtaining a solution with $\epsilon$ excess expected risk, we show that with the same number of steps GD may overfit and emit a solution with $\Omega(1)$ generalization error. Moreover, we show that in fact $\Omega(1/\epsilon^4)$ iterations are necessary for GD to match the generalization performance of SGD, which is also tight due to recent work by Bassily et al. (2020). We further discuss how regularizing the empirical risk minimized by GD essentially does not change the above result, and revisit the concepts of stability, implicit bias and the role of the learning algorithm in generalization.

Online Markov Decision Processes with Aggregate Bandit Feedback

Jan 31, 2021

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

Abstract:We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics. In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback: the trajectory is revealed along with the cumulative loss suffered, rather than the individual losses encountered along the trajectory. Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes. We establish this result via an efficient reduction to a novel bandit learning setting we call Distorted Linear Bandits (DLB), which is a variant of bandit linear optimization where actions chosen by the learner are adversarially distorted before they are committed. We then develop a computationally-efficient online algorithm for DLB for which we prove an $O(\sqrt{T})$ regret bound, where $T$ is the number of time steps. Our algorithm is based on online mirror descent with a self-concordant barrier regularization that employs a novel increasing learning rate schedule.
