Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

H. Brendan McMahan

Practical Secure Aggregation for Federated Learning on User-Held Data

Nov 14, 2016
Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, Karn Seth

Figure 1 for Practical Secure Aggregation for Federated Learning on User-Held Data

Secure Aggregation protocols allow a collection of mutually distrust parties, each holding a private value, to collaboratively compute the sum of those values without revealing the values themselves. We consider training a deep neural network in the Federated Learning model, using distributed stochastic gradient descent across user-held training data on mobile devices, wherein Secure Aggregation protects each user's model gradient. We design a novel, communication-efficient Secure Aggregation protocol for high-dimensional data that tolerates up to 1/3 users failing to complete the protocol. For 16-bit input values, our protocol offers 1.73x communication expansion for $2^{10}$ users and $2^{20}$-dimensional vectors, and 1.98x expansion for $2^{14}$ users and $2^{24}$ dimensional vectors.

* 5 pages, 1 figure. To appear at the NIPS 2016 workshop on Private Multi-Party Machine Learning

Via

Access Paper or Ask Questions

Deep Learning with Differential Privacy

Oct 24, 2016
Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang

Figure 1 for Deep Learning with Differential Privacy

Figure 2 for Deep Learning with Differential Privacy

Figure 3 for Deep Learning with Differential Privacy

Figure 4 for Deep Learning with Differential Privacy

Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.

Via

Access Paper or Ask Questions

Federated Optimization: Distributed Machine Learning for On-Device Intelligence

Oct 08, 2016
Jakub Konečný, H. Brendan McMahan, Daniel Ramage, Peter Richtárik

Figure 1 for Federated Optimization: Distributed Machine Learning for On-Device Intelligence

Figure 2 for Federated Optimization: Distributed Machine Learning for On-Device Intelligence

Figure 3 for Federated Optimization: Distributed Machine Learning for On-Device Intelligence

We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are unevenly distributed over an extremely large number of nodes. The goal is to train a high-quality centralized model. We refer to this setting as Federated Optimization. In this setting, communication efficiency is of the utmost importance and minimizing the number of rounds of communication is the principal goal. A motivating example arises when we keep the training data locally on users' mobile devices instead of logging it to a data center for training. In federated optimziation, the devices are used as compute nodes performing computation on their local data in order to update a global model. We suppose that we have extremely large number of devices in the network --- as many as the number of users of a given service, each of which has only a tiny fraction of the total data available. In particular, we expect the number of data points available locally to be much smaller than the number of devices. Additionally, since different users generate data with different patterns, it is reasonable to assume that no device has a representative sample of the overall distribution. We show that existing algorithms are not suitable for this setting, and propose a new algorithm which shows encouraging experimental results for sparse convex problems. This work also sets a path for future research needed in the context of \federated optimization.

* 38 pages

Via

Access Paper or Ask Questions

A Survey of Algorithms and Analysis for Adaptive Online Learning

Nov 09, 2015
H. Brendan McMahan

Figure 1 for A Survey of Algorithms and Analysis for Adaptive Online Learning

Figure 2 for A Survey of Algorithms and Analysis for Adaptive Online Learning

Figure 3 for A Survey of Algorithms and Analysis for Adaptive Online Learning

Figure 4 for A Survey of Algorithms and Analysis for Adaptive Online Learning

We present tools for the analysis of Follow-The-Regularized-Leader (FTRL), Dual Averaging, and Mirror Descent algorithms when the regularizer (equivalently, prox-function or learning rate schedule) is chosen adaptively based on the data. Adaptivity can be used to prove regret bounds that hold on every round, and also allows for data-dependent regret bounds as in AdaGrad-style algorithms (e.g., Online Gradient Descent with adaptive per-coordinate learning rates). We present results from a large number of prior works in a unified manner, using a modular and tight analysis that isolates the key arguments in easily re-usable lemmas. This approach strengthens pre-viously known FTRL analysis techniques to produce bounds as tight as those achieved by potential functions or primal-dual analysis. Further, we prove a general and exact equivalence between an arbitrary adaptive Mirror Descent algorithm and a correspond- ing FTRL update, which allows us to analyze any Mirror Descent algorithm in the same framework. The key to bridging the gap between Dual Averaging and Mirror Descent algorithms lies in an analysis of the FTRL-Proximal algorithm family. Our regret bounds are proved in the most general form, holding for arbitrary norms and non-smooth regularizers with time-varying weight.

Via

Access Paper or Ask Questions

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

May 21, 2014
H. Brendan McMahan, Francesco Orabona

Figure 1 for Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. We develop a novel characterization of a large class of minimax algorithms, recovering, and even improving, several previous results as immediate corollaries. Moreover, using our tools, we develop an algorithm that provides a regret bound of $\mathcal{O}\Big(U \sqrt{T \log(U \sqrt{T} \log^2 T +1)}\Big)$, where $U$ is the $L_2$ norm of an arbitrary comparator and both $T$ and $U$ are unknown to the player. This bound is optimal up to $\sqrt{\log \log T}$ terms. When $T$ is known, we derive an algorithm with an optimal regret bound (up to constant factors). For both the known and unknown $T$ case, a Normal approximation to the conditional value of the game proves to be the key analysis tool.

* Proceedings of the 27th Annual Conference on Learning Theory (COLT 2014)

Via

Access Paper or Ask Questions

Large-Scale Learning with Less RAM via Randomization

Mar 19, 2013
Daniel Golovin, D. Sculley, H. Brendan McMahan, Michael Young

Figure 1 for Large-Scale Learning with Less RAM via Randomization

Figure 2 for Large-Scale Learning with Less RAM via Randomization

Figure 3 for Large-Scale Learning with Less RAM via Randomization

We reduce the memory footprint of popular large-scale online learning methods by projecting our weight vector onto a coarse discrete set using randomized rounding. Compared to standard 32-bit float encodings, this reduces RAM usage by more than 50% during training and by up to 95% when making predictions from a fixed model, with almost no loss in accuracy. We also show that randomized counting can be used to implement per-coordinate learning rates, improving model quality with little additional RAM. We prove these memory-saving methods achieve regret guarantees similar to their exact variants. Empirical evaluation confirms excellent performance, dominating standard approaches across memory versus accuracy tradeoffs.

* Extended version of ICML 2013 paper

Via

Access Paper or Ask Questions

Minimax Optimal Algorithms for Unconstrained Linear Optimization

Feb 08, 2013
H. Brendan McMahan

Figure 1 for Minimax Optimal Algorithms for Unconstrained Linear Optimization

We design and analyze minimax-optimal algorithms for online linear optimization games where the player's choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. The standard benchmark is the loss of the best strategy chosen from a bounded comparator set. When the the comparison set and the adversary's gradients satisfy L_infinity bounds, we give the value of the game in closed form and prove it approaches sqrt(2T/pi) as T -> infinity. Interesting algorithms result when we consider soft constraints on the comparator, rather than restricting it to a bounded set. As a warmup, we analyze the game with a quadratic penalty. The value of this game is exactly T/2, and this value is achieved by perhaps the simplest online algorithm of all: unprojected gradient descent with a constant learning rate. We then derive a minimax-optimal algorithm for a much softer penalty function. This algorithm achieves good bounds under the standard notion of regret for any comparator point, without needing to specify the comparator set in advance. The value of this game converges to sqrt{e} as T ->infinity; we give a closed-form for the exact value as a function of T. The resulting algorithm is natural in unconstrained investment or betting scenarios, since it guarantees at worst constant loss, while allowing for exponential reward against an "easy" adversary.

Via

Access Paper or Ask Questions

On Calibrated Predictions for Auction Selection Mechanisms

Nov 16, 2012
H. Brendan McMahan, Omkar Muralidharan

Figure 1 for On Calibrated Predictions for Auction Selection Mechanisms

Figure 2 for On Calibrated Predictions for Auction Selection Mechanisms

Calibration is a basic property for prediction systems, and algorithms for achieving it are well-studied in both statistics and machine learning. In many applications, however, the predictions are used to make decisions that select which observations are made. This makes calibration difficult, as adjusting predictions to achieve calibration changes future data. We focus on click-through-rate (CTR) prediction for search ad auctions. Here, CTR predictions are used by an auction that determines which ads are shown, and we want to maximize the value generated by the auction. We show that certain natural notions of calibration can be impossible to achieve, depending on the details of the auction. We also show that it can be impossible to maximize auction efficiency while using calibrated predictions. Finally, we give conditions under which calibration is achievable and simultaneously maximizes auction efficiency: roughly speaking, bids and queries must not contain information about CTRs that is not already captured by the predictions.

Via

Access Paper or Ask Questions

No-Regret Algorithms for Unconstrained Online Convex Optimization

Nov 09, 2012
Matthew Streeter, H. Brendan McMahan

Figure 1 for No-Regret Algorithms for Unconstrained Online Convex Optimization

Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is R^n. Existing algorithms fail to achieve sub-linear regret in this setting unless constraints on the comparator point x^* are known in advance. We present algorithms that, without such prior knowledge, offer near-optimal regret bounds with respect to any choice of x^*. In particular, regret with respect to x^* = 0 is constant. We then prove lower bounds showing that our guarantees are near-optimal in this setting.

* NIPS 2012
* To appear

Via

Access Paper or Ask Questions

A Unified View of Regularized Dual Averaging and Mirror Descent with Implicit Updates

Sep 20, 2011
H. Brendan McMahan

Figure 1 for A Unified View of Regularized Dual Averaging and Mirror Descent with Implicit Updates

Figure 2 for A Unified View of Regularized Dual Averaging and Mirror Descent with Implicit Updates

Figure 3 for A Unified View of Regularized Dual Averaging and Mirror Descent with Implicit Updates

We study three families of online convex optimization algorithms: follow-the-proximally-regularized-leader (FTRL-Proximal), regularized dual averaging (RDA), and composite-objective mirror descent. We first prove equivalence theorems that show all of these algorithms are instantiations of a general FTRL update. This provides theoretical insight on previous experimental observations. In particular, even though the FOBOS composite mirror descent algorithm handles L1 regularization explicitly, it has been observed that RDA is even more effective at producing sparsity. Our results demonstrate that FOBOS uses subgradient approximations to the L1 penalty from previous rounds, leading to less sparsity than RDA, which handles the cumulative penalty in closed form. The FTRL-Proximal algorithm can be seen as a hybrid of these two, and outperforms both on a large, real-world dataset. Our second contribution is a unified analysis which produces regret bounds that match (up to logarithmic terms) or improve the best previously known bounds. This analysis also extends these algorithms in two important ways: we support a more general type of composite objective and we analyze implicit updates, which replace the subgradient approximation of the current loss function with an exact optimization.

* Extensively updated version of earlier draft with new analysis including a general treatment of composite objectives and experiments. Also fixes a small bug in some of one of the proofs in the early version

Via

Access Paper or Ask Questions