Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gauri Joshi

Bandit-based Communication-Efficient Client Selection Strategies for Federated Learning

Dec 14, 2020
Yae Jee Cho, Samarth Gupta, Gauri Joshi, Osman Yağan

Figure 1 for Bandit-based Communication-Efficient Client Selection Strategies for Federated Learning

Figure 2 for Bandit-based Communication-Efficient Client Selection Strategies for Federated Learning

Figure 3 for Bandit-based Communication-Efficient Client Selection Strategies for Federated Learning

Due to communication constraints and intermittent client availability in federated learning, only a subset of clients can participate in each training round. While most prior works assume uniform and unbiased client selection, recent work on biased client selection has shown that selecting clients with higher local losses can improve error convergence speed. However, previously proposed biased selection strategies either require additional communication cost for evaluating the exact local loss or utilize stale local loss, which can even make the model diverge. In this paper, we present a bandit-based communication-efficient client selection strategy UCB-CS that achieves faster convergence with lower communication overhead. We also demonstrate how client selection can be used to improve fairness.

Via

Access Paper or Ask Questions

Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies

Oct 03, 2020
Yae Jee Cho, Jianyu Wang, Gauri Joshi

Figure 1 for Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies

Figure 2 for Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies

Figure 3 for Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies

Figure 4 for Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies

Federated learning is a distributed optimization paradigm that enables a large number of resource-limited client nodes to cooperatively train a model without data sharing. Several works have analyzed the convergence of federated learning by accounting of data heterogeneity, communication and computation limitations, and partial client participation. However, they assume unbiased client participation, where clients are selected at random or in proportion of their data sizes. In this paper, we present the first convergence analysis of federated optimization for biased client selection strategies, and quantify how the selection bias affects convergence speed. We reveal that biasing client selection towards clients with higher local loss achieves faster error convergence. Using this insight, we propose Power-of-Choice, a communication- and computation-efficient client selection framework that can flexibly span the trade-off between convergence speed and solution bias. Our experiments demonstrate that Power-of-Choice strategies converge up to 3 $\times$ faster and give $10$% higher test accuracy than the baseline random selection.

Via

Access Paper or Ask Questions

Probabilistic Neighbourhood Component Analysis: Sample Efficient Uncertainty Estimation in Deep Learning

Jul 18, 2020
Ankur Mallick, Chaitanya Dwivedi, Bhavya Kailkhura, Gauri Joshi, T. Yong-Jin Han

Figure 1 for Probabilistic Neighbourhood Component Analysis: Sample Efficient Uncertainty Estimation in Deep Learning

Figure 2 for Probabilistic Neighbourhood Component Analysis: Sample Efficient Uncertainty Estimation in Deep Learning

Figure 3 for Probabilistic Neighbourhood Component Analysis: Sample Efficient Uncertainty Estimation in Deep Learning

Figure 4 for Probabilistic Neighbourhood Component Analysis: Sample Efficient Uncertainty Estimation in Deep Learning

While Deep Neural Networks (DNNs) achieve state-of-the-art accuracy in various applications, they often fall short in accurately estimating their predictive uncertainty and, in turn, fail to recognize when these predictions may be wrong. Several uncertainty-aware models, such as Bayesian Neural Network (BNNs) and Deep Ensembles have been proposed in the literature for quantifying predictive uncertainty. However, research in this area has been largely confined to the big data regime. In this work, we show that the uncertainty estimation capability of state-of-the-art BNNs and Deep Ensemble models degrades significantly when the amount of training data is small. To address the issue of accurate uncertainty estimation in the small-data regime, we propose a probabilistic generalization of the popular sample-efficient non-parametric kNN approach. Our approach enables deep kNN classifier to accurately quantify underlying uncertainties in its prediction. We demonstrate the usefulness of the proposed approach by achieving superior uncertainty quantification as compared to state-of-the-art on a real-world application of COVID-19 diagnosis from chest X-Rays. Our code is available at https://github.com/ankurmallick/sample-efficient-uq

Via

Access Paper or Ask Questions

Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

Jul 15, 2020
Jianyu Wang, Qinghua Liu, Hao Liang, Gauri Joshi, H. Vincent Poor

Figure 1 for Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

Figure 2 for Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

Figure 3 for Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

Figure 4 for Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

In federated optimization, heterogeneity in the clients' local datasets and computation speeds results in large variations in the number of local updates performed by each client in each communication round. Naive weighted aggregation of such models causes objective inconsistency, that is, the global model converges to a stationary point of a mismatched objective function which can be arbitrarily different from the true objective. This paper provides a general framework to analyze the convergence of federated heterogeneous optimization algorithms. It subsumes previously proposed methods such as FedAvg and FedProx and provides the first principled understanding of the solution bias and the convergence slowdown due to objective inconsistency. Using insights from this analysis, we propose FedNova, a normalized averaging method that eliminates objective inconsistency while preserving fast error convergence.

Via

Access Paper or Ask Questions

Slow and Stale Gradients Can Win the Race

Mar 23, 2020
Sanghamitra Dutta, Jianyu Wang, Gauri Joshi

Figure 1 for Slow and Stale Gradients Can Win the Race

Figure 2 for Slow and Stale Gradients Can Win the Race

Figure 3 for Slow and Stale Gradients Can Win the Race

Figure 4 for Slow and Stale Gradients Can Win the Race

Distributed Stochastic Gradient Descent (SGD) when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error. In this work, we present a novel theoretical characterization of the speedup offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime(wallclock time). The main novelty in our work is that our runtime analysis considers random straggling delays, which helps us design and compare distributed SGD algorithms that strike a balance between straggling and staleness. We also provide a new error convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions. Finally, based on our theoretical characterization of the error-runtime trade-off, we propose a method of gradually varying synchronicity in distributed SGD and demonstrate its performance on CIFAR10 dataset.

* Some of the results have appeared in AISTATS 2018. This is an extended version with additional results, in particular, an adaptive synchronicity strategy called AdaSync. arXiv admin note: substantial text overlap with arXiv:1803.01113

Via

Access Paper or Ask Questions

Machine Learning on Volatile Instances

Mar 12, 2020
Xiaoxi Zhang, Jianyu Wang, Gauri Joshi, Carlee Joe-Wong

Figure 1 for Machine Learning on Volatile Instances

Figure 2 for Machine Learning on Volatile Instances

Figure 3 for Machine Learning on Volatile Instances

Figure 4 for Machine Learning on Volatile Instances

Due to the massive size of the neural network models and training datasets used in machine learning today, it is imperative to distribute stochastic gradient descent (SGD) by splitting up tasks such as gradient evaluation across multiple worker nodes. However, running distributed SGD can be prohibitively expensive because it may require specialized computing resources such as GPUs for extended periods of time. We propose cost-effective strategies to exploit volatile cloud instances that are cheaper than standard instances, but may be interrupted by higher priority workloads. To the best of our knowledge, this work is the first to quantify how variations in the number of active worker nodes (as a result of preemption) affects SGD convergence and the time to train the model. By understanding these trade-offs between preemption probability of the instances, accuracy, and training time, we are able to derive practical strategies for configuring distributed SGD jobs on volatile instances such as Amazon EC2 spot instances and other preemptible cloud instances. Experimental results show that our strategies achieve good training performance at substantially lower cost.

Via

Access Paper or Ask Questions

Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD

Feb 21, 2020
Jianyu Wang, Hao Liang, Gauri Joshi

Figure 1 for Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD

Figure 2 for Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD

Figure 3 for Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD

Figure 4 for Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD

Distributed stochastic gradient descent (SGD) is essential for scaling the machine learning algorithms to a large number of computing nodes. However, the infrastructures variability such as high communication delay or random node slowdown greatly impedes the performance of distributed SGD algorithm, especially in a wireless system or sensor networks. In this paper, we propose an algorithmic approach named Overlap-Local-SGD (and its momentum variant) to overlap the communication and computation so as to speedup the distributed training procedure. The approach can help to mitigate the straggler effects as well. We achieve this by adding an anchor model on each node. After multiple local updates, locally trained models will be pulled back towards the synchronized anchor model rather than communicating with others. Experimental results of training a deep neural network on CIFAR-10 dataset demonstrate the effectiveness of Overlap-Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions.

* Accepted to ICASSP 2020

Via

Access Paper or Ask Questions

Advances and Open Problems in Federated Learning

Dec 10, 2019
Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi, Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, Sen Zhao

Figure 1 for Advances and Open Problems in Federated Learning

Figure 2 for Advances and Open Problems in Federated Learning

Figure 3 for Advances and Open Problems in Federated Learning

Figure 4 for Advances and Open Problems in Federated Learning

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

Via

Access Paper or Ask Questions

Multi-Armed Bandits with Correlated Arms

Dec 03, 2019
Samarth Gupta, Shreyas Chaudhari, Gauri Joshi, Osman Yağan

Figure 1 for Multi-Armed Bandits with Correlated Arms

Figure 2 for Multi-Armed Bandits with Correlated Arms

Figure 3 for Multi-Armed Bandits with Correlated Arms

Figure 4 for Multi-Armed Bandits with Correlated Arms

We consider a multi-armed bandit framework where the rewards obtained by pulling different arms are correlated. The correlation information is captured in terms of \textit{pseudo-rewards}, which are bounds on the rewards on the other arm given a reward realization and can capture many general correlation structures. We leverage these pseudo-rewards to design a novel approach that extends any classical bandit algorithm to the correlated multi-armed bandit setting studied in the framework. In each round, our proposed C-Bandit algorithm identifies some arms as empirically non-competitive, and avoids exploring them for that round. Through a unified regret analysis of the proposed C-Bandit algorithm, we show that C-UCB and C-TS (the correlated bandit versions of Upper-confidence-bound and Thompson sampling) pull certain arms called non-competitive arms, only O(1) times. As a result, we effectively reduce a $K$-armed bandit problem to a $C+1$-armed bandit problem, where $C$ is the number of competitive arms, as only $C$ sub-optimal arms are pulled O(log T) times. In many practical scenarios, $C$ can be zero due to which our proposed C-Bandit algorithms achieve bounded regret. In the special case where rewards are correlated through a latent random variable $X$, we give a regret lower bound that shows that bounded regret is possible only when $C = 0$. In addition to simulations, we validate the proposed algorithms via experiments on two real-world recommendation datasets, movielens and goodreads, and show that C-UCB and C-TS significantly outperform classical bandit algorithms.

* arXiv admin note: text overlap with arXiv:1808.05904 A special case of the model studied in this paper is presented in arXiv:1808.05904

Via

Access Paper or Ask Questions