Avishek Ghosh

Distributed Newton Can Communicate Less and Resist Byzantine Workers

Jun 15, 2020

Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar

Abstract: We develop a distributed second order optimization algorithm that is communication-efficient as well as robust against Byzantine failures of the worker machines. We propose COMRADE (COMunication-efficient and Robust Approximate Distributed nEwton), an iterative second order algorithm, where the worker machines communicate only once per iteration with the center machine. This is in sharp contrast with the state-of-the-art distributed second order algorithms like GIANT [34] and DINGO [7], where the worker machines send (functions of) local gradient and Hessian sequentially, thus ending up communicating twice with the center machine per iteration. Moreover, we show that the worker machines can further compress the local information before sending it to the center. In addition, we employ a simple norm based thresholding rule to filter out the Byzantine worker machines. We establish the linear-quadratic rate of convergence of COMRADE and establish that the communication savings and Byzantine resilience result in only a small statistical error rate for arbitrary convex loss functions. To the best of our knowledge, this is the first work that addresses the issue of Byzantine resilience in second order distributed optimization. Furthermore, we validate our theoretical results with extensive experiments on synthetic and benchmark LIBSVM [5] datasets and demonstrate convergence guarantees.
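
The norm based thresholding rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the COMRADE implementation: the function name `norm_threshold_aggregate`, the trimming fraction `beta`, and the choice to average the surviving updates are all assumptions made for illustration. The idea shown is only that the center discards the updates with the largest norms before aggregating.

```python
import math


def norm_threshold_aggregate(updates, beta):
    """Average worker updates after dropping the beta fraction with the largest norm.

    `updates` is a list of equal-length lists of floats (one per worker).
    `beta` is a hypothetical trimming fraction in [0, 1), not a value from the paper.
    """
    if not 0 <= beta < 1:
        raise ValueError("beta must be in [0, 1)")
    norms = [math.sqrt(sum(x * x for x in u)) for u in updates]
    keep = len(updates) - int(beta * len(updates))
    # Indices of the `keep` updates with the smallest Euclidean norm.
    survivors = sorted(range(len(updates)), key=lambda i: norms[i])[:keep]
    dim = len(updates[0])
    return [sum(updates[i][j] for i in survivors) / keep for j in range(dim)]


# Three honest workers and one worker sending an update with a huge norm.
updates = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [100.0, -50.0]]
aggregate = norm_threshold_aggregate(updates, beta=0.25)
print(aggregate)
```

With `beta=0.25` and four workers, exactly one update (the one with the largest norm) is dropped, so the aggregate is the mean of the three honest updates.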

Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits

Jun 15, 2020

Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran

Abstract: We consider the problem of model selection for two popular stochastic linear bandit settings, and propose algorithms that adapt to the unknown problem complexity. In the first setting, we consider the $K$-armed mixture bandits, where the mean reward of arm $i \in [K]$ is $\mu_i+ \langle \alpha_{i,t},\theta^* \rangle $, with $\alpha_{i,t} \in \mathbb{R}^d$ being the known context vector and $\mu_i \in [-1,1]$ and $\theta^*$ being unknown parameters. We define $\|\theta^*\|$ as the problem complexity and consider a sequence of nested hypothesis classes, each positing a different upper bound on $\|\theta^*\|$. Exploiting this, we propose Adaptive Linear Bandit (ALB), a novel phase based algorithm that adapts to the true problem complexity, $\|\theta^*\|$. We show that ALB achieves regret scaling of $O(\|\theta^*\|\sqrt{T})$, where $\|\theta^*\|$ is a priori unknown. As a corollary, when $\theta^*=0$, ALB recovers the minimax regret for the simple bandit algorithm without such knowledge of $\theta^*$. ALB is the first algorithm that uses parameter norm as a model selection criterion for linear bandits. Prior state-of-the-art algorithms \cite{osom} achieve a regret of $O(L\sqrt{T})$, where $L$ is the upper bound on $\|\theta^*\|$, fed as an input to the problem. In the second setting, we consider the standard linear bandit problem (with possibly an infinite number of arms) where the sparsity of $\theta^*$, denoted by $d^* \leq d$, is unknown to the algorithm. Defining $d^*$ as the problem complexity, we show that ALB achieves $O(d^*\sqrt{T})$ regret, matching that of an oracle who knew the true sparsity level. This methodology is then extended to the case of finitely many arms and similar results are proven. This is the first algorithm that achieves such model selection guarantees. We further verify our results via synthetic and real-data experiments.

* 24 pages, 8 figures

An Efficient Framework for Clustered Federated Learning

Jun 07, 2020

Avishek Ghosh, Jichan Chung, Dong Yin, Kannan Ramchandran

Abstract: We address the problem of Federated Learning (FL) where users are distributed and partitioned into clusters. This setup captures settings where different groups of users have their own objectives (learning tasks) but by aggregating their data with others in the same cluster (same learning task), they can leverage the strength in numbers in order to perform more efficient Federated Learning. We propose a new framework dubbed the Iterative Federated Clustering Algorithm (IFCA), which alternately estimates the cluster identities of the users and optimizes model parameters for the user clusters via gradient descent. We analyze the convergence rate of this algorithm first in a linear model with squared loss and then for generic strongly convex and smooth loss functions. We show that in both settings, with good initialization, IFCA converges at an exponential rate, and discuss the optimality of the statistical error rate. In the experiments, we show that our algorithm can succeed even if we relax the requirements on initialization with random initialization and multiple restarts. We also present experimental results showing that our algorithm is efficient in non-convex problems such as neural networks and outperforms the baselines on several clustered FL benchmarks created based on the MNIST and CIFAR-10 datasets by $5\sim 8\%$.

* 20 pages, 4 figures and 1 table
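
The alternation described in the IFCA abstract (estimate each user's cluster identity, then take a gradient step per cluster) can be sketched for a toy scalar linear model with squared loss. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names `loss`, `grad`, and `ifca_round`, the scalar model `y = theta * x`, the step size, and the toy data are all hypothetical.

```python
def loss(theta, data):
    """Mean squared loss of the scalar model y = theta * x on one user's data."""
    return sum((y - theta * x) ** 2 for x, y in data) / len(data)


def grad(theta, data):
    """Gradient of the mean squared loss with respect to theta."""
    return sum(-2 * x * (y - theta * x) for x, y in data) / len(data)


def ifca_round(thetas, users, step):
    """One round: assign users to clusters, then one gradient step per cluster."""
    # Each user picks the cluster model with the smallest loss on its own data.
    assign = [min(range(len(thetas)), key=lambda j: loss(thetas[j], d)) for d in users]
    new_thetas = list(thetas)
    for j in range(len(thetas)):
        members = [d for d, a in zip(users, assign) if a == j]
        if members:
            # Average the gradients of the users currently assigned to cluster j.
            g = sum(grad(thetas[j], d) for d in members) / len(members)
            new_thetas[j] = thetas[j] - step * g
    return new_thetas, assign


# Two users follow y = 2x and two follow y = -x (noiseless toy data).
users = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0)],
    [(1.0, -1.0), (2.0, -2.0)],
    [(2.0, -2.0), (3.0, -3.0)],
]
thetas = [1.0, -0.5]  # initialization close enough to separate the clusters
for _ in range(50):
    thetas, assign = ifca_round(thetas, users, step=0.05)
print(thetas, assign)
```

On this toy data the assignments are correct from the first round, so each cluster model contracts geometrically toward its true slope, in line with the exponential rate the abstract reports under good initialization.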

Alternating Minimization Converges Super-Linearly for Mixed Linear Regression

Apr 23, 2020

Avishek Ghosh, Kannan Ramchandran

Abstract: We address the problem of solving mixed random linear equations. We have unlabeled observations coming from multiple linear regressions, and each observation corresponds to exactly one of the regression models. The goal is to learn the linear regressors from the observations. Classically, Alternating Minimization (AM) (which is a variant of Expectation Maximization (EM)) is used to solve this problem. AM iteratively alternates between the estimation of labels and solving the regression problems with the estimated labels. Empirically, it is observed that, for a large variety of non-convex problems including mixed linear regression, AM converges at a much faster rate compared to gradient based algorithms. However, the existing theory suggests a similar rate of convergence for AM and gradient based methods, failing to capture this empirical behavior. In this paper, we close this gap between theory and practice for the special case of a mixture of $2$ linear regressions. We show that, provided it is initialized properly, AM enjoys a \emph{super-linear} rate of convergence in certain parameter regimes. To the best of our knowledge, this is the first work that theoretically establishes such a rate for AM. Hence, if we want to recover the unknown regressors up to an error (in $\ell_2$ norm) of $\epsilon$, AM only takes $\mathcal{O}(\log \log (1/\epsilon))$ iterations. Furthermore, we compare AM with a gradient based heuristic algorithm empirically and show that AM dominates in iteration complexity as well as wall-clock time.

* Accepted for publication at AISTATS, 2020
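
The two alternating steps named in the abstract (estimate labels, then solve the regressions with those labels) can be sketched for a mixture of 2 scalar regressions through the origin. This is a simplified illustration, not the paper's algorithm or analysis: the function name `am_mixed_regression`, the scalar model, the initialization, and the noiseless toy data are assumptions.

```python
def am_mixed_regression(xs, ys, init, iters=10):
    """Alternating minimization for a mixture of two scalar regressions y = t * x."""
    t1, t2 = init
    for _ in range(iters):
        # Label step: give each observation to the regressor with the smaller residual.
        g1 = [(x, y) for x, y in zip(xs, ys) if (y - t1 * x) ** 2 <= (y - t2 * x) ** 2]
        g2 = [(x, y) for x, y in zip(xs, ys) if (y - t1 * x) ** 2 > (y - t2 * x) ** 2]
        # Regression step: closed-form least squares on each labeled group.
        s1 = sum(x * x for x, _ in g1)
        s2 = sum(x * x for x, _ in g2)
        if s1 > 0:
            t1 = sum(x * y for x, y in g1) / s1
        if s2 > 0:
            t2 = sum(x * y for x, y in g2) / s2
    return t1, t2


# Noiseless toy data: the first three points follow y = 3x, the last three y = -2x.
xs = [1.0, 2.0, 3.0, 4.0, -1.0, -2.0]
ys = [3.0, 6.0, 9.0, -8.0, 2.0, 4.0]
t1, t2 = am_mixed_regression(xs, ys, init=(2.0, -1.0))
print(t1, t2)
```

In this noiseless example the initial guess already labels every point correctly, so a single least-squares step recovers both slopes; the remaining iterations leave them unchanged.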

Communication-Efficient and Byzantine-Robust Distributed Learning

Nov 21, 2019

Avishek Ghosh, Raj Kumar Maity, Swanand Kadhe, Arya Mazumdar, Kannan Ramchandran

Abstract: We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show the (statistical) error-rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (like coordinate-wise median or trimmed mean), and is thus optimal. Furthermore, for communication efficiency, we consider a generic class of $\delta$-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal respectively. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss functions. We show that, in the regime when the compression factor $\delta$ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence we effectively get the compression for free. Moreover, we extend the compressed gradient descent algorithm with error feedback proposed in [KRSJ19] to the distributed setting. We have experimentally validated our results and shown good performance in convergence for convex (least-square regression) and non-convex (neural network training) problems.
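
The two compressor families named in the abstract, top-k sparsification and sign-based compression, can be sketched as follows. The sketch assumes the usual reading of a $\delta$-approximate compressor $C$, namely $\|C(x) - x\|^2 \leq (1-\delta)\|x\|^2$; that definition, the function names, and the test vector are assumptions for illustration and are not taken from this page.

```python
def top_k(x, k):
    """Keep the k entries of x with the largest magnitude and zero out the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def scaled_sign(x):
    """Send only the signs of x, scaled by the mean absolute value of its entries."""
    scale = sum(abs(v) for v in x) / len(x)
    return [scale * (1 if v > 0 else -1 if v < 0 else 0) for v in x]


def error_ratio(x, cx):
    """Relative squared compression error ||C(x) - x||^2 / ||x||^2."""
    err = sum((a - b) ** 2 for a, b in zip(cx, x))
    return err / sum(a * a for a in x)


x = [4.0, -3.0, 0.5, 0.1]
sparse = top_k(x, 2)
ratio_topk = error_ratio(x, sparse)
ratio_sign = error_ratio(x, scaled_sign(x))
print(sparse, ratio_topk, ratio_sign)
```

Under the assumed definition, top-k on a vector of dimension d satisfies the bound with $\delta = k/d$, so here the error ratio for `top_k(x, 2)` should not exceed $1 - 2/4 = 0.5$.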

Max-Affine Regression: Provable, Tractable, and Near-Optimal Statistical Estimation

Jun 21, 2019

Avishek Ghosh, Ashwin Pananjady, Adityanand Guntuboyina, Kannan Ramchandran

Abstract: Max-affine regression refers to a model where the unknown regression function is modeled as a maximum of $k$ unknown affine functions for a fixed $k \geq 1$. This generalizes linear regression and (real) phase retrieval, and is closely related to convex regression. Working within a non-asymptotic framework, we study this problem in the high-dimensional setting assuming that $k$ is a fixed constant, and focus on estimation of the unknown coefficients of the affine functions underlying the model. We analyze a natural alternating minimization (AM) algorithm for the non-convex least squares objective when the design is random. We show that the AM algorithm, when initialized suitably, converges with high probability and at a geometric rate to a small ball around the optimal coefficients. In order to initialize the algorithm, we propose and analyze a combination of a spectral method and a random search scheme in a low-dimensional space, which may be of independent interest. The final rate that we obtain is near-parametric and minimax optimal (up to a poly-logarithmic factor) as a function of the dimension, sample size, and noise variance. In that sense, our approach should be viewed as a direct and implementable method of enforcing regularization to alleviate the curse of dimensionality in problems of the convex regression type. As a by-product of our analysis, we also obtain guarantees on a classical algorithm for the phase retrieval problem under considerably weaker assumptions on the design distribution than was previously known. Numerical experiments illustrate the sharpness of our bounds in the various problem parameters.

* The first two authors contributed equally to this work and are ordered alphabetically
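
To make the max-affine model and its alternating minimization concrete, here is a one-dimensional sketch with $k = 2$: points are partitioned by which affine piece currently attains the maximum, and each piece is then refit by least squares on its own points. This is a toy illustration, not the paper's algorithm or its spectral initialization; the function names, the initial pieces, and the data (samples of $|x|$) are assumptions.

```python
def fit_line(points, fallback):
    """Least-squares slope and intercept; keep `fallback` if the fit is degenerate."""
    if len(points) < 2:
        return fallback
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return fallback
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx


def am_max_affine(xs, ys, pieces, iters=5):
    """Alternating minimization for y = max_j (slope_j * x + intercept_j)."""
    for _ in range(iters):
        groups = [[] for _ in pieces]
        for x, y in zip(xs, ys):
            # Partition step: index of the piece attaining the maximum at x.
            j = max(range(len(pieces)), key=lambda i: pieces[i][0] * x + pieces[i][1])
            groups[j].append((x, y))
        # Refit step: least squares on the points assigned to each piece.
        pieces = [fit_line(g, p) for g, p in zip(groups, pieces)]
    return pieces


# Noiseless samples of |x| = max(x, -x).
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [3.0, 2.0, 1.0, 1.0, 2.0, 3.0]
pieces = am_max_affine(xs, ys, pieces=[(0.5, 0.2), (-0.8, -0.1)])
print(pieces)
```

With this initialization the first partition already separates positive from negative inputs, so the refit recovers the two pieces of $|x|$ after one iteration.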

Robust Federated Learning in a Heterogeneous Environment

Jun 16, 2019

Avishek Ghosh, Justin Hong, Dong Yin, Kannan Ramchandran

Abstract: We study a recently proposed large-scale distributed learning paradigm, namely Federated Learning, where the worker machines are end users' own devices. Statistical and computational challenges arise in Federated Learning, particularly in the presence of heterogeneous data distributions (i.e., data points on different devices belong to different distributions signifying different clusters) and Byzantine machines (i.e., machines that may behave abnormally, or even exhibit arbitrary and potentially adversarial behavior). To address the aforementioned challenges, we first propose a general statistical model for this problem which takes both the cluster structure of the users and the Byzantine machines into account. Then, leveraging the statistical model, we solve the robust heterogeneous Federated Learning problem \emph{optimally}; in particular, our algorithm matches the lower bound on the estimation error in dimension and the number of data points. Furthermore, as a by-product, we prove statistical guarantees for an outlier-robust clustering algorithm, which can be considered as the Lloyd algorithm with robust estimation. Finally, we show via synthetic as well as real data experiments that the estimation error obtained by our proposed algorithm is significantly better than that of the non-Byzantine-robust algorithms; in particular, we improve by at least 53\% and 33\% for synthetic and real data experiments, respectively, in typical settings.

* 30 pages, 4 figures

Online Scoring with Delayed Information: A Convex Optimization Viewpoint

Jul 09, 2018

Avishek Ghosh, Kannan Ramchandran

Abstract: We consider a system where agents enter in an online fashion and are evaluated based on their attributes or context vectors. There can be practical situations where this context is partially observed, and the unobserved part comes after some delay. We assume that an agent, once left, cannot re-enter the system. Therefore, the job of the system is to provide an estimated score for the agent based on her instantaneous score and possibly some inference of the instantaneous score over the delayed score. In this paper, we estimate the delayed context via an online convex game between the agent and the system. We argue that the error in the score estimate accumulated over $T$ iterations is small if the regret of the online convex game is small. Further, we leverage side information about the delayed context in the form of a correlation function with the known context. We consider the settings where the delay is fixed or arbitrarily chosen by an adversary. Furthermore, we extend the formulation to the setting where the contexts are drawn from some Banach space. Overall, we show that the average penalty for not knowing the delayed context while making a decision scales with $\mathcal{O}(\frac{1}{\sqrt{T}})$, where this can be improved to $\mathcal{O}(\frac{\log T}{T})$ under special settings.

* 8 pages, 4 figures

Misspecified Linear Bandits

Apr 23, 2017

Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan

Abstract: We consider the problem of online learning in misspecified linear stochastic multi-armed bandit problems. Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features. It is, however, of interest to investigate the impact of potential misspecification in linear bandit models, where the expected rewards are perturbed away from the linear subspace determined by the arms' features. Although OFUL has recently been shown to be robust to relatively small deviations from linearity, we show that any linear bandit algorithm that enjoys optimal regret performance in the perfectly linear setting (e.g., OFUL) must suffer linear regret under a sparse additive perturbation of the linear model. In an attempt to overcome this negative result, we define a natural class of bandit models characterized by a non-sparse deviation from linearity. We argue that the OFUL algorithm can fail to achieve sublinear regret even under models that have non-sparse deviation. We finally develop a novel bandit algorithm, comprising a hypothesis test for linearity followed by a decision to use either the OFUL or Upper Confidence Bound (UCB) algorithm. For perfectly linear bandit models, the algorithm provably exhibits OFUL's favorable regret performance, while for misspecified models satisfying the non-sparse deviation property, the algorithm avoids the linear regret phenomenon and falls back on UCB's sublinear regret scaling. Numerical experiments on synthetic data, and on recommendation data from the public Yahoo! Learning to Rank Challenge dataset, empirically support our findings.

* Thirty-First AAAI Conference on Artificial Intelligence, 2017

An Evolutionary Approach to Drug-Design Using a Novel Neighbourhood Based Genetic Algorithm

May 03, 2012

Arnab Ghosh, Avishek Ghosh, Arkabandhu Chowdhury, Amit Konar

Abstract: The present work provides a new approach to evolve ligand structures which represent possible drugs to be docked to the active site of the target protein. The structure is represented as a tree where each non-empty node represents a functional group. It is assumed that the active site configuration of the target protein is known, along with the positions of the essential residues. In this paper the interaction energy of the ligands with the protein target is minimized. Moreover, the size of the tree is difficult to obtain and it will be different for different active sites. To overcome the difficulty, a variable tree size configuration is used for designing ligands. The optimization is done using a novel Neighbourhood Based Genetic Algorithm (NBGA) which uses dynamic neighbourhood topology. To get variable tree size, a variable-length version of the above algorithm is devised. To judge the merit of the algorithm, it is initially applied to the well-known Travelling Salesman Problem (TSP).

* 10 pages, 13 figures (Communicated to journal)
