Abstract:Adaptive networks have the capability to pursue solutions of global stochastic optimization problems by relying only on local interactions within neighborhoods. The diffusion of information through repeated interactions allows for globally optimal behavior, without the need for central coordination. Most existing strategies are developed for cooperative learning settings, where the objective of the network is common to all agents. We consider in this work a team setting, where a subset of the agents form a team with a common goal while competing with the remainder of the network. We develop an algorithm for decentralized competition among teams of adaptive agents, analyze its dynamics and present an application in the decentralized training of generative adversarial neural networks.
Abstract:This work focuses on the development of a new family of decision-making algorithms for adaptation and learning, which are specifically tailored to decision problems and are constructed by building up on first principles from decision theory. A key observation is that estimation and decision problems are structurally different and, therefore, algorithms that have proven successful for the former need not perform well when adjusted for decision problems. We propose a new scheme, referred to as BLLR (barrier log-likelihood ratio algorithm) and demonstrate its applicability to real-data from the COVID-19 pandemic in Italy. The results illustrate the ability of the design tool to track the different phases of the outbreak.
Abstract:Federated learning encapsulates distributed learning strategies that are managed by a central unit. Since it relies on using a selected number of agents at each iteration, and since each agent, in turn, taps into its local data, it is only natural to study optimal sampling policies for selecting agents and their data in federated learning implementations. Usually, only uniform sampling schemes are used. However, in this work, we examine the effect of importance sampling and devise schemes for sampling agents and data non-uniformly guided by a performance measure. We find that in schemes involving sampling without replacement, the performance of the resulting architecture is controlled by two factors related to data variability at each agent, and model variability across agents. We illustrate the theoretical findings with experiments on simulated and real data and show the improvement in performance that results from the proposed strategies.
Abstract:Federated learning is a useful framework for centralized learning from distributed data under practical considerations of heterogeneity, asynchrony, and privacy. Federated architectures are frequently deployed in deep learning settings, which generally give rise to non-convex optimization problems. Nevertheless, most existing analysis are either limited to convex loss functions, or only establish first-order stationarity, despite the fact that saddle-points, which are first-order stationary, are known to pose bottlenecks in deep learning. We draw on recent results on the second-order optimality of stochastic gradient algorithms in centralized and decentralized settings, and establish second-order guarantees for a class of federated learning algorithms.
Abstract:Federated learning involves a mixture of centralized and decentralized processing tasks, where a server regularly selects a sample of the agents and these in turn sample their local data to compute stochastic gradients for their learning updates. This process runs continually. The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling. We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm. We run experiments on a regression and classification problem to illustrate the theoretical results.
Abstract:This work proposes a new way of combining independently trained classifiers over space and time. Combination over space means that the outputs of spatially distributed classifiers are aggregated. Combination over time means that the classifiers respond to streaming data during testing and continue to improve their performance even during this phase. By doing so, the proposed architecture is able to improve prediction performance over time with unlabeled data. Inspired by social learning algorithms, which require prior knowledge of the observations distribution, we propose a Social Machine Learning (SML) paradigm that is able to exploit the imperfect models generated during the learning phase. We show that this strategy results in consistent learning with high probability, and it yields a robust structure against poorly trained classifiers. Simulations with an ensemble of feedforward neural networks are provided to illustrate the theoretical results.
Abstract:Decentralized algorithms for stochastic optimization and learning rely on the diffusion of information as a result of repeated local exchanges of intermediate estimates. Such structures are particularly appealing in situations where agents may be hesitant to share raw data due to privacy concerns. Nevertheless, in the absence of additional privacy-preserving mechanisms, the exchange of local estimates, which are generated based on private data can allow for the inference of the data itself. The most common mechanism for guaranteeing privacy is the addition of perturbations to local estimates before broadcasting. These perturbations are generally chosen independently at every agent, resulting in a significant performance loss. We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible (to first order in the step-size) to the network centroid, while preserving privacy guarantees. The analysis allows for general nonconvex loss functions, and is hence applicable to a large number of machine learning and signal processing problems, including deep learning.
Abstract:The objective of meta-learning is to exploit the knowledge obtained from observed tasks to improve adaptation to unseen tasks. As such, meta-learners are able to generalize better when they are trained with a larger number of observed tasks and with a larger amount of data per task. Given the amount of resources that are needed, it is generally difficult to expect the tasks, their respective data, and the necessary computational capacity to be available at a single central location. It is more natural to encounter situations where these resources are spread across several agents connected by some graph topology. The formalism of meta-learning is actually well-suited to this decentralized setting, where the learner would be able to benefit from information and computational power spread across the agents. Motivated by this observation, in this work, we propose a cooperative fully-decentralized multi-agent meta-learning algorithm, referred to as Diffusion-based MAML or Dif-MAML. Decentralized optimization algorithms are superior to centralized implementations in terms of scalability, avoidance of communication bottlenecks, and privacy guarantees. The work provides a detailed theoretical analysis to show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective even in non-convex environments. Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
Abstract:This work studies multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) function coupling all agents. This scenario arises in many machine learning and engineering applications, such as regression over distributed features and resource allocation. We reformulate this problem into an equivalent saddle-point problem, which is amenable to decentralized solutions. We then propose a proximal primal-dual algorithm and establish its linear convergence to the optimal solution when the local functions are strongly-convex. To our knowledge, this is the first linearly convergent decentralized algorithm for multi-agent sharing problems with a general convex (possibly non-smooth) coupling function.
Abstract:We address the challenge of learning factored policies in cooperative MARL scenarios. In particular, we consider the situation in which a team of agents collaborates to optimize a common cost. Our goal is to obtain factored policies that determine the individual behavior of each agent so that the resulting joint policy is optimal. In this work we make contributions to both the dynamic programming and reinforcement learning settings. In the dynamic programming case we provide a number of lemmas that prove the existence of such factored policies and we introduce an algorithm (along with proof of convergence) that provably leads to them. Then we introduce tabular and deep versions of Logical Team Q-learning, which is a stochastic version of the algorithm for the RL case. We conclude the paper by providing experiments that illustrate the claims.