Ali H. Sayed

Logical Team Q-learning: An approach towards factored policies in cooperative MARL

Jun 05, 2020

Lucas Cassano, Ali H. Sayed

Abstract: We address the challenge of learning factored policies in cooperative MARL scenarios. In particular, we consider the situation in which a team of agents collaborates to optimize a common cost. Our goal is to obtain factored policies that determine the individual behavior of each agent so that the resulting joint policy is optimal. In this work we make contributions to both the dynamic programming and reinforcement learning settings. In the dynamic programming case we provide a number of lemmas that prove the existence of such factored policies and we introduce an algorithm (along with a proof of convergence) that provably leads to them. Then we introduce tabular and deep versions of Logical Team Q-learning, which is a stochastic version of the algorithm for the RL case. We conclude the paper by providing experiments that illustrate the claims.

Tracking Performance of Online Stochastic Learners

Apr 04, 2020

Stefan Vlaski, Elsa Rizk, Ali H. Sayed

Abstract: The utilization of online stochastic algorithms is popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches. When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy. Building on analogies with the study of adaptive filters, we establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models. The link allows us to infer the tracking performance from steady-state expressions directly and almost by inspection.
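
As a rough illustration of the tracking behavior described in this abstract, the sketch below runs a constant step-size LMS recursion on a scalar target that drifts as a random walk. It is a toy setup and not the analysis from the paper: the function name `track_random_walk`, the scalar model, and all parameter values are assumptions chosen for illustration. It tends to show that a very small step-size lags behind the drift, while a very large one amplifies gradient noise.

```python
import random

def track_random_walk(mu, drift_std, noise_std=0.1, steps=20000, seed=0):
    """Time-averaged squared error of constant step-size LMS on a drifting scalar target.

    Hypothetical toy model (not taken from the paper): the target w_true follows
    a random walk, and each sample is d = u * w_true + measurement noise.
    """
    rng = random.Random(seed)
    w_true = 0.0   # drifting optimal solution
    w = 0.0        # online estimate
    total = 0.0
    for _ in range(steps):
        w_true += rng.gauss(0.0, drift_std)          # random-walk drift
        u = rng.gauss(0.0, 1.0)                      # streaming regressor
        d = u * w_true + rng.gauss(0.0, noise_std)   # noisy observation
        w += mu * u * (d - u * w)                    # constant step-size LMS update
        total += (w - w_true) ** 2
    return total / steps

# Compare a small, a moderate, and a large step-size under the same drift.
for mu in (0.001, 0.05, 0.5):
    print(mu, track_random_walk(mu, drift_std=0.01))
```

In this toy model the moderate step-size gives the lowest averaged error, which is consistent with the trade-off between adaptation speed and gradient noise that the abstract alludes to.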

Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization

Mar 31, 2020

Stefan Vlaski, Ali H. Sayed

Abstract: Rapid advances in data collection and processing capabilities have allowed for the use of increasingly complex models that give rise to nonconvex optimization problems. These formulations, however, can be arbitrarily difficult to solve in general, in the sense that even simply verifying that a given point is a local minimum can be NP-hard [1]. Still, some relatively simple algorithms have been shown to lead to surprisingly good empirical results in many contexts of interest. Perhaps the most prominent example is the success of the backpropagation algorithm for training neural networks. Several recent works have pursued rigorous analytical justification for this phenomenon by studying the structure of the nonconvex optimization problems and establishing that simple algorithms, such as gradient descent and its variations, perform well in converging towards local minima and avoiding saddle-points. A key insight in these analyses is that gradient perturbations play a critical role in allowing local descent algorithms to efficiently distinguish desirable from undesirable stationary points and escape from the latter. In this article, we cover recent results on second-order guarantees for stochastic first-order optimization algorithms in centralized, federated, and decentralized architectures.
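
The role of gradient perturbations mentioned in this abstract can be seen on a small toy problem. The sketch below is an assumption-laden illustration rather than any algorithm from the article: the cost f(x, y) = 0.5 x^2 + 0.25 y^4 - 0.5 y^2, the function names, and the parameter values are all hypothetical. The cost has a strict saddle at (0, 0) and minima at (0, 1) and (0, -1). Starting exactly on the x-axis, noiseless gradient descent never leaves it and converges to the saddle, while a small additive perturbation lets the iterate escape toward a minimum.

```python
import random

def grad(x, y):
    # Gradient of the toy cost f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2,
    # which has a strict saddle at (0, 0) and minima at (0, 1) and (0, -1).
    return x, y ** 3 - y

def descend(noise_std, mu=0.1, steps=500, seed=0):
    """Gradient descent from (1, 0) with additive Gaussian gradient noise."""
    rng = random.Random(seed)
    x, y = 1.0, 0.0   # start on the axis that leads straight into the saddle
    for _ in range(steps):
        gx, gy = grad(x, y)
        gx += rng.gauss(0.0, noise_std)   # perturbation of the gradient
        gy += rng.gauss(0.0, noise_std)
        x -= mu * gx
        y -= mu * gy
    return x, y

print("no noise:  ", descend(0.0))    # stays on y = 0 and approaches the saddle
print("with noise:", descend(0.01))   # y moves away from 0 toward a minimum
```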

Dynamic Federated Learning

Feb 20, 2020

Elsa Rizk, Stefan Vlaski, Ali H. Sayed

Abstract: Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. While many federated learning architectures process data in an online manner, and are hence adaptive by nature, most performance analyses assume static optimization problems and offer no guarantees in the presence of drifts in the problem solution or data characteristics. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm. The results clarify the trade-off between convergence and tracking performance.
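
A minimal sketch of the setting in this abstract, under simplifying assumptions that are not taken from the paper: a scalar model, one local stochastic-gradient step per participating agent, plain averaging at the server, and a common minimizer for all agents (so the model-variability factor is absent here). The function name `dynamic_fedavg` and all parameter values are hypothetical.

```python
import random

def dynamic_fedavg(mu=0.05, num_agents=20, participants=5, drift_std=0.01,
                   rounds=5000, seed=0):
    """Time-averaged squared error of a toy federated scheme tracking a drifting minimizer."""
    rng = random.Random(seed)
    # Assumed per-agent measurement noise levels (data variability differs by agent).
    agent_noise = [0.05 + 0.01 * k for k in range(num_agents)]
    w_true = 0.0   # minimizer of the aggregate cost, following a random walk
    w = 0.0        # server model
    total = 0.0
    for _ in range(rounds):
        w_true += rng.gauss(0.0, drift_std)                   # non-stationary drift
        active = rng.sample(range(num_agents), participants)  # random subset of agents
        local_models = []
        for k in active:
            u = rng.gauss(0.0, 1.0)
            d = u * w_true + rng.gauss(0.0, agent_noise[k])
            local_models.append(w + mu * u * (d - u * w))      # one local update
        w = sum(local_models) / len(local_models)              # server averaging
        total += (w - w_true) ** 2
    return total / rounds

# In this drift-dominated toy regime, a smaller learning rate tracks worse.
print(dynamic_fedavg(mu=0.005), dynamic_fedavg(mu=0.05))
```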

Multitask learning over graphs

Jan 07, 2020

Roula Nassif, Stefan Vlaski, Cedric Richard, Jie Chen, Ali H. Sayed

Abstract: The problem of simultaneously learning several related tasks has received considerable attention in several domains, especially in machine learning with the so-called multitask learning problem or learning to learn problem [1], [2]. Multitask learning is an approach to inductive transfer learning (using what is learned for one problem to assist in another problem) and helps improve generalization performance relative to learning each task separately by using the domain information contained in the training signals of related tasks as an inductive bias. Several strategies have been derived within this community under the assumption that all data are available beforehand at a fusion center. However, recent years have witnessed an increasing ability to collect data in a distributed and streaming manner. This requires the design of new strategies for jointly learning multiple tasks from streaming data over distributed (or networked) systems. This article provides an overview of multitask strategies for learning and adaptation over networks. The working hypothesis for these strategies is that agents are allowed to cooperate with each other in order to learn distinct, though related tasks. The article shows how cooperation steers the network limiting point and how different cooperation rules make it possible to promote different task relatedness models. It also explains how and when cooperation over multitask networks outperforms non-cooperative strategies.

Inverse Graph Learning over Optimization Networks

Dec 18, 2019

Vincenzo Matta, Augusto Santos, Ali H. Sayed

Abstract: Many inferential and learning tasks can be accomplished efficiently by means of distributed optimization algorithms where the network topology plays a critical role in driving the local interactions among neighboring agents. There is a large body of literature examining the effect of the graph structure on the performance of optimization strategies. In this article, we examine the inverse problem and consider the reverse question: How much information does observing the behavior at the nodes convey about the underlying network structure used for optimization? Over large-scale networks, the difficulty of addressing such inverse questions (or problems) is compounded by the fact that usually only a limited portion of nodes can be probed, giving rise to a second important question: Despite the presence of several unobserved nodes, are partial and local observations still sufficient to discover the graph linking the probed nodes? The article surveys recent advances on this inverse learning problem and related questions. Examples of applications are provided to illustrate how the interplay between graph learning and distributed optimization arises in practice, e.g., in cognitive engineered systems such as distributed detection, or in other real-world problems such as the mechanism of opinion formation over social networks and the mechanism of coordination in biological networks. A unifying framework for examining the reconstruction error will be described, which allows one to devise and examine various estimation strategies enabling successful graph learning. The relevance of specific network attributes, such as sparsity versus density of connections, and node degree concentration, is discussed in relation to the topology inference goal. It is shown how universal (i.e., data-driven) clustering algorithms can be exploited to solve the graph learning problem.

* Submitted for publication

Network Classifiers With Output Smoothing

Oct 30, 2019

Elsa Rizk, Roula Nassif, Ali H. Sayed

Abstract: This work introduces two strategies for training network classifiers with heterogeneous agents. One strategy promotes global smoothing over the graph and a second strategy promotes local smoothing over neighbourhoods. It is assumed that the feature sizes can vary from one agent to another, with some agents observing insufficient attributes to be able to make reliable decisions on their own. As a result, cooperation with neighbours is necessary. However, because the feature dimensions are different across the agents, their classifier dimensions will also be different. This means that cooperation cannot rely on combining the classifier parameters. We instead propose smoothing the outputs of the classifiers, which are the predicted labels. By doing so, the dynamics that describe the evolution of the network classifier become more challenging than usual because the classifier parameters end up appearing as part of the regularization term as well. We illustrate performance by means of computer simulations.

Linear Speedup in Saddle-Point Escape for Decentralized Non-Convex Optimization

Oct 30, 2019

Stefan Vlaski, Ali H. Sayed

Abstract: Under appropriate cooperation protocols and parameter choices, fully decentralized solutions for stochastic optimization have been shown to match the performance of centralized solutions and result in linear speedup (in the number of agents) relative to non-cooperative approaches in the strongly-convex setting. More recently, these results have been extended to the pursuit of first-order stationary points in non-convex environments. In this work, we examine in detail the dependence of second-order convergence guarantees on the spectral properties of the combination policy for non-convex multi-agent optimization. We establish linear speedup in saddle-point escape time in the number of agents for symmetric combination policies and study the potential for further improvement by employing asymmetric combination weights. The results imply that a linear speedup can be expected in the pursuit of second-order stationary points, which exclude local maxima as well as strict saddle-points and correspond to local or even global minima in many important learning settings.

* Submitted for publication

Regularized Diffusion Adaptation via Conjugate Smoothing

Sep 20, 2019

Stefan Vlaski, Lieven Vandenberghe, Ali H. Sayed

Abstract: The purpose of this work is to develop and study a distributed strategy for Pareto optimization of an aggregate cost consisting of regularized risks. Each risk is modeled as the expectation of some loss function with unknown probability distribution while the regularizers are assumed deterministic, but are not required to be differentiable or even continuous. The individual, regularized, cost functions are distributed across a strongly-connected network of agents and the Pareto optimal solution is sought by appealing to a multi-agent diffusion strategy. To this end, the regularizers are smoothed by means of infimal convolution and it is shown that the Pareto solution of the approximate, smooth problem can be made arbitrarily close to the solution of the original, non-smooth problem. Performance bounds are established under conditions that are weaker than assumed before in the literature, and hence applicable to a broader class of adaptation and learning problems.
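
To make the idea of smoothing by infimal convolution concrete, the sketch below works through one standard instance. It assumes a quadratic smoothing kernel (the Moreau envelope) and the absolute-value regularizer; the abstract does not specify either choice, so both are illustrative assumptions, as are the function names. In this case the smoothed regularizer has a closed form (the Huber function), and the gap to the original regularizer is at most delta / 2, so it shrinks as the smoothing parameter delta goes to zero.

```python
import math

def smoothed_abs(x, delta):
    """Closed form of min over u of |u| + (x - u)**2 / (2 * delta), i.e. the Huber function."""
    if abs(x) <= delta:
        return x * x / (2.0 * delta)
    return abs(x) - delta / 2.0

def smoothed_abs_numeric(x, delta, grid=20001, span=5.0):
    """Brute-force evaluation of the same infimal convolution on a uniform grid of u values."""
    best = math.inf
    for i in range(grid):
        u = -span + 2.0 * span * i / (grid - 1)
        best = min(best, abs(u) + (x - u) ** 2 / (2.0 * delta))
    return best

# The closed form and the grid search should agree closely, and the gap
# |x| - smoothed_abs(x, delta) should stay between 0 and delta / 2.
for x in (-2.0, -0.3, 0.0, 0.2, 1.5):
    print(x, smoothed_abs(x, 0.5), smoothed_abs_numeric(x, 0.5))
```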

ISL: Optimal Policy Learning With Optimal Exploration-Exploitation Trade-Off

Sep 13, 2019

Lucas Cassano, Ali H. Sayed

Abstract: Traditionally, off-policy learning algorithms (such as Q-learning) and exploration schemes have been derived separately. Oftentimes, the exploration-exploitation dilemma is addressed through heuristics. In this article we show that both the learning equations and the exploration-exploitation strategy can be derived in tandem as the solution to a unique and well-posed optimization problem whose minimization leads to the optimal value function. We present a new algorithm following this idea. The algorithm is of the gradient type (and therefore has good convergence properties even when used in conjunction with function approximators such as neural networks); it is off-policy; and it specifies both the update equations and the strategy to address the exploration-exploitation dilemma. To the best of our knowledge, this is the first algorithm that has these properties.
