Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Furong Huang

$\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus

Jun 22, 2022
Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha

$Figure 1 for $\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus$

$Figure 2 for $\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus$

$Figure 3 for $\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus$

$Figure 4 for $\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus$

In federated learning (FL), the objective of collaboratively learning a global model through aggregation of model updates across devices tends to oppose the goal of personalization via local information. In this work, we calibrate this tradeoff in a quantitative manner through a multi-criterion optimization-based framework, which we cast as a constrained program: the objective for a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model. By considering the Lagrangian relaxation of this problem, we develop an algorithm that allows each node to minimize its local component of Lagrangian through queries to a first-order gradient oracle. Then, the server executes Lagrange multiplier ascent steps followed by a Lagrange multiplier-weighted averaging step. We call this instantiation of the primal-dual method Federated Learning Beyond Consensus ($\texttt{FedBC}$). Theoretically, we establish that $\texttt{FedBC}$ converges to a first-order stationary point at rates that matches the state of the art, up to an additional error term that depends on the tolerance parameter that arises due to the proximity constraints. Overall, the analysis is a novel characterization of primal-dual methods applied to non-convex saddle point problems with nonlinear constraints. Finally, we demonstrate that $\texttt{FedBC}$ balances the global and local model test accuracy metrics across a suite of datasets (Synthetic, MNIST, CIFAR-10, Shakespeare), achieving competitive performance with the state of the art.

Via

Access Paper or Ask Questions

Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems

Jun 21, 2022
Yanchao Sun, Ruijie Zheng, Parisa Hassanzadeh, Yongyuan Liang, Soheil Feizi, Sumitra Ganesh, Furong Huang

Figure 1 for Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems

Figure 2 for Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems

Figure 3 for Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems

Figure 4 for Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems

Communication is important in many multi-agent reinforcement learning (MARL) problems for agents to share information and make good decisions. However, when deploying trained communicative agents in a real-world application where noise and potential attackers exist, the safety of communication-based policies becomes a severe issue that is underexplored. Specifically, if communication messages are manipulated by malicious attackers, agents relying on untrustworthy communication may take unsafe actions that lead to catastrophic consequences. Therefore, it is crucial to ensure that agents will not be misled by corrupted communication, while still benefiting from benign communication. In this work, we consider an environment with $N$ agents, where the attacker may arbitrarily change the communication from any $C<\frac{N-1}{2}$ agents to a victim agent. For this strong threat model, we propose a certifiable defense by constructing a message-ensemble policy that aggregates multiple randomly ablated message sets. Theoretical analysis shows that this message-ensemble policy can utilize benign communication while being certifiably robust to adversarial communication, regardless of the attacking algorithm. Experiments in multiple environments verify that our defense significantly improves the robustness of trained policies against various types of attacks.

Via

Access Paper or Ask Questions

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

Jun 02, 2022
Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha

Figure 1 for Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

Figure 2 for Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

Figure 3 for Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

Figure 4 for Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

In this work, we propose a novel ${\bf K}$ernelized ${\bf S}$tein Discrepancy-based Posterior Sampling for ${\bf RL}$ algorithm (named $\texttt{KSRL}$) which extends model-based RL based upon posterior sampling (PSRL) in several ways: we (i) relax the need for any smoothness or Gaussian assumptions, allowing for complex mixture models; (ii) ensure it is applicable to large-scale training by incorporating a compression step such that the posterior consists of a \emph{Bayesian coreset} of only statistically significant past state-action pairs; and (iii) develop a novel regret analysis of PSRL based upon integral probability metrics, which, under a smoothness condition on the constructed posterior, can be evaluated in closed form as the kernelized Stein discrepancy (KSD). Consequently, we are able to improve the $\mathcal{O}(H^{3/2}d\sqrt{T})$ {regret} of PSRL to $\mathcal{O}(H^{3/2}\sqrt{T})$, where $d$ is the input dimension, $H$ is the episode length, and $T$ is the total number of episodes experienced, alleviating a linear dependence on $d$ . Moreover, we theoretically establish a trade-off between regret rate with posterior representational complexity via introducing a compression budget parameter $\epsilon$ based on KSD, and establish a lower bound on the required complexity for consistency of the model. Experimentally, we observe that this approach is competitive with several state of the art RL methodologies, with substantive improvements in computation time. Experimentally, we observe that this approach is competitive with several state of the art RL methodologies, and can achieve up-to $50\%$ reduction in wall clock time in some continuous control environments.

Via

Access Paper or Ask Questions

End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

Feb 15, 2022
Arpit Bansal, Avi Schwarzschild, Eitan Borgnia, Zeyad Emam, Furong Huang, Micah Goldblum, Tom Goldstein

Figure 1 for End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

Figure 2 for End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

Figure 3 for End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

Figure 4 for End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

Machine learning systems perform well on pattern matching tasks, but their ability to perform algorithmic or logical reasoning is not well understood. One important reasoning capability is logical extrapolation, in which models trained only on small/simple reasoning problems can synthesize complex algorithms that scale up to large/complex problems at test time. Logical extrapolation can be achieved through recurrent systems, which can be iterated many times to solve difficult reasoning problems. We observe that this approach fails to scale to highly complex problems because behavior degenerates when many iterations are applied -- an issue we refer to as "overthinking." We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten. We also employ a progressive training routine that prevents the model from learning behaviors that are specific to iteration number and instead pushes it to learn behaviors that can be repeated indefinitely. These innovations prevent the overthinking problem, and enable recurrent systems to solve extremely hard logical extrapolation tasks, some requiring over 100K convolutional layers, without overthinking.

Via

Access Paper or Ask Questions

Transfer RL across Observation Feature Spaces via Model-Based Regularization

Jan 01, 2022
Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew Cohen, Furong Huang

Figure 1 for Transfer RL across Observation Feature Spaces via Model-Based Regularization

Figure 2 for Transfer RL across Observation Feature Spaces via Model-Based Regularization

Figure 3 for Transfer RL across Observation Feature Spaces via Model-Based Regularization

Figure 4 for Transfer RL across Observation Feature Spaces via Model-Based Regularization

In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations, and may thus be subject to dramatic changes over time (e.g. increased number of observable features). However, when the observation space changes, the previous policy will likely fail due to the mismatch of input features, and another policy must be trained from scratch, which is inefficient in terms of computation and sample complexity. Following theoretical insights, we propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task to use as a model-based regularizer. Our algorithm works for drastic changes of observation space (e.g. from vector-based observation to image-based observation), without any inter-task mapping or any prior knowledge of the target task. Empirical results show that our algorithm significantly improves the efficiency and stability of learning in the target task.

Via

Access Paper or Ask Questions

Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Nov 10, 2021
Sicheng Zhu, Bang An, Furong Huang

Figure 1 for Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Figure 2 for Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Figure 3 for Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Figure 4 for Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Machine learning models that are developed to be invariant under certain types of data transformations have shown improved generalization in practice. However, a principled understanding of why invariance benefits generalization is limited. Given a dataset, there is often no principled way to select "suitable" data transformations under which model invariance guarantees better generalization. This paper studies the generalization benefit of model invariance by introducing the sample cover induced by transformations, i.e., a representative subset of a dataset that can approximately recover the whole dataset using transformations. For any data transformations, we provide refined generalization bounds for invariant models based on the sample cover. We also characterize the "suitability" of a set of data transformations by the sample covering number induced by transformations, i.e., the smallest size of its induced sample covers. We show that we may tighten the generalization bounds for "suitable" transformations that have a small sample covering number. In addition, our proposed sample covering number can be empirically evaluated and thus provides a guide for selecting transformations to develop model invariance for better generalization. In experiments on multiple datasets, we evaluate sample covering numbers for some commonly used transformations and show that the smaller sample covering number for a set of transformations (e.g., the 3D-view transformation) indicates a smaller gap between the test and training error for invariant models, which verifies our propositions.

* Accepted to NeurIPS 2021

Via

Access Paper or Ask Questions

VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

Oct 27, 2021
Mucong Ding, Kezhi Kong, Jingling Li, Chen Zhu, John P Dickerson, Furong Huang, Tom Goldstein

Figure 1 for VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

Figure 2 for VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

Figure 3 for VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

Figure 4 for VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

Most state-of-the-art Graph Neural Networks (GNNs) can be defined as a form of graph convolution which can be realized by message passing between direct neighbors or beyond. To scale such GNNs to large graphs, various neighbor-, layer-, or subgraph-sampling techniques are proposed to alleviate the "neighbor explosion" problem by considering only a small subset of messages passed to the nodes in a mini-batch. However, sampling-based methods are difficult to apply to GNNs that utilize many-hops-away or global context each layer, show unstable performance for different tasks and datasets, and do not speed up model inference. We propose a principled and fundamentally different approach, VQ-GNN, a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance. In contrast to sampling-based techniques, our approach can effectively preserve all the messages passed to a mini-batch of nodes by learning and updating a small number of quantized reference vectors of global node representations, using VQ within each GNN layer. Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix. We show that such a compact low-rank version of the gigantic convolution matrix is sufficient both theoretically and experimentally. In company with VQ, we design a novel approximated message passing algorithm and a nontrivial back-propagation rule for our framework. Experiments on various types of GNN backbones demonstrate the scalability and competitive performance of our framework on large-graph node classification and link prediction benchmarks.

* NeurIPS 2021

Via

Access Paper or Ask Questions

Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

Sep 17, 2021
Tahseen Rabbani, Brandon Feng, Yifan Yang, Arjun Rajkumar, Amitabh Varshney, Furong Huang

Figure 1 for Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

Figure 2 for Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

Figure 3 for Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

Figure 4 for Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching

A popular application of federated learning is using many clients to train a deep neural network, the parameters of which are maintained on a central server. While recent efforts have focused on reducing communication complexity, existing algorithms assume that each participating client is able to download the current and full set of parameters, which may not be a practical assumption depending on the memory constraints of clients such as mobile devices. In this work, we propose a novel algorithm Comfetch, which allows clients to train large networks using compressed versions of the global architecture via Count Sketch, thereby reducing communication and local memory costs. We provide a theoretical convergence guarantee and experimentally demonstrate that it is possible to learn large networks, such as a deep convolutional network and an LSTM, through federated agents training on their sketched counterparts. The resulting global models exhibit competitive test accuracy when compared against the state-of-the-art FetchSGD and the classical FedAvg, both of which require clients to download the full architecture.

Via

Access Paper or Ask Questions

Practical and Fast Momentum-Based Power Methods

Aug 20, 2021
Tahseen Rabbani, Apollo Jain, Arjun Rajkumar, Furong Huang

Figure 1 for Practical and Fast Momentum-Based Power Methods

Figure 2 for Practical and Fast Momentum-Based Power Methods

Figure 3 for Practical and Fast Momentum-Based Power Methods

Figure 4 for Practical and Fast Momentum-Based Power Methods

The power method is a classical algorithm with broad applications in machine learning tasks, including streaming PCA, spectral clustering, and low-rank matrix approximation. The distilled purpose of the vanilla power method is to determine the largest eigenvalue (in absolute modulus) and its eigenvector of a matrix. A momentum-based scheme can be used to accelerate the power method, but achieving an optimal convergence rate with existing algorithms critically relies on additional spectral information that is unavailable at run-time, and sub-optimal initializations can result in divergence. In this paper, we provide a pair of novel momentum-based power methods, which we call the delayed momentum power method (DMPower) and a streaming variant, the delayed momentum streaming method (DMStream). Our methods leverage inexact deflation and are capable of achieving near-optimal convergence with far less restrictive hyperparameter requirements. We provide convergence analyses for both algorithms through the lens of perturbation theory. Further, we experimentally demonstrate that DMPower routinely outperforms the vanilla power method and that both algorithms match the convergence speed of an oracle running existing accelerated methods with perfect spectral knowledge.

Via

Access Paper or Ask Questions

Datasets for Studying Generalization from Easy to Hard Examples

Aug 13, 2021
Avi Schwarzschild, Eitan Borgnia, Arjun Gupta, Arpit Bansal, Zeyad Emam, Furong Huang, Micah Goldblum, Tom Goldstein

Figure 1 for Datasets for Studying Generalization from Easy to Hard Examples

Figure 2 for Datasets for Studying Generalization from Easy to Hard Examples

Figure 3 for Datasets for Studying Generalization from Easy to Hard Examples

We describe new datasets for studying generalization from easy to hard examples.

Via

Access Paper or Ask Questions