Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tong Zhang

Nanjing University of Science and Technology, Nanjing, China

On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

Jun 09, 2022

Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong Zhang

Figure 1 for On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

Figure 2 for On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

Figure 3 for On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

Abstract:Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning. However, in practice, the simple FedAvg algorithm converges very well. This paper explains the seemingly unreasonable effectiveness of FedAvg that contradicts the previous theoretical predictions. We find that the key assumption of bounded gradient dissimilarity in previous theoretical analyses is too pessimistic to characterize data heterogeneity in practical applications. For a simple quadratic problem, we demonstrate there exist regimes where large gradient dissimilarity does not have any negative impact on the convergence of FedAvg. Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity, and explicitly use it to present a new theoretical analysis of FedAvg. We show that the average drift at optimum is nearly zero across many real-world federated training tasks, whereas the gradient dissimilarity can be large. And our new analysis suggests FedAvg can have identical convergence rates in homogeneous and heterogeneous data settings, and hence, leads to better understanding of its empirical success.

Via

Access Paper or Ask Questions

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Jun 09, 2022

Hao Liu, Minshuo Chen, Siawpeng Er, Wenjing Liao, Tong Zhang, Tuo Zhao

Figure 1 for Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Figure 2 for Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Figure 3 for Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Figure 4 for Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Abstract:Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness. Most existing function approximation theories suggest that with sufficiently many parameters, neural networks can well approximate certain classes of functions in terms of the function value. The neural network themselves, however, can be highly nonsmooth. To bridge this gap, we take convolutional residual networks (ConvResNets) as an example, and prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness. Moreover, we extend our theory to approximating functions supported on a low-dimensional manifold. Our theory partially justifies the benefits of using deep and wide networks in practice. Numerical experiments on adversarial robust image classification are provided to support our theory.

Via

Access Paper or Ask Questions

An Indoor Environment Sensing and Localization System via mmWave Phased Array

Jun 07, 2022

Jie Li, Yifei Sun, Tong Zhang, Rui Wang

Figure 1 for An Indoor Environment Sensing and Localization System via mmWave Phased Array

Figure 2 for An Indoor Environment Sensing and Localization System via mmWave Phased Array

Figure 3 for An Indoor Environment Sensing and Localization System via mmWave Phased Array

Figure 4 for An Indoor Environment Sensing and Localization System via mmWave Phased Array

Abstract:An indoor layout sensing and localization system in 60GHz millimeter wave (mmWave) band, named mmReality, is elaborated in this paper. The mmReality system consists of one transmitter and one mobile receiver, each with a phased array and a single radio frequency (RF) chain. To reconstruct the room layout, the pilot signal is delivered from the transmitter to the receiver via different pairs of transmission and receiving beams, so that the signals at all antenna elements can be resolved. Then, the spatial smoothing and two-dimensional multiple signal classification (MUSIC) algorithm is applied to detect the angle-of-arrival (AoAs) and angle-of-departure (AoDs) of the rays from the transmitter to the receiver. Moreover, the technique of multi-carrier ranging is adopted to measure the distance of each propagation path. Synthesizing the above geometrical parameters, the location of receiver relative to the transmitter can be pinpointed, both line-of-sight (LoS) and non-line-of-sight (NLoS) paths can also be determined. Therefore, the room layout can be reconstructed by moving the receiver and repeating the above measurement in different locations of the room. At the end, we show that the reconstructed room layout can be utilized to locate a mobile device according to its AoA spectrum, even with single access point.

Via

Access Paper or Ask Questions

Dimension Independent Generalization of DP-SGD for Overparameterized Smooth Convex Optimization

Jun 03, 2022

Yi-An Ma, Teodor Vanislavov Marinov, Tong Zhang

Abstract:This paper considers the generalization performance of differentially private convex learning. We demonstrate that the convergence analysis of Langevin algorithms can be used to obtain new generalization bounds with differential privacy guarantees for DP-SGD. More specifically, by using some recently obtained dimension-independent convergence results for stochastic Langevin algorithms with convex objective functions, we obtain $O(n^{-1/4})$ privacy guarantees for DP-SGD with the optimal excess generalization error of $\tilde{O}(n^{-1/2})$ for certain classes of overparameterized smooth convex optimization problems. This improves previous DP-SGD results for such problems that contain explicit dimension dependencies, so that the resulting generalization bounds become unsuitable for overparameterized models used in practical applications.

Via

Access Paper or Ask Questions

Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game

May 31, 2022

Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Liwei Wang, Tong Zhang

Abstract:Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-collected dataset without further interactions with the environment. While various algorithms have been proposed for offline RL in the previous literature, the minimax optimal performance has only been (nearly) achieved for tabular Markov decision processes (MDPs). In this paper, we focus on offline RL with linear function approximation and propose two new algorithms, SPEVI+ and SPMVI+, for single-agent MDPs and two-player zero-sum Markov games (MGs), respectively. The proposed algorithms feature carefully crafted data splitting mechanisms and novel variance-reduction pessimistic estimators. Theoretical analysis demonstrates that they are capable of matching the performance lower bounds up to logarithmic factors. As a byproduct, a new performance lower bound is established for MGs, which tightens the existing results. To the best of our knowledge, these are the first computationally efficient and nearly minimax optimal algorithms for offline single-agent MDPs and MGs with linear function approximation.

Via

Access Paper or Ask Questions

MulT: An End-to-End Multitask Learning Transformer

May 17, 2022

Deblina Bhattacharjee, Tong Zhang, Sabine Süsstrunk, Mathieu Salzmann

Figure 1 for MulT: An End-to-End Multitask Learning Transformer

Figure 2 for MulT: An End-to-End Multitask Learning Transformer

Abstract:We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. Based on the Swin transformer model, our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads. At the heart of our approach is a shared attention mechanism modeling the dependencies across the tasks. We evaluate our model on several multitask benchmarks, showing that our MulT framework outperforms both the state-of-the art multitask convolutional neural network models and all the respective single task transformer models. Our experiments further highlight the benefits of sharing attention across all the tasks, and demonstrate that our MulT model is robust and generalizes well to new domains. Our project website is at https://ivrl.github.io/MulT/.

* Accepted to CVPR 2022

Via

Access Paper or Ask Questions

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

May 13, 2022

Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

Figure 1 for Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

Abstract:We study the linear contextual bandit problem in the presence of adversarial corruption, where the reward at each round is corrupted by an adversary, and the corruption level (i.e., the sum of corruption magnitudes over the horizon) is $C\geq 0$. The best-known algorithms in this setting are limited in that they either are computationally inefficient or require a strong assumption on the corruption, or their regret is at least $C$ times worse than the regret without corruption. In this paper, to overcome these limitations, we propose a new algorithm based on the principle of optimism in the face of uncertainty. At the core of our algorithm is a weighted ridge regression where the weight of each chosen action depends on its confidence up to some threshold. We show that for both known $C$ and unknown $C$ cases, our algorithm with proper choice of hyperparameter achieves a regret that nearly matches the lower bounds. Thus, our algorithm is nearly optimal up to logarithmic factors for both cases. Notably, our algorithm achieves the near-optimal regret for both corrupted and uncorrupted cases ($C=0$) simultaneously.

* 29 pages, 1 table

Via

Access Paper or Ask Questions

Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

Apr 13, 2022

Tong Zhang, Congpei Qiu, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

Figure 1 for Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

Figure 2 for Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

Figure 3 for Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

Figure 4 for Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

Abstract:Self-supervised learning (SSL) methods aim to learn view-invariant representations by maximizing the similarity between the features extracted from different crops of the same image regardless of cropping size and content. In essence, this strategy ignores the fact that two crops may truly contain different image information, e.g., background and small objects, and thus tends to restrain the diversity of the learned representations. In this work, we address this issue by introducing a new self-supervised learning strategy, LoGo, that explicitly reasons about Local and Global crops. To achieve view invariance, LoGo encourages similarity between global crops from the same image, as well as between a global and a local crop. However, to correctly encode the fact that the content of smaller crops may differ entirely, LoGo promotes two local crops to have dissimilar representations, while being close to global crops. Our LoGo strategy can easily be applied to existing SSL methods. Our extensive experiments on a variety of datasets and using different self-supervised learning frameworks validate its superiority over existing approaches. Noticeably, we achieve better results than supervised models on transfer learning when using only 1/10 of the data.

* accepted in CVPR 2022

Via

Access Paper or Ask Questions

Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective

Apr 10, 2022

Hui Deng, Tong Zhang, Yuchao Dai, Jiawei Shi, Yiran Zhong, Hongdong Li

Figure 1 for Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective

Figure 2 for Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective

Figure 3 for Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective

Figure 4 for Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective

Abstract:Directly regressing the non-rigid shape and camera pose from the individual 2D frame is ill-suited to the Non-Rigid Structure-from-Motion (NRSfM) problem. This frame-by-frame 3D reconstruction pipeline overlooks the inherent spatial-temporal nature of NRSfM, i.e., reconstructing the whole 3D sequence from the input 2D sequence. In this paper, we propose to model deep NRSfM from a sequence-to-sequence translation perspective, where the input 2D frame sequence is taken as a whole to reconstruct the deforming 3D non-rigid shape sequence. First, we apply a shape-motion predictor to estimate the initial non-rigid shape and camera motion from a single frame. Then we propose a context modeling module to model camera motions and complex non-rigid shapes. To tackle the difficulty in enforcing the global structure constraint within the deep framework, we propose to impose the union-of-subspace structure by replacing the self-expressiveness layer with multi-head attention and delayed regularizers, which enables end-to-end batch-wise training. Experimental results across different datasets such as Human3.6M, CMU Mocap and InterHand prove the superiority of our framework. The code will be made publicly available

Via

Access Paper or Ask Questions

Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling

Mar 15, 2022

Alekh Agarwal, Tong Zhang

Figure 1 for Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling

Abstract:Provably sample-efficient Reinforcement Learning (RL) with rich observations and function approximation has witnessed tremendous recent progress, particularly when the underlying function approximators are linear. In this linear regime, computationally and statistically efficient methods exist where the potentially infinite state and action spaces can be captured through a known feature embedding, with the sample complexity scaling with the (intrinsic) dimension of these features. When the action space is finite, significantly more sophisticated results allow non-linear function approximation under appropriate structural constraints on the underlying RL problem, permitting for instance, the learning of good features instead of assuming access to them. In this work, we present the first result for non-linear function approximation which holds for general action spaces under a linear embeddability condition, which generalizes all linear and finite action settings. We design a novel optimistic posterior sampling strategy, TS^3 for such problems, and show worst case sample complexity guarantees that scale with a rank parameter of the RL problem, the linear embedding dimension introduced in this work and standard measures of the function class complexity.

Via

Access Paper or Ask Questions