Nanjing University of Science and Technology, Nanjing, China




Abstract:Currently, under supervised learning, a model pretrained by a large-scale nature scene dataset and then fine-tuned on a few specific task labeling data is the paradigm that has dominated the knowledge transfer learning. It has reached the status of consensus solution for task-aware model training in remote sensing domain (RSD). Unfortunately, due to different categories of imaging data and stiff challenges of data annotation, there is not a large enough and uniform remote sensing dataset to support large-scale pretraining in RSD. Moreover, pretraining models on large-scale nature scene datasets by supervised learning and then directly fine-tuning on diverse downstream tasks seems to be a crude method, which is easily affected by inevitable labeling noise, severe domain gaps and task-aware discrepancies. Thus, in this paper, considering the self-supervised pretraining and powerful vision transformer (ViT) architecture, a concise and effective knowledge transfer learning strategy called ConSecutive PreTraining (CSPT) is proposed based on the idea of not stopping pretraining in natural language processing (NLP), which can gradually bridge the domain gap and transfer knowledge from the nature scene domain to the RSD. The proposed CSPT also can release the huge potential of unlabeled data for task-aware model training. Finally, extensive experiments are carried out on twelve datasets in RSD involving three types of downstream tasks (e.g., scene classification, object detection and land cover classification) and two types of imaging data (e.g., optical and SAR). The results show that by utilizing the proposed CSPT for task-aware model training, almost all downstream tasks in RSD can outperform the previous method of supervised pretraining-then-fine-tuning and even surpass the state-of-the-art (SOTA) performance without any expensive labeling consumption and careful model design.




Abstract:Most prior convergence results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded. This assumption is unrealistic in many problems, e.g., linear regression with Gaussian data. We relax uniform Lipschitzness by instead assuming that the per-sample gradients have \textit{sample-dependent} upper bounds, i.e., per-sample Lipschitz constants, which themselves may be unbounded. We derive new convergence results for DP-SGD on both convex and nonconvex functions when the per-sample Lipschitz constants have bounded moments. Furthermore, we provide principled guidance on choosing the clip norm in DP-SGD for convex settings satisfying our relaxed version of Lipschitzness, without making distributional assumptions on the Lipschitz constants. We verify the effectiveness of our recommendation via experiments on benchmarking datasets.
Abstract:We propose a general framework to design posterior sampling methods for model-based RL. We show that the proposed algorithms can be analyzed by reducing regret to Hellinger distance based conditional probability estimation. We further show that optimistic posterior sampling can control this Hellinger distance, when we measure model error via data likelihood. This technique allows us to design and analyze unified posterior sampling algorithms with state-of-the-art sample complexity guarantees for many model-based RL settings. We illustrate our general result in many special cases, demonstrating the versatility of our framework.



Abstract:Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning. However, in practice, the simple FedAvg algorithm converges very well. This paper explains the seemingly unreasonable effectiveness of FedAvg that contradicts the previous theoretical predictions. We find that the key assumption of bounded gradient dissimilarity in previous theoretical analyses is too pessimistic to characterize data heterogeneity in practical applications. For a simple quadratic problem, we demonstrate there exist regimes where large gradient dissimilarity does not have any negative impact on the convergence of FedAvg. Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity, and explicitly use it to present a new theoretical analysis of FedAvg. We show that the average drift at optimum is nearly zero across many real-world federated training tasks, whereas the gradient dissimilarity can be large. And our new analysis suggests FedAvg can have identical convergence rates in homogeneous and heterogeneous data settings, and hence, leads to better understanding of its empirical success.




Abstract:Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness. Most existing function approximation theories suggest that with sufficiently many parameters, neural networks can well approximate certain classes of functions in terms of the function value. The neural network themselves, however, can be highly nonsmooth. To bridge this gap, we take convolutional residual networks (ConvResNets) as an example, and prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness. Moreover, we extend our theory to approximating functions supported on a low-dimensional manifold. Our theory partially justifies the benefits of using deep and wide networks in practice. Numerical experiments on adversarial robust image classification are provided to support our theory.




Abstract:An indoor layout sensing and localization system in 60GHz millimeter wave (mmWave) band, named mmReality, is elaborated in this paper. The mmReality system consists of one transmitter and one mobile receiver, each with a phased array and a single radio frequency (RF) chain. To reconstruct the room layout, the pilot signal is delivered from the transmitter to the receiver via different pairs of transmission and receiving beams, so that the signals at all antenna elements can be resolved. Then, the spatial smoothing and two-dimensional multiple signal classification (MUSIC) algorithm is applied to detect the angle-of-arrival (AoAs) and angle-of-departure (AoDs) of the rays from the transmitter to the receiver. Moreover, the technique of multi-carrier ranging is adopted to measure the distance of each propagation path. Synthesizing the above geometrical parameters, the location of receiver relative to the transmitter can be pinpointed, both line-of-sight (LoS) and non-line-of-sight (NLoS) paths can also be determined. Therefore, the room layout can be reconstructed by moving the receiver and repeating the above measurement in different locations of the room. At the end, we show that the reconstructed room layout can be utilized to locate a mobile device according to its AoA spectrum, even with single access point.
Abstract:This paper considers the generalization performance of differentially private convex learning. We demonstrate that the convergence analysis of Langevin algorithms can be used to obtain new generalization bounds with differential privacy guarantees for DP-SGD. More specifically, by using some recently obtained dimension-independent convergence results for stochastic Langevin algorithms with convex objective functions, we obtain $O(n^{-1/4})$ privacy guarantees for DP-SGD with the optimal excess generalization error of $\tilde{O}(n^{-1/2})$ for certain classes of overparameterized smooth convex optimization problems. This improves previous DP-SGD results for such problems that contain explicit dimension dependencies, so that the resulting generalization bounds become unsuitable for overparameterized models used in practical applications.



Abstract:Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-collected dataset without further interactions with the environment. While various algorithms have been proposed for offline RL in the previous literature, the minimax optimal performance has only been (nearly) achieved for tabular Markov decision processes (MDPs). In this paper, we focus on offline RL with linear function approximation and propose two new algorithms, SPEVI+ and SPMVI+, for single-agent MDPs and two-player zero-sum Markov games (MGs), respectively. The proposed algorithms feature carefully crafted data splitting mechanisms and novel variance-reduction pessimistic estimators. Theoretical analysis demonstrates that they are capable of matching the performance lower bounds up to logarithmic factors. As a byproduct, a new performance lower bound is established for MGs, which tightens the existing results. To the best of our knowledge, these are the first computationally efficient and nearly minimax optimal algorithms for offline single-agent MDPs and MGs with linear function approximation.


Abstract:We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. Based on the Swin transformer model, our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads. At the heart of our approach is a shared attention mechanism modeling the dependencies across the tasks. We evaluate our model on several multitask benchmarks, showing that our MulT framework outperforms both the state-of-the art multitask convolutional neural network models and all the respective single task transformer models. Our experiments further highlight the benefits of sharing attention across all the tasks, and demonstrate that our MulT model is robust and generalizes well to new domains. Our project website is at https://ivrl.github.io/MulT/.

Abstract:We study the linear contextual bandit problem in the presence of adversarial corruption, where the reward at each round is corrupted by an adversary, and the corruption level (i.e., the sum of corruption magnitudes over the horizon) is $C\geq 0$. The best-known algorithms in this setting are limited in that they either are computationally inefficient or require a strong assumption on the corruption, or their regret is at least $C$ times worse than the regret without corruption. In this paper, to overcome these limitations, we propose a new algorithm based on the principle of optimism in the face of uncertainty. At the core of our algorithm is a weighted ridge regression where the weight of each chosen action depends on its confidence up to some threshold. We show that for both known $C$ and unknown $C$ cases, our algorithm with proper choice of hyperparameter achieves a regret that nearly matches the lower bounds. Thus, our algorithm is nearly optimal up to logarithmic factors for both cases. Notably, our algorithm achieves the near-optimal regret for both corrupted and uncorrupted cases ($C=0$) simultaneously.