Junshan Zhang


Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

Jun 20, 2023
Hang Wang, Sen Lin, Junshan Zhang

Figures 1–4 for Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

The ensemble method is a promising way to mitigate the overestimation issue in Q-learning, where multiple function approximators are used to estimate the action values. It is known that the estimation bias hinges heavily on the ensemble size (i.e., the number of Q-function approximators used in the target), and that determining the "right" ensemble size is highly nontrivial, because of the time-varying nature of the function approximation errors during the learning process. To tackle this challenge, we first derive an upper bound and a lower bound on the estimation bias, based on which the ensemble size is adapted to drive the bias to be nearly zero, thereby coping with the impact of the time-varying approximation errors. Motivated by these theoretical findings, we advocate that the ensemble method be combined with Model Identification Adaptive Control (MIAC) for effective ensemble size adaptation. Specifically, we devise Adaptive Ensemble Q-learning (AdaEQ), a generalized ensemble method with two key steps: (a) approximation error characterization, which serves as the feedback for flexibly controlling the ensemble size, and (b) ensemble size adaptation tailored towards minimizing the estimation bias. Extensive experiments show that AdaEQ improves the learning performance over existing methods on the MuJoCo benchmark.
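The feedback loop in the abstract can be sketched in a few lines. This is an illustrative simplification, not the paper's exact procedure: the min-over-a-random-subset target and the one-step size adjustment rule (function names `ensemble_target`, `adapt_ensemble_size`) are assumptions standing in for AdaEQ's error characterization and adaptation steps.

```python
import numpy as np

def ensemble_target(q_tables, state, reward, gamma, k, rng):
    """Bootstrapped target using the min over a random subset of k
    Q-estimates: a larger k pushes toward underestimation, a smaller k
    toward overestimation -- the lever the adaptation step turns."""
    subset = rng.choice(len(q_tables), size=k, replace=False)
    q_min = np.min([q_tables[i][state].max() for i in subset])
    return reward + gamma * q_min

def adapt_ensemble_size(k, bias_estimate, k_max, tol=0.01):
    """Error-feedback step (simplified): grow the subset when the
    target overestimates, shrink it when it underestimates."""
    if bias_estimate > tol:
        return min(k + 1, k_max)
    if bias_estimate < -tol:
        return max(k - 1, 1)
    return k
```

For example, a positive bias estimate of 0.5 with `k = 3` grows the subset to 4, while a near-zero estimate leaves it unchanged.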

* NeurIPS 2021 

Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

Jun 20, 2023
Hang Wang, Sen Lin, Junshan Zhang

Figures 1–4 for Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start RL can improve quickly in some cases but stagnate in others, especially when function approximation is used. To this end, the primary objective of this work is to build a fundamental understanding of whether and when online learning can be significantly accelerated by a warm-start policy from offline RL. Specifically, we consider the widely used Actor-Critic (A-C) method with a prior policy. We first quantify the approximation errors in the Actor update and the Critic update, respectively. Next, we cast the Warm-Start A-C algorithm as Newton's method with perturbation, and study the impact of the approximation errors on the finite-time learning performance with inaccurate Actor/Critic updates. Under some general technical conditions, we derive upper bounds which shed light on achieving the desired finite-time learning performance in the Warm-Start A-C algorithm. In particular, our findings reveal that it is essential to reduce the algorithm bias in online learning. We also obtain lower bounds on the sub-optimality gap of the Warm-Start A-C algorithm to quantify the impact of the bias and error propagation.
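The "Newton's method with perturbation" view can be illustrated on a toy one-dimensional objective. This is a minimal sketch, not the paper's analysis: the additive `bias` term stands in for the Actor/Critic approximation error, and it floors how close the iterates can get to the optimum, mirroring the sub-optimality gap lower bound.

```python
def perturbed_newton(x0, grad, hess, bias, n_steps):
    """Newton iteration with an additive per-step perturbation:
    x <- x - grad(x)/hess(x) + bias. With bias = 0 the quadratic
    below is solved in one step; a nonzero bias leaves a persistent
    gap of exactly |bias| from the optimum."""
    x = x0
    for _ in range(n_steps):
        x = x - grad(x) / hess(x) + bias
    return x

# Toy objective f(x) = 0.5 * (x - 2)^2, optimum at x* = 2.
grad = lambda x: x - 2.0
hess = lambda x: 1.0

exact = perturbed_newton(10.0, grad, hess, bias=0.0, n_steps=5)
biased = perturbed_newton(10.0, grad, hess, bias=0.1, n_steps=5)
```

Running this, `exact` lands on the optimum while `biased` stalls at a distance equal to the per-step bias, which is the qualitative message of the bounds.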

* ICML 2023 Oral 

Efficient Self-supervised Continual Learning with Progressive Task-correlated Layer Freezing

Mar 13, 2023
Li Yang, Sen Lin, Fan Zhang, Junshan Zhang, Deliang Fan

Figures 1–4 for Efficient Self-supervised Continual Learning with Progressive Task-correlated Layer Freezing

Inspired by the success of self-supervised learning (SSL) in learning visual representations from unlabeled data, a few recent works have studied SSL in the context of continual learning (CL), where multiple tasks are learned sequentially, giving rise to a new paradigm, namely self-supervised continual learning (SSCL). It has been shown that SSCL outperforms supervised continual learning (SCL), as the learned representations are more informative and robust to catastrophic forgetting. However, if not designed intelligently, the training complexity of SSCL may be prohibitively high due to the inherent training cost of SSL. In this work, by first investigating the task correlations in the SSCL setup, we discover an interesting phenomenon: with an SSL-learned backbone model, the intermediate features are highly correlated between tasks. Based on this new finding, we propose a new SSCL method with layer-wise freezing, which progressively freezes the partial layers with the highest correlation ratios for each task to improve training computation efficiency and memory efficiency. Extensive experiments across multiple datasets show that our proposed method outperforms state-of-the-art SSCL methods under various SSL frameworks. For example, compared to LUMP, our method achieves 12%/14%/12% GPU training-time reduction, 23%/26%/24% memory reduction, 35%/34%/33% backward-FLOPs reduction, and 1.31%/1.98%/1.21% forgetting reduction without accuracy degradation on three datasets, respectively.


CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning

Feb 21, 2023
Sheng Yue, Guanbo Wang, Wei Shao, Zhaofeng Zhang, Sen Lin, Ju Ren, Junshan Zhang

Figures 1–4 for CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning

This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL), namely the reward extrapolation error, where the learned reward function may fail to explain the task correctly and misguide the agent in unseen environments due to the intrinsic covariate shift. Leveraging both expert data and lower-quality diverse data, we devise a principled algorithm (namely CLARE) that solves offline IRL efficiently by integrating "conservatism" into a learned reward function and utilizing an estimated dynamics model. Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy, based on which we characterize the impact of covariate shift by examining the subtle two-tier tradeoff between exploitation (of both expert and diverse data) and exploration (of the estimated dynamics model). We show that CLARE can provably alleviate the reward extrapolation error by striking the right exploitation-exploration balance. Extensive experiments corroborate the significant performance gains of CLARE over existing state-of-the-art algorithms on MuJoCo continuous control tasks (especially with a small offline dataset), and the learned reward is highly instructive for further learning.
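The "conservatism" idea can be caricatured in one tabular update. This is a schematic sketch only, not CLARE's actual objective: the update rule, the visitation counts, and the penalty weight `beta` are illustrative assumptions showing the direction of the tradeoff (reward up where the data supports it, down where only the learned model ventures).

```python
import numpy as np

def conservative_reward_update(reward, expert_visits, model_visits, lr, beta):
    """One schematic step on a tabular reward: increase the reward
    where the expert (and diverse) data visit, and penalize it where
    the estimated dynamics model rolls out beyond the data, keeping
    the learner pessimistic off-distribution."""
    return reward + lr * (expert_visits - beta * model_visits)

reward = np.zeros(3)  # three states: data-supported, neutral, model-only
updated = conservative_reward_update(
    reward,
    expert_visits=np.array([1.0, 0.0, 0.0]),
    model_visits=np.array([0.0, 0.0, 1.0]),
    lr=0.1, beta=1.0,
)
```

After the step, the data-supported state's reward rises while the model-only state's reward falls, which is the conservatism effect in miniature.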


Algorithm Design for Online Meta-Learning with Task Boundary Detection

Feb 02, 2023
Daouda Sow, Sen Lin, Yingbin Liang, Junshan Zhang

Figures 1–4 for Algorithm Design for Online Meta-Learning with Task Boundary Detection

Online meta-learning has recently emerged as a marriage between batch meta-learning and online learning, aiming at quick adaptation to new tasks in a lifelong manner. However, most existing approaches focus on the restrictive setting where the distribution of the online tasks remains fixed with known task boundaries. In this work, we relax these assumptions and propose a novel algorithm for task-agnostic online meta-learning in non-stationary environments. More specifically, we first propose two simple but effective mechanisms, based on empirical observations, for detecting task switches and distribution shifts, which serve as a key building block for more elegant online model updates in our algorithm: the task-switch detection mechanism allows reusing the best model available for the current task at hand, and the distribution-shift detection mechanism differentiates the meta-model update so as to preserve the knowledge for in-distribution tasks and quickly learn new knowledge for out-of-distribution tasks. In particular, our online meta-model updates are based only on the current data, which eliminates the need to store previous data, as required by most existing methods. We further show that a sublinear task-averaged regret can be achieved by our algorithm under mild conditions. Empirical studies on three different benchmarks clearly demonstrate the significant advantage of our algorithm over related baseline approaches.
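A detection mechanism of the kind described above can be sketched with running loss statistics. This is a generic stand-in, not the paper's specific test: flagging a switch when the newest loss deviates from the recent window by more than `k` standard deviations is an assumed rule for illustration.

```python
import statistics

def detect_switch(losses, window, k):
    """Flag a task switch when the newest loss lies more than k
    standard deviations away from the mean of the preceding window
    of losses (a simple empirical change detector)."""
    if len(losses) <= window:
        return False  # not enough history yet
    recent = losses[-window - 1:-1]          # the window before the newest loss
    mu = statistics.fmean(recent)
    sd = statistics.pstdev(recent) or 1e-8   # guard against a zero-variance window
    return abs(losses[-1] - mu) > k * sd
```

A sudden loss spike after a stable stretch trips the detector, while ordinary fluctuation does not.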

* Submitted for publication 

HiFlash: Communication-Efficient Hierarchical Federated Learning with Adaptive Staleness Control and Heterogeneity-aware Client-Edge Association

Jan 16, 2023
Qiong Wu, Xu Chen, Tao Ouyang, Zhi Zhou, Xiaoxi Zhang, Shusen Yang, Junshan Zhang

Figures 1–4 for HiFlash: Communication-Efficient Hierarchical Federated Learning with Adaptive Staleness Control and Heterogeneity-aware Client-Edge Association

Federated learning (FL) is a promising paradigm that enables collaboratively learning a shared model across massive clients while keeping the training data local. However, in many existing FL systems, clients need to frequently exchange large model parameters with the remote cloud server directly via wide-area networks (WAN), leading to significant communication overhead and long transmission times. To mitigate the communication bottleneck, we resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing and combines synchronous client-edge model aggregation with asynchronous edge-cloud model aggregation to greatly reduce the traffic volume of WAN transmissions. Specifically, we first analyze the convergence bound of HiFL theoretically and identify the key controllable factors for model performance improvement. We then advocate an enhanced design, HiFlash, which innovatively integrates deep reinforcement learning based adaptive staleness control and a heterogeneity-aware client-edge association strategy to boost system efficiency and mitigate the staleness effect without compromising model accuracy. Extensive experiments corroborate the superior performance of HiFlash in model accuracy, communication reduction, and system efficiency.
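The staleness effect in asynchronous edge-cloud aggregation can be illustrated with a decayed mixing weight. This is a textbook-style sketch under stated assumptions, not HiFlash's learned controller: HiFlash adapts the staleness handling with deep reinforcement learning, whereas the fixed `alpha0 / (1 + staleness)` decay below is a hypothetical rule.

```python
import numpy as np

def stale_aware_aggregate(global_model, edge_model, staleness, alpha0=0.5):
    """Mix an asynchronous edge update into the global model with a
    weight that decays in the update's staleness (number of global
    rounds elapsed since the edge pulled the model), so stale updates
    perturb the global model less."""
    alpha = alpha0 / (1.0 + staleness)
    return (1.0 - alpha) * global_model + alpha * edge_model

g = np.array([0.0])
e = np.array([1.0])
fresh = stale_aware_aggregate(g, e, staleness=0)  # full weight alpha0
stale = stale_aware_aggregate(g, e, staleness=4)  # heavily damped
```

A fresh update moves the global model five times farther than one that is four rounds stale, which is the damping the adaptive controller tunes online.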

* Accepted by IEEE Transactions on Parallel and Distributed Systems, Jan. 2023 

Semantic Communications for Wireless Sensing: RIS-aided Encoding and Self-supervised Decoding

Nov 23, 2022
Hongyang Du, Jiacheng Wang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Junshan Zhang, Xuemin (Sherman) Shen

Figures 1–4 for Semantic Communications for Wireless Sensing: RIS-aided Encoding and Self-supervised Decoding

Semantic communications can reduce resource consumption by transmitting task-related semantic information extracted from source messages. However, when the source messages are used for various tasks, e.g., wireless sensing data for localization and activity detection, the semantic communication technique is difficult to implement because of the increased processing complexity. In this paper, we propose inverse semantic communications as a new paradigm. Instead of extracting semantic information from messages, we aim to encode the task-related source messages into a hyper-source message for data transmission or storage. Following this paradigm, we design an inverse semantic-aware wireless sensing framework with three algorithms for data sampling, reconfigurable intelligent surface (RIS)-aided encoding, and self-supervised decoding, respectively. Specifically, on the one hand, we design a novel RIS hardware for encoding several signal spectrums into one MetaSpectrum. To select the task-related signal spectrums for efficient encoding, a semantic hash sampling method is introduced. On the other hand, we propose a self-supervised learning method for decoding the MetaSpectrums to obtain the original signal spectrums. Using sensing data collected from the real world, we show that our framework can reduce the data volume by 95% compared to the unencoded data, without affecting the accomplishment of sensing tasks. Moreover, compared with the typically used uniform sampling scheme, the proposed semantic hash sampling scheme achieves a 67% lower mean squared error in recovering the sensing parameters. In addition, experimental results demonstrate that the amplitude response matrix of the RIS enables encryption of the sensing data.
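The sampling idea (keep informative spectrums, drop redundant ones before encoding) can be caricatured with a toy hash. This is purely illustrative and not the paper's semantic hash: bucketing each spectrum by the coarse bin holding the most energy, and keeping one representative per bucket, are assumed stand-ins.

```python
import numpy as np

def coarse_hash(spectrum, n_bins=4):
    """Toy 'semantic hash': the index of the coarse frequency bin
    carrying the most energy (an assumed stand-in for the paper's
    semantic hash function)."""
    chunks = np.array_split(np.asarray(spectrum, dtype=float), n_bins)
    energies = [float(np.sum(c ** 2)) for c in chunks]
    return int(np.argmax(energies))

def hash_sample(spectra, n_bins=4):
    """Keep the index of one representative spectrum per hash bucket,
    so near-duplicate spectrums are dropped before encoding, shrinking
    the data volume while preserving distinct content."""
    kept, seen = [], set()
    for i, s in enumerate(spectra):
        h = coarse_hash(s, n_bins)
        if h not in seen:
            seen.add(h)
            kept.append(i)
    return kept
```

Two spectrums with energy in the same coarse bin collapse to one sample, while a spectrum with energy elsewhere is retained.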


Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer

Nov 01, 2022
Sen Lin, Li Yang, Deliang Fan, Junshan Zhang

Figures 1–4 for Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer

By learning a sequence of tasks continually, an agent in continual learning (CL) can improve the learning performance of both a new task and 'old' tasks by leveraging forward knowledge transfer and backward knowledge transfer, respectively. However, most existing CL methods focus on addressing catastrophic forgetting in neural networks by minimizing the modification of the learnt model for old tasks. This inevitably limits backward knowledge transfer from the new task to the old tasks, because judicious model updates could possibly improve the learning performance of the old tasks as well. To tackle this problem, we first theoretically analyze the conditions under which updating the learnt model of old tasks could be beneficial for CL and lead to backward knowledge transfer, based on gradient projection onto the input subspaces of old tasks. Building on the theoretical analysis, we develop a ContinUal learning method with Backward knowlEdge tRansfer (CUBER) for a fixed-capacity neural network without data replay. In particular, CUBER first characterizes the task correlation to identify the positively correlated old tasks in a layer-wise manner, and then selectively modifies the learnt model of the old tasks when learning the new task. Experimental studies show that CUBER achieves positive backward knowledge transfer on several existing CL benchmarks for the first time without data replay, where the related baselines still suffer from catastrophic forgetting (negative backward knowledge transfer). This superior backward knowledge transfer also leads to higher accuracy.
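The gradient-projection condition can be sketched concretely. This is a simplified illustration, not CUBER's full criterion: it assumes an orthonormal basis for the old task's input subspace (as in gradient-projection-style CL) and uses a plain inner-product sign test, with the function names being hypothetical.

```python
import numpy as np

def project_onto_subspace(grad, basis):
    """Component of the new-task gradient lying inside the old task's
    input subspace (basis columns assumed orthonormal)."""
    return basis @ (basis.T @ grad)

def backward_transfer_ok(grad_new, grad_old_task, basis, tol=0.0):
    """Simplified CUBER-style check: allow modifying old-task weights
    only when the in-subspace part of the new-task gradient aligns
    positively with the old task's gradient, so the update can help
    the old task rather than cause forgetting."""
    g_in = project_onto_subspace(grad_new, basis)
    return float(g_in @ grad_old_task) > tol
```

With a 2-D subspace inside a 3-D parameter space, the out-of-subspace component of the new gradient is discarded before the alignment test.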

* Published as a conference paper at NeurIPS 2022 

Attention-aware Resource Allocation and QoE Analysis for Metaverse xURLLC Services

Aug 11, 2022
Hongyang Du, Jiazhen Liu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Junshan Zhang, Dong In Kim

Figures 1–4 for Attention-aware Resource Allocation and QoE Analysis for Metaverse xURLLC Services

As a virtual world interacting with the real world, the Metaverse encapsulates our expectations of the next-generation Internet, bringing new key performance indicators (KPIs). In particular, Metaverse services based on graphical technologies, e.g., virtual traveling, require low-latency transmission of virtual object data and highly reliable uploading of user instructions. Although conventional ultra-reliable and low-latency communications (URLLC) can satisfy the vast majority of objective service KPIs, it is difficult to offer users the personalized immersive experience that is a distinctive feature of next-generation Internet services. Since the quality of experience (QoE) can be regarded as a comprehensive KPI, URLLC is evolving towards next-generation URLLC (xURLLC), which achieves higher QoE for Metaverse services by allocating more resources to the virtual objects in which users are more interested. In this paper, we study the interaction between the Metaverse service provider (MSP) and the network infrastructure provider (InP) in deploying Metaverse xURLLC services, and provide an optimal contract design framework. Specifically, the utility of the MSP, defined as a function of Metaverse users' QoE, is maximized while ensuring the incentives of the InP. To model the QoE of Metaverse xURLLC services, we propose a novel metric named Meta-Immersion that incorporates both objective network KPIs and the subjective feelings of Metaverse users. Using a user-object-attention level (UOAL) dataset, we develop and validate an attention-aware rendering capacity allocation scheme to improve QoE. On average, xURLLC achieves a 20.1% QoE improvement over conventional URLLC with a uniform allocation scheme, and the improvement is larger, e.g., 40%, when the total resources are limited.
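The contrast between uniform and attention-aware allocation can be shown in a few lines. This is a minimal sketch, not the paper's optimization: proportional allocation with an optional per-object floor (the `floor` parameter is an assumption) stands in for the attention-aware rendering capacity scheme.

```python
import numpy as np

def attention_aware_allocation(attention, total_capacity, floor=0.0):
    """Split rendering capacity across virtual objects in proportion
    to user-object attention levels, after reserving an optional
    per-object floor -- versus the uniform split of conventional URLLC."""
    attention = np.asarray(attention, dtype=float)
    weights = attention / attention.sum()
    spare = total_capacity - floor * len(attention)
    return floor + spare * weights

# Two objects, one attracting 3x the attention of the other.
alloc = attention_aware_allocation([3.0, 1.0], total_capacity=8.0)
```

The high-attention object receives three times the capacity of the other, while the total still sums to the available budget; a uniform scheme would give each object 4.0 regardless of attention.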
