Xingguo Chen

Model-based Offline Policy Optimization with Adversarial Network

Sep 05, 2023
Junming Yang, Xingguo Chen, Shengyuan Wang, Bolei Zhang

Model-based offline reinforcement learning (RL), which builds a supervised transition model from a logged dataset to avoid costly interactions with the online environment, is a promising approach for offline policy optimization. Because the discrepancy between the logged data and the online environment can cause a distributional shift, many prior works have studied how to build conservative transition models and estimate model uncertainty accurately. However, over-conservatism can limit the agent's exploration, and the uncertainty estimates may be unreliable. In this work, we propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN). The key idea is to use adversarial learning to build a transition model with better generalization, where an adversary is introduced to distinguish between in-distribution and out-of-distribution samples. Moreover, the adversary naturally provides a quantification of the model's uncertainty with theoretical guarantees. Extensive experiments show that our approach outperforms existing state-of-the-art baselines on widely studied offline RL benchmarks; it also generates diverse in-distribution samples and quantifies uncertainty more accurately.
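
To make the adversarial construction concrete, here is a minimal sketch of how a transition model and a discriminator could be trained together, with the discriminator doubling as an uncertainty signal. This is an illustration under assumed dimensions, network sizes, and loss weights, not the authors' MOAN implementation.

```python
# Illustrative sketch (assumed shapes and hyperparameters), not the MOAN code release.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 11, 3, 128

transition_model = nn.Sequential(            # generator: (s, a) -> predicted s'
    nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, STATE_DIM),
)
discriminator = nn.Sequential(               # adversary: (s, a, s') -> P(in-distribution)
    nn.Linear(2 * STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 1), nn.Sigmoid(),
)
opt_g = torch.optim.Adam(transition_model.parameters(), lr=3e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=3e-4)
bce = nn.BCELoss()

def train_step(s, a, s_next):
    """One adversarial update on a batch of logged transitions."""
    fake_next = transition_model(torch.cat([s, a], dim=-1))

    # Adversary: push logged transitions toward 1, model rollouts toward 0.
    d_real = discriminator(torch.cat([s, a, s_next], dim=-1))
    d_fake = discriminator(torch.cat([s, a, fake_next.detach()], dim=-1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Model: fit the logged next state and fool the adversary.
    d_fake = discriminator(torch.cat([s, a, fake_next], dim=-1))
    loss_g = nn.functional.mse_loss(fake_next, s_next) + bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

def uncertainty_penalty(s, a, s_next):
    """Use the adversary's score as an out-of-distribution penalty (illustrative)."""
    with torch.no_grad():
        p_in_dist = discriminator(torch.cat([s, a, s_next], dim=-1))
    return 1.0 - p_in_dist

# e.g. one update on a random batch of 32 transitions:
train_step(torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM), torch.randn(32, STATE_DIM))
```

In a full pipeline, such a penalty would typically be subtracted from model-generated rewards during policy optimization; the exact penalty form and training schedule used in the paper may differ.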

* Accepted by the 26th European Conference on Artificial Intelligence (ECAI 2023) 

Online Attentive Kernel-Based Temporal Difference Learning

Jan 22, 2022
Guang Yang, Xingguo Chen, Shangdong Yang, Huihui Wang, Shaokang Dong, Yang Gao

With rising uncertainty in the real world, online Reinforcement Learning (RL) has received increasing attention due to its fast learning capability and improved data efficiency. However, online RL often suffers from complex Value Function Approximation (VFA) and catastrophic interference, which makes it difficult to apply deep neural networks to RL algorithms in a fully online setting. We therefore introduce a simpler and more adaptive approach that evaluates the value function with a kernel-based model. Sparse representations are superior at handling interference; compared with current sparse representation methods, a competitive sparse representation should be learnable, non-prior, non-truncated, and explicit. To learn such representations, attention mechanisms are used to represent the degree of sparsification, and a smooth attentive function is introduced into the kernel-based VFA. In this paper, we propose an Online Attentive Kernel-Based Temporal Difference (OAKTD) algorithm using two-timescale optimization and provide a convergence analysis of the proposed algorithm. Experimental evaluations show that OAKTD outperforms several Online Kernel-based Temporal Difference (OKTD) learning algorithms, as well as Temporal Difference (TD) learning with Tile Coding, on the public Mountain Car, Acrobot, CartPole, and Puddle World tasks.
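
A minimal numpy sketch of a kernel-based VFA with an attentive gate and a two-timescale TD update follows. The RBF kernel, the sigmoid attention function, and the two step sizes are illustrative assumptions; the paper's exact smooth attentive function, sparsification scheme, and convergence conditions may differ.

```python
# Illustrative sketch of attentive kernel-based TD(0), not the authors' OAKTD code.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_CENTERS, GAMMA = 2, 50, 0.99
centers = rng.uniform(-1.0, 1.0, size=(N_CENTERS, STATE_DIM))  # RBF kernel centers
w = np.zeros(N_CENTERS)          # value weights    (fast timescale)
theta = np.zeros(N_CENTERS)      # attention logits (slow timescale)
ALPHA_W, ALPHA_THETA = 0.1, 0.01  # two-timescale step sizes

def features(s):
    """RBF kernel activations gated by a smooth (sigmoid) attention score."""
    k = np.exp(-np.sum((centers - s) ** 2, axis=1) / 0.1)   # kernel values
    a = 1.0 / (1.0 + np.exp(-theta))                        # attention in (0, 1)
    return a * k, k, a

def value(s):
    phi, _, _ = features(s)
    return w @ phi

def td_update(s, r, s_next, done):
    """One online semi-gradient TD(0) step: weights fast, attention slow."""
    global w, theta
    phi, k, a = features(s)
    target = r + (0.0 if done else GAMMA * value(s_next))
    delta = target - w @ phi
    w += ALPHA_W * delta * phi                            # fast: value weights
    theta += ALPHA_THETA * delta * w * k * a * (1 - a)    # slow: attention logits

# e.g. one online update on a transition (s, r, s', done):
td_update(np.array([0.1, -0.2]), 1.0, np.array([0.15, -0.1]), False)
```

Driving the attention logits on a slower timescale than the value weights mirrors the two-timescale optimization named in the abstract; the specific update rule shown here is only one plausible instantiation.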

Multi-Agent Game Abstraction via Graph Attention Neural Network

Nov 25, 2019
Yong Liu, Weixun Wang, Yujing Hu, Jianye Hao, Xingguo Chen, Yang Gao

In large-scale multi-agent systems, the large number of agents and complex game relationships make policy learning difficult, so simplifying the learning process is an important research issue. In many multi-agent systems, interactions between agents happen locally, which means that an agent neither needs to coordinate with all other agents nor to coordinate with others all the time. Traditional methods attempt to capture the interaction relationships between agents with pre-defined rules, but they cannot be used directly in large-scale environments because it is difficult to transform the complex interactions between agents into rules. In this paper, we model the relationships between agents as a complete graph and propose a novel game abstraction mechanism based on a two-stage attention network (G2ANet), which indicates whether there is an interaction between two agents and how important that interaction is. We integrate this detection mechanism into graph neural network-based multi-agent reinforcement learning for game abstraction and propose two novel learning algorithms, GA-Comm and GA-AC. We conduct experiments in Traffic Junction and Predator-Prey. The results show that the proposed methods simplify the learning process while achieving better asymptotic performance than state-of-the-art algorithms.
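
A minimal sketch of the two-stage attention idea is shown below: a hard gate first decides whether an interaction between two agents exists at all, and soft attention then weighs the remaining neighbors. Using a small pairwise MLP with Gumbel-softmax for the hard stage and scaled dot-product attention for the soft stage is a simplification, and all names and dimensions are illustrative; this is not the exact G2ANet architecture.

```python
# Illustrative sketch of hard + soft attention over an agent graph (assumed sizes),
# not the exact G2ANet architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, FEAT, HIDDEN = 5, 16, 32

hard_net = nn.Linear(2 * FEAT, 2)                 # per-pair logits: interact? (no / yes)
query_proj = nn.Linear(FEAT, HIDDEN, bias=False)
key_proj = nn.Linear(FEAT, HIDDEN, bias=False)
value_proj = nn.Linear(FEAT, HIDDEN, bias=False)

def two_stage_attention(h):
    """h: (N_AGENTS, FEAT) per-agent encodings -> (N_AGENTS, HIDDEN) interaction contexts."""
    pairs = torch.cat([h.unsqueeze(1).expand(-1, N_AGENTS, -1),
                       h.unsqueeze(0).expand(N_AGENTS, -1, -1)], dim=-1)
    # Stage 1: hard attention, a (differentiable) ~0/1 gate per agent pair.
    gate = F.gumbel_softmax(hard_net(pairs), tau=1.0, hard=True)[..., 1]
    gate = gate * (1 - torch.eye(N_AGENTS))       # drop self-interaction
    # Stage 2: soft attention over the agents the gate kept.
    scores = query_proj(h) @ key_proj(h).t() / HIDDEN ** 0.5
    scores = scores.masked_fill(gate == 0, float('-inf'))
    alpha = torch.nan_to_num(torch.softmax(scores, dim=-1))  # agents with no neighbors -> 0
    return alpha @ value_proj(h)

# Each agent's context would then feed a communication or actor-critic head
# (in the spirit of GA-Comm / GA-AC):
context = two_stage_attention(torch.randn(N_AGENTS, FEAT))
```

The hard gate is what performs the game abstraction: agents whose pairwise gate is zero are simply excluded from each other's soft attention, so learning only has to account for the interactions that remain.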

* Accepted by AAAI 2020 