Wuhao Wang

DATD3: Depthwise Attention Twin Delayed Deep Deterministic Policy Gradient For Model Free Reinforcement Learning Under Output Feedback Control

May 29, 2025

Wuhao Wang, Zhiyong Chen

Abstract: Reinforcement learning in real-world applications often involves output-feedback settings, where the agent receives only partial state information. To address this challenge, we propose the Output-Feedback Markov Decision Process (OPMDP), which extends the standard MDP formulation to accommodate decision-making based on observation histories. Building on this framework, we introduce Depthwise Attention Twin Delayed Deep Deterministic Policy Gradient (DATD3), a novel actor-critic algorithm that employs depthwise separable convolution and multi-head attention to encode historical observations. DATD3 maintains policy expressiveness while avoiding the instability of recurrent models. Extensive experiments on continuous control tasks demonstrate that DATD3 outperforms existing memory-based and recurrent baselines under both partial and full observability.
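
As a rough illustration of encoding an observation history, the sketch below applies a depthwise separable 1D convolution to a short window of past observations and then pools the result with scaled dot-product attention. It is not the authors' implementation: DATD3 uses multi-head attention with learned weights, whereas this sketch uses a single head and hand-picked numbers. The function names, kernel values, and shapes are all hypothetical.

```python
import math

def depthwise_conv1d(history, kernels):
    """Depthwise step: each channel is filtered only by its own kernel.
    history: T x C list of observations; kernels: C kernels of length K."""
    T, C, K = len(history), len(history[0]), len(kernels[0])
    return [
        [sum(kernels[c][k] * history[t + k][c] for k in range(K)) for c in range(C)]
        for t in range(T - K + 1)
    ]

def pointwise(features, weights):
    """Pointwise (1x1) step: mix channels at each time step. weights: D x C."""
    return [[sum(w * x for w, x in zip(row, f)) for row in weights] for f in features]

def attention_pool(features):
    """Single-head scaled dot-product attention using the latest step as the query."""
    query = features[-1]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, f)) / scale for f in features]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(len(query))]
    return pooled, weights

# Hypothetical history: T = 4 time steps, C = 2 observation channels.
history = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
kernels = [[0.5, 0.5], [1.0, -1.0]]          # one length-2 kernel per channel
mix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # maps 2 channels to 3 features

dw = depthwise_conv1d(history, kernels)
encoded = pointwise(dw, mix)
summary, attn = attention_pool(encoded)
print(summary, attn)
```

In an actor-critic setting, a fixed-length vector such as `summary` could stand in for the unobserved state as input to the actor and the critics; that wiring is an assumption here, not a detail taken from the abstract.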

Multi-State TD Target for Model-Free Reinforcement Learning

May 26, 2024

Wuhao Wang, Zhiyong Chen, Lepeng Zhang

Abstract: Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Building on this new MSTD concept, we develop complete actor-critic algorithms that include management of replay buffers in two modes, and integrate with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Experimental results demonstrate that algorithms employing the MSTD target significantly improve learning performance compared to traditional methods.
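
The difference between a single-state and a multi-state target can be sketched in a few lines. The one-step target below is the standard form, r + γ·V(s'). The multi-state variant is only an illustrative assumption: it takes a uniform average of n-step targets, which is one simple way to draw on several subsequent state values, and it should not be read as the MSTD definition from the paper. All function names and numbers are hypothetical.

```python
def one_step_td_target(reward, next_value, gamma):
    """Standard TD target: immediate reward plus discounted value of the next state."""
    return reward + gamma * next_value

def n_step_target(rewards, values, n, gamma):
    """n-step target. rewards[i] is r_{t+i}; values[i] is the estimate V(s_{t+i+1})."""
    discounted_rewards = sum(gamma ** i * rewards[i] for i in range(n))
    return discounted_rewards + gamma ** n * values[n - 1]

def averaged_multi_state_target(rewards, values, gamma):
    """Assumed scheme: uniform average of the 1-step through N-step targets."""
    n_max = len(rewards)
    targets = [n_step_target(rewards, values, n, gamma) for n in range(1, n_max + 1)]
    return sum(targets) / len(targets)

# Hypothetical trajectory segment of length 3.
rewards = [1.0, 0.0, 2.0]   # r_t, r_{t+1}, r_{t+2}
values = [4.0, 2.0, 0.0]    # V(s_{t+1}), V(s_{t+2}), V(s_{t+3})
gamma = 0.5

single = one_step_td_target(rewards[0], values[0], gamma)
multi = averaged_multi_state_target(rewards, values, gamma)
print(single, multi)  # 3.0 2.0
```

A target that looks several steps ahead needs consecutive transitions from the same trajectory, which is presumably why the abstract mentions replay-buffer management as part of the algorithms.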

* 6 pages, 16 figures

Beware of diffusion models for synthesizing medical images -- A comparison with GANs in terms of memorizing brain tumor images

May 12, 2023

Muhammad Usman Akbar, Wuhao Wang, Anders Eklund

Abstract: Diffusion models were initially developed for text-to-image generation and are now being utilized to generate high-quality synthetic images. Preceded by GANs, diffusion models have shown impressive results using various evaluation metrics. However, commonly used metrics such as FID and IS are not suitable for determining whether diffusion models are simply reproducing the training images. Here we train StyleGAN and diffusion models, using BRATS20 and BRATS21 datasets, to synthesize brain tumor images, and measure the correlation between the synthetic images and all training images. Our results show that diffusion models are much more likely to memorize the training images, especially for small datasets. Researchers should be careful when using diffusion models for medical imaging, if the final goal is to share the synthetic images.
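
A memorization check of this kind can be sketched as follows: compare one synthetic image against every training image and report the closest match. The abstract only says that correlation is measured. The use of Pearson correlation on flattened pixel lists, the function names, and the toy data are assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length flattened images.
    Assumes neither image is constant (non-zero variance)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def max_train_correlation(synthetic, training_set):
    """Return (index, correlation) of the training image closest to the synthetic one."""
    scores = [pearson(synthetic, t) for t in training_set]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy "images" flattened to 4 pixels each (hypothetical data).
training = [[0, 1, 2, 3], [3, 1, 0, 2], [1, 1, 0, 5]]
synthetic = [0, 2, 4, 6]  # a brightness-scaled copy of training[0]

idx, score = max_train_correlation(synthetic, training)
print(idx, score)
```

A maximum correlation close to 1 flags a synthetic image as a likely near-copy of a training image. Because Pearson correlation is invariant to brightness scaling and offset, the scaled copy above is still matched to its source.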

* 9 pages, 3 figures
