Reinforcement learning (RL) control of power electronics systems is an emerging topic, yet the sim-to-real issue remains a challenging problem, with very few results available in the literature. Because of the inevitable mismatch between simulation models and real-life systems, RL control strategies trained offline may encounter unexpected hurdles when they are transferred to a practical implementation. As the main contribution of this paper, a transfer methodology based on a carefully designed duty ratio mapping (DRM) is proposed for a DC-DC buck converter. A detailed sim-to-real process is then presented to enable the implementation of a model-free deep reinforcement learning (DRL) controller. The feasibility and effectiveness of the proposed methodology are demonstrated by comparative experimental studies.
As a typical switching power supply, the DC-DC converter is widely used in DC microgrids. Because renewable energy generation varies, the research and design of DC-DC converter control algorithms with outstanding dynamic characteristics have significant theoretical and practical value. To mitigate the bus voltage stability issue in DC microgrids, an intelligent control strategy for a buck DC-DC converter with constant power loads (CPLs), based on a deep reinforcement learning algorithm, is constructed for the first time. In this article, a Markov decision process (MDP) model and the deep Q-network (DQN) algorithm are defined for the DC-DC converter. A model-free deep reinforcement learning (DRL) control strategy is designed to shape the agent-environment interaction through a reward/penalty mechanism so that the output voltage converges to its nominal value. The agent makes approximate decisions by extracting high-dimensional features of complex power systems without any prior knowledge. Finally, simulation comparison results demonstrate that the proposed controller has stronger self-learning and self-optimization capabilities under different scenarios.
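To make the MDP formulation concrete, the following is a minimal sketch of how a buck converter feeding a CPL could be wrapped as an environment with a discrete action set, as a DQN agent would require. It is not the model used in the article: the averaged dynamics L di/dt = d V_in - v and C dv/dt = i - P/v, the forward-Euler integration, the class name `BuckCPLEnv`, all parameter values, the five duty-ratio levels, and the reward (negative normalized voltage error) are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BuckCPLEnv:
    """Hypothetical averaged buck converter with a constant power load."""
    v_in: float = 48.0          # input voltage in V (assumed)
    v_ref: float = 24.0         # nominal bus voltage in V (assumed)
    inductance: float = 1e-3    # L in H (assumed)
    capacitance: float = 1e-3   # C in F (assumed)
    p_cpl: float = 50.0         # CPL power in W (assumed)
    dt: float = 1e-5            # Euler integration step in s
    substeps: int = 10          # integration steps per agent decision
    duties: tuple = (0.0, 0.25, 0.5, 0.75, 1.0)  # discrete action set
    v_min: float = 1.0          # floor that keeps P / v finite

    def reset(self):
        # Start at the equilibrium point: v = v_ref, i = P / v_ref.
        self.i_l = self.p_cpl / self.v_ref
        self.v_c = self.v_ref
        return self._obs()

    def _obs(self):
        # State: inductor current, capacitor voltage, voltage error.
        return (self.i_l, self.v_c, self.v_ref - self.v_c)

    def step(self, action):
        duty = self.duties[action]
        for _ in range(self.substeps):
            di = (duty * self.v_in - self.v_c) / self.inductance
            dv = (self.i_l - self.p_cpl / self.v_c) / self.capacitance
            self.i_l += self.dt * di
            self.v_c = max(self.v_c + self.dt * dv, self.v_min)
        # Penalty grows with the deviation from the nominal voltage.
        reward = -abs(self.v_ref - self.v_c) / self.v_ref
        return self._obs(), reward


if __name__ == "__main__":
    env = BuckCPLEnv()
    state = env.reset()
    state, reward = env.step(4)  # apply full duty ratio for one decision
    print(state, reward)
```

In this sketch a DQN agent would approximate the action value Q(s, a) for each of the five duty-ratio levels and select the action with the largest estimate; the reward is zero only when the capacitor voltage equals the nominal voltage.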