Shan Zhong

Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy

Nov 15, 2025

Hongyang Yang, Xiao-Yang Liu, Shan Zhong, Anwar Walid

Abstract: Stock trading strategies play a critical role in investment. However, it is challenging to design a profitable strategy in a complex and dynamic stock market. In this paper, we propose an ensemble strategy that employs deep reinforcement learning schemes to learn a stock trading strategy by maximizing investment return. We train a deep reinforcement learning agent and obtain an ensemble trading strategy using three actor-critic based algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). The ensemble strategy inherits and integrates the best features of the three algorithms, thereby robustly adjusting to different market situations. To avoid large memory consumption when training networks with a continuous action space, we employ a load-on-demand technique for processing very large data. We test our algorithms on the 30 Dow Jones stocks, which have adequate liquidity. The performance of the trading agent with different reinforcement learning algorithms is evaluated and compared with both the Dow Jones Industrial Average index and the traditional min-variance portfolio allocation strategy. The proposed deep ensemble strategy is shown to outperform the three individual algorithms and the two baselines in terms of risk-adjusted return measured by the Sharpe ratio. This work is fully open-sourced on GitHub: https://github.com/AI4Finance-Foundation/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020

* Accepted by ICAIF '20: Proceedings of the First ACM International Conference on AI in Finance. Conference program: https://ai-finance.org/2020program/
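
The abstract ranks strategies by Sharpe ratio. A minimal sketch of how an ensemble could use that metric, assuming it switches to whichever agent (PPO, A2C, or DDPG) scored the highest annualized Sharpe ratio on a recent validation window. The window, the 252-trading-day annualization, the zero risk-free rate, and the helper names `sharpe_ratio` and `pick_agent` are assumptions for illustration, not details stated in the abstract.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a sequence of daily simple returns (needs >= 2 points)."""
    excess = [r - risk_free_daily for r in daily_returns]
    sd = stdev(excess)  # sample standard deviation of excess returns
    if sd == 0:
        return 0.0  # no variability: treat as no risk-adjusted edge
    return math.sqrt(periods_per_year) * mean(excess) / sd


def pick_agent(validation_returns):
    """Hypothetical selection rule: trade with the agent whose validation Sharpe ratio is highest."""
    return max(validation_returns, key=lambda name: sharpe_ratio(validation_returns[name]))


returns = {
    "PPO": [0.01, 0.02, 0.01, 0.02],
    "A2C": [0.03, -0.02, 0.03, -0.02],
    "DDPG": [-0.01, 0.0, -0.01, 0.0],
}
print(pick_agent(returns))
```

Selecting by a risk-adjusted score rather than raw return favors the agent with steadier gains, which is why PPO wins here even though A2C has the largest single-day returns.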

FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning

Oct 26, 2025

Shan Zhong, Shutong Ding, He Diao, Xiangyu Wang, Kah Chan Teh, Bei Peng

Abstract: Reliable value estimation is the cornerstone of reinforcement learning (RL): it evaluates long-term returns and guides policy improvement, significantly influencing convergence speed and final performance. Existing works improve the reliability of value function estimation via multi-critic ensembles and distributional RL, yet the former merely combines multiple point estimates without capturing distributional information, whereas the latter relies on discretization or quantile regression, which limits the expressiveness of complex value distributions. Inspired by the success of flow matching in generative modeling, we propose a generative paradigm for value estimation, named FlowCritic. Departing from conventional regression for deterministic value prediction, FlowCritic leverages flow matching to model value distributions and generates samples for value estimation.
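
To make "generates samples for value estimation" concrete, here is a one-dimensional sketch of generic flow matching, assuming the common linear interpolation path between Gaussian noise and a value target. In FlowCritic the velocity field is a learned network conditioned on the state (and possibly the action); here `velocity_fn` is a hypothetical stand-in, and averaging the generated samples is only one assumed way to turn them into a scalar estimate.

```python
import random


def cfm_training_pair(x0, x1, t):
    """Linear path x_t = (1 - t) x0 + t x1; the regression target velocity is x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with forward Euler."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


def value_estimate(velocity_fn, num_samples=64, steps=10, seed=0):
    """Push N(0, 1) noise through the flow and average the resulting value samples."""
    rng = random.Random(seed)
    samples = [euler_sample(velocity_fn, rng.gauss(0.0, 1.0), steps) for _ in range(num_samples)]
    return sum(samples) / len(samples), samples


# Toy velocity field (hypothetical): shifts every noise sample by +5.
estimate, samples = value_estimate(lambda x, t: 5.0)
print(round(estimate, 2))
```

During training, a network would be regressed onto the target velocity returned by `cfm_training_pair` at random `t`; at inference only the Euler sampler is used, and the spread of `samples` carries the distributional information that a single point estimate lacks.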

GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

May 24, 2025

Shutong Ding, Ke Hu, Shan Zhong, Haoyang Luo, Weinan Zhang, Jingya Wang, Jun Wang, Ye Shi

Abstract: Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline and off-policy RL settings, integrating diffusion policies into on-policy frameworks such as PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies: this is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., Euler-Maruyama approximations). To bridge this gap, we propose GenPO, a generative policy optimization framework that leverages exact diffusion inversion to construct invertible action mappings. GenPO introduces a novel doubled dummy action mechanism that enables invertibility via alternating updates, resolving the log-likelihood computation barrier. Furthermore, we use the action log-likelihood for unbiased entropy and KL divergence estimation, enabling KL-adaptive learning rates and entropy regularization in on-policy updates. Extensive experiments on eight IsaacLab benchmarks, including legged locomotion (Ant, Humanoid, Anymal-D, Unitree H1, Go2), dexterous manipulation (Shadow Hand), aerial control (Quadcopter), and robotic arm tasks (Franka), demonstrate GenPO's superiority over existing RL baselines. Notably, GenPO is the first method to successfully integrate diffusion policies into on-policy RL, unlocking their potential for large-scale parallelized training and real-world robotic deployment.
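
For reference, the sketch below shows what the KL estimate feeds into: a KL-adaptive learning-rate rule of the kind commonly used with PPO in IsaacLab-style training. It uses the closed-form KL between diagonal Gaussian policies, the case the abstract calls straightforward; GenPO instead estimates KL from log-likelihoods under its invertible diffusion policy. The factor 1.5, the target KL of 0.01, and the learning-rate bounds are assumptions borrowed from common PPO practice, not values from the paper.

```python
import math


def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return kl


def adapt_learning_rate(lr, kl, kl_target=0.01, factor=1.5, lr_min=1e-5, lr_max=1e-2):
    """Shrink the step size if the policy moved too far; grow it if it barely moved."""
    if kl > 2.0 * kl_target:
        return max(lr / factor, lr_min)
    if 0.0 < kl < 0.5 * kl_target:
        return min(lr * factor, lr_max)
    return lr


old_mu, old_std = [0.0, 0.0], [1.0, 1.0]
new_mu, new_std = [0.3, -0.1], [0.9, 1.0]
kl = diag_gaussian_kl(old_mu, old_std, new_mu, new_std)
print(kl, adapt_learning_rate(1e-3, kl))
```

Keeping the per-update KL near a target lets the learning rate track how aggressive each policy step actually is, which is why an unbiased KL estimate matters for non-Gaussian policies.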

Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs

Nov 09, 2024

Shan Zhong, Jiahao Zeng, Yongxin Yu, Bohong Lin

Abstract: This paper introduces an innovative semi-supervised learning approach for text classification, addressing the challenge of abundant data but limited labeled examples. Our methodology integrates few-shot learning with retrieval-augmented generation (RAG) and conventional statistical clustering, enabling effective learning from a minimal number of labeled instances while generating high-quality labeled data. To the best of our knowledge, we are the first to incorporate RAG alongside clustering in text data generation. Our experiments on the Reuters and Web of Science datasets demonstrate state-of-the-art performance, with few-shot augmented data alone producing results nearly equivalent to those achieved with fully labeled datasets. Notably, accuracies of 95.41% and 82.43% were achieved for complex text document classification tasks, where the number of categories can exceed 100.
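
The two ingredients named in the abstract, clustering and retrieval, can be sketched independently of any particular LLM. This is a hypothetical illustration, not the paper's pipeline: it assumes documents are already embedded as vectors and clustered, picks the member nearest each cluster mean as the example to label, and retrieves the most similar labeled examples by cosine similarity to build a few-shot prompt. All function names and the prompt wording are assumptions.

```python
import math


def cluster_representatives(vectors, assignments):
    """For each cluster id, return the index of the member closest to the cluster mean."""
    reps = {}
    dim = len(vectors[0])
    for c in sorted(set(assignments)):
        members = [i for i, a in enumerate(assignments) if a == c]
        centroid = [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
        reps[c] = min(members, key=lambda i: sum((vectors[i][d] - centroid[d]) ** 2 for d in range(dim)))
    return reps


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_examples(query_vec, labeled, k=2):
    """labeled: list of (text, label, embedding); return the k most similar entries."""
    return sorted(labeled, key=lambda item: cosine(query_vec, item[2]), reverse=True)[:k]


def build_prompt(query_text, examples):
    parts = ["Classify the document into one of the known categories."]
    for text, label, _ in examples:
        parts.append(f"Document: {text}\nCategory: {label}")
    parts.append(f"Document: {query_text}\nCategory:")
    return "\n\n".join(parts)


labeled = [
    ("rates rise again", "economy", [1.0, 0.0]),
    ("home team wins final", "sport", [0.0, 1.0]),
    ("bond yields climb", "economy", [0.9, 0.1]),
]
print(build_prompt("central bank holds rates", retrieve_examples([1.0, 0.05], labeled)))
```

Labeling only cluster representatives spreads a small annotation budget across the data, and retrieval then grounds each LLM call in the closest labeled examples.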

Monitoring Drug-Induced Brain Activity Changes with Functional Ultrasound Imaging and Convolutional Neural Networks

Oct 12, 2024

Jared Deighton, Shan Zhong, Kofi Agyeman, Wooseong Choi, Charles Liu, Darrin Lee, Vasileios Maroulas, Vasileios Christopoulos

Abstract: Functional ultrasound imaging (fUSI) is a cutting-edge technology that measures changes in cerebral blood volume (CBV) by detecting backscattered echoes from red blood cells moving within its field of view (FOV). It offers high spatiotemporal resolution and sensitivity, allowing for detailed visualization of cerebral blood flow dynamics. While fUSI has been utilized in preclinical drug development studies to explore the mechanisms of action of various drugs targeting the central nervous system, many of these studies have primarily focused on predetermined regions of interest (ROIs). This focus may overlook relevant brain activity outside these specific areas, which could influence the results. To address this limitation, we combined convolutional neural networks (CNNs) with fUSI to comprehensively understand the pharmacokinetic process of Dizocilpine, also known as MK-801, a drug that blocks the N-methyl-D-aspartate (NMDA) receptor in the central nervous system. CNN and class activation mapping (CAM) revealed the spatiotemporal effects of MK-801, which originated in the cortex and propagated to the hippocampus, demonstrating the ability to detect dynamic drug effects over time. Additionally, CNN and CAM assessed the impact of anesthesia on the spatiotemporal hemodynamics of the brain, revealing no distinct patterns between early and late stages. The integration of fUSI and CNN provides a powerful tool to gain insights into the spatiotemporal dynamics of drug action in the brain. This combination enables a comprehensive and unbiased assessment of drug effects on brain function, potentially accelerating the development of new therapies in neuropharmacological studies.
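
Class activation mapping, as originally formulated, assumes the CNN ends in global average pooling followed by a linear classifier; the map for class c is then the weighted sum of the last convolutional feature maps, CAM_c(x, y) = sum_k w_k^c f_k(x, y). The abstract does not describe the network architecture, so the sketch below only illustrates that standard formula on plain nested lists; the function names are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps: K maps, each an H x W nested list (last conv layer output).
    class_weights: K weights linking the pooled features to class c.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, class_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += wk * fmap[y][x]
    return cam


def normalize(cam):
    """Rescale to [0, 1] so maps from different time points can be compared visually."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in cam]


maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
print(normalize(class_activation_map(maps, [2.0, -1.0])))
```

Applied to fUSI frames, each pixel of the upsampled map would indicate how strongly that brain location drove the classifier's decision, which is what allows spatial effects outside predefined ROIs to surface.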

Adaptive Environment-Aware Robotic Arm Reaching Based on a Bio-Inspired Neurodynamical Computational Framework

Jul 16, 2024

Dimitrios Chatziparaschis, Shan Zhong, Vasileios Christopoulos, Konstantinos Karydis

Abstract: Bio-inspired robotic systems are capable of adaptive learning, scalable control, and efficient information processing. Enabling real-time decision-making for such systems is critical for responding to dynamic changes in the environment. We focus on dynamic target tracking in open areas using a six-degree-of-freedom robotic manipulator, a bird's-eye-view camera for visual feedback, and the Neurodynamical Computational Framework (NeuCF). NeuCF is a recently developed bio-inspired model for target tracking based on Dynamic Neural Fields (DNFs) and Stochastic Optimal Control (SOC) theory. It has been trained for reaching actions on a planar surface toward localized visual beacons, and it can re-target or generate stop signals on the fly based on changes in the environment (e.g., a new target has emerged, or an existing one has been removed). We evaluated our system over various target-reaching scenarios. In all experiments, NeuCF achieved high end-effector positional accuracy, generated smooth trajectories, and produced shorter paths than a baseline cubic polynomial trajectory generator. Overall, the developed system offers a robust, dynamics-aware robotic manipulation approach that affords real-time decision-making.

* 6 pages, 6 figures, conference
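
The comparison baseline is a cubic polynomial trajectory generator. A minimal sketch of the textbook point-to-point cubic with zero start and end velocity, q(t) = q0 + (qf - q0)(3s^2 - 2s^3) with s = t/T, applied per coordinate, plus a path-length helper of the kind such a comparison needs. Whether the paper's baseline interpolates in joint space or Cartesian space is not stated in the abstract; the Cartesian setting and all names here are assumptions.

```python
import math


def cubic_trajectory(q0, qf, duration, t):
    """Cubic profile with zero boundary velocities: q0 + (qf - q0) * (3 s^2 - 2 s^3)."""
    s = min(max(t / duration, 0.0), 1.0)  # normalized time, clamped to [0, 1]
    return q0 + (qf - q0) * (3.0 * s ** 2 - 2.0 * s ** 3)


def sample_path(start, goal, duration, steps):
    """Evaluate the cubic independently for each coordinate at steps + 1 time points."""
    return [
        tuple(cubic_trajectory(a, b, duration, duration * i / steps) for a, b in zip(start, goal))
        for i in range(steps + 1)
    ]


def path_length(points):
    """Sum of Euclidean distances between consecutive samples."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


pts = sample_path((0.0, 0.0), (3.0, 4.0), duration=1.0, steps=20)
print(pts[10], path_length(pts))
```

Because the cubic is planned once from start to goal, it cannot react mid-motion when a target appears or disappears, which is the on-the-fly re-targeting capability the abstract attributes to NeuCF.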

Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning

Dec 29, 2023

Xiao-Yang Liu, Rongyi Zhu, Daochen Zha, Jiechao Gao, Shan Zhong, Meikang Qiu

Abstract: The surge in interest in and application of large language models (LLMs) has sparked a drive to fine-tune these models for specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet challenges arise: 1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and 2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs. DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise to weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA improves communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. Experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead.

* 20 pages, 1 figure, 22 tables
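
Two mechanisms in the abstract can be illustrated with simple arithmetic: the communication saving of a rank-r adapter, which transmits B (d x r) and A (r x k) instead of a full d x k update, and a Gaussian mechanism on the transmitted update. The L2 clipping step before adding noise is an assumption taken from standard DP-SGD practice; the abstract only says Gaussian noise is added. Calibrating `noise_multiplier` to a target (epsilon, delta) budget is not shown, and the function names are hypothetical.

```python
import math
import random


def lora_param_count(d, k, r):
    """Parameters sent for a rank-r adapter (B: d x r, A: r x k) vs. a full d x k update."""
    return d * r + r * k, d * k


def clip_and_noise(update, clip_norm, noise_multiplier, rng):
    """Gaussian mechanism on a flattened update: clip its L2 norm, add N(0, (sigma * C)^2)."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm
    return [u * scale + rng.gauss(0.0, sigma) for u in update]


lora, full = lora_param_count(4096, 4096, 8)
print(lora, full, full // lora)  # 65536 16777216 256
print(clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=0.5, rng=random.Random(0)))
```

For a 4096 x 4096 weight matrix at rank 8, each client sends 256 times fewer numbers per round, and the noise is added only to those adapter parameters.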

Variational Bayesian Approximations Kalman Filter Based on Threshold Judgment

Sep 22, 2023

Zuxuan Zhang, Gang Wang, Jiacheng He, Shan Zhong

Abstract: Estimating non-Gaussian measurement noise models is a significant challenge across various fields; in practical applications, it is complicated by the large number of parameters and high computational complexity. This paper proposes a threshold-based Kalman filtering approach for online estimation of noise parameters in non-Gaussian measurement noise models. The method uses a certain amount of sample data to infer the variance threshold of the observation parameters and employs variational Bayesian estimation to obtain the corresponding noise variance estimates, which feed subsequent iterations of the Kalman filtering algorithm. Finally, we evaluate the performance of this algorithm through simulation experiments, which demonstrate accurate and effective estimation of both the state and the noise parameters.

* 5 pages, conference
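
The abstract does not specify how the variance threshold enters the filter, so the sketch below shows only the variational-Bayes noise-estimation step the method builds on, following the standard VB adaptive Kalman filter (Särkkä and Nummenmaa) for a scalar state observed directly (H = 1). The measurement-noise variance has an inverse-gamma posterior IG(alpha, beta) that is refined by fixed-point iteration; the forgetting factor `rho`, process noise `Q`, and iteration count are illustrative assumptions.

```python
def vb_kf_step(m, P, alpha, beta, y, A=1.0, Q=1e-3, rho=0.99, iters=5):
    """One scalar VB adaptive Kalman filter step; returns updated (m, P, alpha, beta)."""
    # Predict the state and its variance.
    m_pred = A * m
    P_pred = A * P * A + Q
    # Propagate the inverse-gamma noise parameters with forgetting factor rho.
    alpha_pred, beta_pred = rho * alpha, rho * beta
    alpha_new = alpha_pred + 0.5  # one scalar measurement adds 1/2 to the shape
    beta_new = beta_pred
    m_new, P_new = m_pred, P_pred
    for _ in range(iters):
        r = beta_new / alpha_new  # current measurement-noise variance estimate
        K = P_pred / (P_pred + r)  # Kalman gain
        m_new = m_pred + K * (y - m_pred)
        P_new = (1.0 - K) * P_pred
        # Update the scale with the squared residual plus the remaining state uncertainty.
        beta_new = beta_pred + 0.5 * ((y - m_new) ** 2 + P_new)
    return m_new, P_new, alpha_new, beta_new


state = (0.0, 1.0, 1.0, 1.0)
for y in [0.9, 1.1, 1.0, 5.0, 1.05]:  # 5.0 mimics an outlier
    state = vb_kf_step(*state, y=y)
    print(round(state[0], 3), round(state[3] / state[2], 3))  # estimate, noise variance
```

Jointly updating the state and the noise variance avoids fixing the measurement-noise level in advance; a threshold rule such as the one the paper describes would decide when this adaptation is applied.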

Seasonality Based Reranking of E-commerce Autocomplete Using Natural Language Queries

Aug 03, 2023

Prateek Verma, Shan Zhong, Xiaoyu Liu, Adithya Rajan

Abstract: Query autocomplete (QAC), also known as typeahead, suggests a list of complete queries as the user types a prefix in the search box. It is one of the key features of modern search engines, especially in e-commerce. One goal of typeahead is to suggest relevant queries that are seasonally important. In this paper, we propose a neural network based natural language processing (NLP) algorithm to incorporate seasonality as a signal and present an end-to-end evaluation of the QAC ranking model. Incorporating seasonality into the autocomplete ranking model can improve autocomplete relevance and business metrics.

* Accepted at The 6th Workshop on e-Commerce and NLP (ECNLP 6), KDD'23, Long Beach, CA
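
To show what "seasonality as a signal" means for ranking, here is a deliberately simple, hypothetical stand-in: a hand-crafted seasonality feature (how over-represented the current month is in a query's yearly volume) linearly blended with a base relevance score. The paper uses a neural NLP model rather than this formula; the blend weight, monthly counts, and function names are all assumptions.

```python
def seasonality_score(monthly_counts, month):
    """Share of yearly volume in `month` (0-11) relative to uniform; > 0 means in season."""
    total = sum(monthly_counts)
    if total == 0:
        return 0.0
    return 12.0 * monthly_counts[month] / total - 1.0


def rerank(candidates, month, weight=0.5):
    """candidates: list of (query, base_score, monthly_counts); returns queries, best first."""
    scored = [(q, base + weight * seasonality_score(counts, month)) for q, base, counts in candidates]
    return [q for q, _ in sorted(scored, key=lambda qs: qs[1], reverse=True)]


candidates = [
    ("hat", 0.8, [10] * 12),                              # flat demand all year
    ("halloween costume", 0.5, [0] * 9 + [100, 0, 0]),    # all demand in October
]
print(rerank(candidates, month=9))  # October
print(rerank(candidates, month=2))  # March
```

The same prefix can thus surface different completions depending on the date, which is the behavior a seasonality-aware ranker is meant to learn from data rather than from a fixed rule like this one.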

Quantized generalized minimum error entropy for kernel recursive least squares adaptive filtering

Jul 04, 2023

Jiacheng He, Gang Wang, Kun Zhang, Shan Zhong, Bei Peng, Min Li

Abstract: The robustness of the kernel recursive least squares (KRLS) algorithm has recently been improved by combining it with more robust information-theoretic learning criteria, such as minimum error entropy (MEE) and generalized MEE (GMEE), which, however, also increases the computational complexity of KRLS-type algorithms to a certain extent. To reduce this computational load, this paper combines the quantized GMEE (QGMEE) criterion with the KRLS algorithm, yielding two KRLS-type algorithms called quantized kernel recursive MEE (QKRMEE) and quantized kernel recursive GMEE (QKRGMEE). The mean error behavior, mean-square error behavior, and computational complexity of the proposed algorithms are also investigated. In addition, simulation and real experimental data are used to verify the feasibility of the proposed algorithms.
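
The computational saving from quantization comes from replacing the O(N^2) pairwise sum in the (G)MEE information potential with a sum over a small codebook of quantized errors. A minimal sketch of that criterion only (not the recursive QKRGMEE weight update), assuming the generalized Gaussian kernel G(e) = alpha / (2 beta Gamma(1/alpha)) exp(-|e/beta|^alpha) used by GMEE and a simple online quantizer that merges an error into its nearest codeword when within `eps`. Parameter values and names are illustrative assumptions.

```python
import math


def ggd_kernel(e, alpha=2.0, beta=1.0):
    """Generalized Gaussian density; alpha = 2 recovers a Gaussian kernel."""
    return alpha / (2.0 * beta * math.gamma(1.0 / alpha)) * math.exp(-abs(e / beta) ** alpha)


def quantize(errors, eps):
    """Online quantizer: merge each error into its nearest codeword if within eps."""
    codebook, counts = [], []
    for e in errors:
        if codebook:
            j = min(range(len(codebook)), key=lambda i: abs(e - codebook[i]))
            if abs(e - codebook[j]) <= eps:
                counts[j] += 1
                continue
        codebook.append(e)
        counts.append(1)
    return codebook, counts


def information_potential(errors, eps=0.0, alpha=2.0, beta=1.0):
    """Quantized potential (1/N^2) sum_i sum_j M_j G(e_i - c_j); eps = 0 gives the exact sum."""
    codebook, counts = quantize(errors, eps)
    n = len(errors)
    total = sum(m * ggd_kernel(ei - c, alpha, beta) for ei in errors for c, m in zip(codebook, counts))
    return total / n ** 2


errs = [0.0, 0.3, -0.2, 1.1]
print(information_potential(errs, eps=0.0), information_potential(errs, eps=0.5))
```

Maximizing the information potential is equivalent to minimizing error entropy; with a coarser `eps`, fewer codewords are evaluated per error at the cost of a small approximation error.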
