Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yunbo Wang

Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization

May 19, 2025

Haochen Yuan, Minting Pan, Yunbo Wang, Siyu Gao, Philip S. Yu, Xiaokang Yang

Abstract:Reinforcement learning (RL) has shown significant promise for sequential portfolio optimization tasks, such as stock trading, where the objective is to maximize cumulative returns while minimizing risks using historical data. However, traditional RL approaches often produce policies that merely memorize the optimal yet impractical buying and selling behaviors within the fixed dataset. These offline policies are less generalizable as they fail to account for the non-stationary nature of the market. Our approach, MetaTrader, frames portfolio optimization as a new type of partial-offline RL problem and makes two technical contributions. First, MetaTrader employs a bilevel learning framework that explicitly trains the RL agent to improve both in-domain profits on the original dataset and out-of-domain performance across diverse transformations of the raw financial data. Second, our approach incorporates a new temporal difference (TD) method that approximates worst-case TD estimates from a batch of transformed TD targets, addressing the value overestimation issue that is particularly challenging in scenarios with limited offline data. Our empirical results on two public stock datasets show that MetaTrader outperforms existing methods, including both RL-based approaches and traditional stock prediction models.

Via

Access Paper or Ask Questions

Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach

May 10, 2025

Minting Pan, Yitao Zheng, Jiajian Li, Yunbo Wang, Xiaokang Yang

Abstract:Offline reinforcement learning (RL) enables policy optimization in static datasets, avoiding the risks and costs of real-world exploration. However, it struggles with suboptimal behavior learning and inaccurate value estimation due to the lack of environmental interaction. In this paper, we present Video-Enhanced Offline RL (VeoRL), a model-based approach that constructs an interactive world model from diverse, unlabeled video data readily available online. Leveraging model-based behavior guidance, VeoRL transfers commonsense knowledge of control policy and physical dynamics from natural videos to the RL agent within the target domain. Our method achieves substantial performance gains (exceeding 100% in some cases) across visuomotor control tasks in robotic manipulation, autonomous driving, and open-world video games.

Via

Access Paper or Ask Questions

Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning

Apr 24, 2025

Mingqi Yuan, Qi Wang, Guozheng Ma, Bo Li, Xin Jin, Yunbo Wang, Xiaokang Yang, Wenjun Zeng, Dacheng Tao

Abstract:Developing lifelong learning agents is crucial for artificial general intelligence. However, deep reinforcement learning (RL) systems often suffer from plasticity loss, where neural networks gradually lose their ability to adapt during training. Despite its significance, this field lacks unified benchmarks and evaluation protocols. We introduce Plasticine, the first open-source framework for benchmarking plasticity optimization in deep RL. Plasticine provides single-file implementations of over 13 mitigation methods, 10 evaluation metrics, and learning scenarios with increasing non-stationarity levels from standard to open-ended environments. This framework enables researchers to systematically quantify plasticity loss, evaluate mitigation strategies, and analyze plasticity dynamics across different contexts. Our documentation, examples, and source code are available at https://github.com/RLE-Foundation/Plasticine.

* 23 pages

Via

Access Paper or Ask Questions

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Mar 11, 2025

Qi Wang, Zhipeng Zhang, Baao Xie, Xin Jin, Yunbo Wang, Shiyu Wang, Liaomo Zheng, Xiaokang Yang, Wenjun Zeng

Abstract:Training visual reinforcement learning (RL) in practical scenarios presents a significant challenge, $\textit{i.e.,}$ RL agents suffer from low sample efficiency in environments with variations. While various approaches have attempted to alleviate this issue by disentanglement representation learning, these methods usually start learning from scratch without prior knowledge of the world. This paper, in contrast, tries to learn and understand underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints. To enable effective cross-domain semantic knowledge transfer, we introduce an interpretable model-based RL framework, dubbed Disentangled World Models (DisWM). Specifically, we pretrain the action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos. The disentanglement capability of the pretrained model is then transferred to the world model through latent distillation. For finetuning in the online environment, we exploit the knowledge from the pretrained model and introduce a disentanglement constraint to the world model. During the adaptation phase, the incorporation of actions and rewards from online environment interactions enriches the diversity of the data, which in turn strengthens the disentangled representation learning. Experimental results validate the superiority of our approach on various benchmarks.

Via

Access Paper or Ask Questions

DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning

Oct 08, 2024

Yichen Song, Yunbo Wang, Xiaokang Yang

Figure 1 for DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning

Figure 2 for DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning

Figure 3 for DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning

Figure 4 for DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning

Abstract:In the realm of computational physics, an enduring topic is the numerical solutions to partial differential equations (PDEs). Recently, the attention of researchers has shifted towards Neural Operator methods, renowned for their capability to approximate ``operators'' -- mappings from functions to functions. Despite the universal approximation theorem within neural operators, ensuring error bounds often requires employing numerous Fourier layers. However, what about lightweight models? In response to this question, we introduce DimOL (Dimension-aware Operator Learning), drawing insights from dimensional analysis. To implement DimOL, we propose the ProdLayer, which can be seamlessly integrated into FNO-based and Transformer-based PDE solvers, enhancing their ability to handle sum-of-products structures inherent in many physical systems. Empirically, DimOL models achieve up to 48% performance gain within the PDE datasets. Furthermore, by analyzing Fourier components' weights, we can symbolically discern the physical significance of each term. This sheds light on the opaque nature of neural networks, unveiling underlying physical principles.

Via

Access Paper or Ask Questions

Open-World Reinforcement Learning over Long Short-Term Imagination

Oct 04, 2024

Jiajian Li, Qi Wang, Yunbo Wang, Xin Jin, Yang Li, Wenjun Zeng, Xiaokang Yang

Figure 1 for Open-World Reinforcement Learning over Long Short-Term Imagination

Figure 2 for Open-World Reinforcement Learning over Long Short-Term Imagination

Figure 3 for Open-World Reinforcement Learning over Long Short-Term Imagination

Figure 4 for Open-World Reinforcement Learning over Long Short-Term Imagination

Abstract:Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary obstacle in open-world decision-making is improving the efficiency of off-policy exploration across an extensive state space. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a long short-term world model. To achieve this, we simulate goal-conditioned jumpy state transitions and compute corresponding affordance maps by zooming in on specific areas within single images. This facilitates the integration of direct long-term values into behavior learning. Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.

Via

Access Paper or Ask Questions

Discovering Message Passing Hierarchies for Mesh-Based Physics Simulation

Oct 03, 2024

Huayu Deng, Xiangming Zhu, Yunbo Wang, Xiaokang Yang

Figure 1 for Discovering Message Passing Hierarchies for Mesh-Based Physics Simulation

Figure 2 for Discovering Message Passing Hierarchies for Mesh-Based Physics Simulation

Figure 3 for Discovering Message Passing Hierarchies for Mesh-Based Physics Simulation

Figure 4 for Discovering Message Passing Hierarchies for Mesh-Based Physics Simulation

Abstract:Graph neural networks have emerged as a powerful tool for large-scale mesh-based physics simulation. Existing approaches primarily employ hierarchical, multi-scale message passing to capture long-range dependencies within the graph. However, these graph hierarchies are typically fixed and manually designed, which do not adapt to the evolving dynamics present in complex physical systems. In this paper, we introduce a novel neural network named DHMP, which learns Dynamic Hierarchies for Message Passing networks through a differentiable node selection method. The key component is the anisotropic message passing mechanism, which operates at both intra-level and inter-level interactions. Unlike existing methods, it first supports directionally non-uniform aggregation of dynamic features between adjacent nodes within each graph hierarchy. Second, it determines node selection probabilities for the next hierarchy according to different physical contexts, thereby creating more flexible message shortcuts for learning remote node relations. Our experiments demonstrate the effectiveness of DHMP, achieving 22.7% improvement on average compared to recent fixed-hierarchy message passing networks across five classic physics simulation datasets.

Via

Access Paper or Ask Questions

Learning Augmentation Policies from A Model Zoo for Time Series Forecasting

Sep 10, 2024

Haochen Yuan, Xuelin Li, Yunbo Wang, Xiaokang Yang

Figure 1 for Learning Augmentation Policies from A Model Zoo for Time Series Forecasting

Figure 2 for Learning Augmentation Policies from A Model Zoo for Time Series Forecasting

Figure 3 for Learning Augmentation Policies from A Model Zoo for Time Series Forecasting

Figure 4 for Learning Augmentation Policies from A Model Zoo for Time Series Forecasting

Abstract:Time series forecasting models typically rely on a fixed-size training set and treat all data uniformly, which may not effectively capture the specific patterns present in more challenging training samples. To address this issue, we introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning. Our approach begins with an empirical analysis to determine which parts of the training data should be augmented. Specifically, we identify the so-called marginal samples by considering the prediction diversity across a set of pretrained forecasting models. Next, we propose using variational masked autoencoders as the augmentation model and applying the REINFORCE algorithm to transform the marginal samples into new data. The goal of this generative model is not only to mimic the distribution of real data but also to reduce the variance of prediction errors across the model zoo. By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance, advancing the prior art in this field with minimal additional computational cost.

Via

Access Paper or Ask Questions

Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

Jul 30, 2024

Yanpeng Zhao, Yiwei Hao, Siyu Gao, Yunbo Wang, Xiaokang Yang

Figure 1 for Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

Figure 2 for Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

Figure 3 for Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

Figure 4 for Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

Abstract:Learning object-centric representations from unsupervised videos is challenging. Unlike most previous approaches that focus on decomposing 2D images, we present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning within a differentiable volume rendering framework. The key idea is to perform object-centric voxelization to capture the 3D nature of the scene, which infers per-object occupancy probabilities at individual spatial locations. These voxel features evolve through a canonical-space deformation function and are optimized in an inverse rendering pipeline with a compositional NeRF. Additionally, our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids. DynaVol-S significantly outperforms existing models in both novel view synthesis and unsupervised decomposition tasks for dynamic scenes. By jointly considering geometric structures and semantic features, it effectively addresses challenging real-world scenarios involving complex object interactions. Furthermore, once trained, the explicitly meaningful voxel features enable additional capabilities that 2D scene decomposition methods cannot achieve, such as novel scene generation through editing geometric shapes or manipulating the motion trajectories of objects.

Via

Access Paper or Ask Questions

Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video

Jun 18, 2024

Xiangming Zhu, Huayu Deng, Haochen Yuan, Yunbo Wang, Xiaokang Yang

Figure 1 for Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video

Figure 2 for Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video

Figure 3 for Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video

Figure 4 for Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video

Abstract:We introduce latent intuitive physics, a transfer learning framework for physics simulation that can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes. Our key insight is to use latent features drawn from a learnable prior distribution conditioned on the underlying particle states to capture the invisible and complex physical properties. To achieve this, we train a parametrized prior learner given visual observations to approximate the visual posterior of inverse graphics, and both the particle states and the visual posterior are obtained from a learned neural renderer. The converged prior learner is embedded in our probabilistic physics engine, allowing us to perform novel simulations on unseen geometries, boundaries, and dynamics without knowledge of the true physical parameters. We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation. Our model demonstrates strong performance in all three tasks.

* ICLR 2024
* Published as a conference paper at ICLR 2024

Via

Access Paper or Ask Questions