Abstract: This paper studies reinforcement learning (RL) in infinite-horizon dynamic decision processes with almost-sure safety constraints. Such safety-constrained decision processes are central to applications in autonomous systems, finance, and resource management, where policies must satisfy strict, state-dependent constraints. We consider a doubly-regularized RL framework that combines reward and parameter regularization to address these constraints within continuous state-action spaces. Specifically, we formulate the problem as a convex regularized objective with parametrized policies in the mean-field regime. Our approach leverages recent developments in mean-field theory and Wasserstein gradient flows to model policies as elements of an infinite-dimensional statistical manifold, with policy updates evolving via gradient flows on the space of parameter distributions. Our main contributions include establishing solvability conditions for safety-constrained problems, defining smooth and bounded approximations that facilitate gradient flows, and demonstrating exponential convergence towards global solutions under sufficient regularization. We provide general conditions on regularization functions, encompassing standard entropy regularization as a special case. The results also enable a particle method implementation for practical RL applications. The theoretical insights and convergence guarantees presented here offer a robust framework for safe RL in complex, high-dimensional decision-making problems.
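To make the idea of a particle method for an entropy-regularized gradient flow concrete, here is a minimal sketch in Python with NumPy. It is not the paper's algorithm: it omits the safety constraints and the policy parametrization, and it uses a hypothetical one-dimensional toy objective F(mu) = 0.5 * (mean(mu) - target)^2 + 0.5 * lam * E_mu[theta^2], regularized by (sigma^2 / 2) times the entropy term. The names `drift`, `particle_flow`, `target`, `lam`, and `sigma` are assumptions introduced for illustration only.

```python
import numpy as np


def drift(theta, target=2.0, lam=1.0):
    """Gradient in theta of the first variation of the toy objective.

    Toy objective (an assumption, not taken from the paper):
        F(mu) = 0.5 * (mean(mu) - target)**2 + 0.5 * lam * E_mu[theta**2]
    Its first variation at theta is
        (mean(mu) - target) * theta + 0.5 * lam * theta**2,
    so the gradient in theta is (mean(mu) - target) + lam * theta.
    """
    return (theta.mean() - target) + lam * theta


def particle_flow(n_particles=512, n_steps=2000, step=0.01, sigma=0.5, seed=0):
    """Approximate the regularized gradient flow with interacting particles.

    Each particle follows a noisy gradient step (Euler-Maruyama scheme);
    the empirical distribution of the particles stands in for the
    parameter distribution mu.
    """
    rng = np.random.default_rng(seed)
    # Start from a deliberately wide initial distribution.
    theta = 3.0 * rng.standard_normal(n_particles)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        # Deterministic descent plus Gaussian noise; the noise term is what
        # the entropy regularizer contributes to the particle dynamics.
        theta = theta - step * drift(theta) + sigma * np.sqrt(step) * noise
    return theta
```

For this toy objective the regularized minimizer is a Gaussian with mean target / (1 + lam) and variance sigma^2 / (2 * lam), so calling `particle_flow()` with the defaults should return particles whose sample mean and variance are approximately 1 and 0.125. The injected noise is the particle-level counterpart of the entropy term, which is why stronger regularization spreads the particle cloud more widely.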
Abstract: A shift from even-aged to uneven-aged forest management poses a problem quite different from the established practice, which follows a rotation cycle of artificial regeneration, thinning of inferior trees, and a clearcut. The lack of realistic models and methods for managing uneven-aged stands in an economically viable and ecologically sustainable way makes this new practice difficult to adopt. To tackle this problem, we make a two-fold contribution. First, we propose an algorithm that can handle a realistic uneven-aged stand management model that is otherwise computationally tedious and intractable. The model considered is an empirically estimated size-structured ecological model for uneven-aged spruce forests. Second, we carry out a sensitivity analysis of the forest model with respect to a number of important parameters, which provides insight into the behavior of the uneven-aged forest model.
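As a rough illustration of what a size-structured stand model looks like, the sketch below advances a stand by one period in Python with NumPy. It is a deliberately simplified, hypothetical stand-in: the rates are constant, whereas the empirically estimated model in the paper is not reproduced here, and the function name `step_stand` and all of its parameters (`harvest`, `upgrowth`, `mortality`, `ingrowth`) are assumptions made for this example.

```python
import numpy as np


def step_stand(trees, harvest, upgrowth, mortality, ingrowth):
    """Advance a size-structured stand by one period.

    trees     : number of trees in each size class (smallest first)
    harvest   : trees removed from each class at the start of the period
    upgrowth  : fraction of survivors moving up one size class
    mortality : fraction of remaining trees that die during the period
    ingrowth  : new trees entering the smallest class
    All rates are constant here, which is a simplifying assumption.
    """
    trees = np.asarray(trees, dtype=float)
    # Harvest cannot remove more trees than are present.
    remaining = np.maximum(trees - np.asarray(harvest, dtype=float), 0.0)
    survivors = remaining * (1.0 - np.asarray(mortality, dtype=float))
    moving_up = survivors * np.asarray(upgrowth, dtype=float)
    # Trees in the largest class have nowhere to move.
    moving_up[-1] = 0.0
    next_trees = survivors - moving_up
    next_trees[1:] += moving_up[:-1]
    next_trees[0] += ingrowth
    return next_trees
```

A simple sensitivity check in this setting would rerun `step_stand` over several periods with one parameter perturbed slightly and compare the resulting stand structure with the unperturbed run.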