Abstract:Reinforcement learning (RL) algorithms are designed to optimize problem-solving by learning actions that maximize rewards, a task that becomes particularly challenging in random and nonstationary environments. Even advanced RL algorithms are often limited in their ability to solve problems in these conditions. In applications such as searching for underwater pollution clouds with autonomous underwater vehicles (AUVs), RL algorithms must navigate reward-sparse environments, where actions frequently result in a zero reward. This paper aims to address these challenges by revisiting and modifying classical RL approaches to efficiently operate in sparse, randomized, and nonstationary environments. We systematically study a large number of modifications, including hierarchical algorithm changes, multigoal learning, and the integration of a location memory as an external output filter to prevent state revisits. Our results demonstrate that a modified Monte Carlo-based approach significantly outperforms traditional Q-learning and two exhaustive search patterns, illustrating its potential in adapting RL to complex environments. These findings suggest that reinforcement learning approaches can be effectively adapted for use in random, nonstationary, and reward-sparse environments.




Abstract:Solar irradiance forecasts can be dynamic and unreliable due to changing weather conditions. Near the Arctic circle, this also translates into a distinct set of further challenges. This work is forecasting solar irradiance with Norwegian data using variations of Long-Short-Term Memory units (LSTMs). In order to gain more trustworthiness of results, the probabilistic approaches Quantile Regression (QR) and Maximum Likelihood (MLE) are optimized on top of the LSTMs, providing measures of uncertainty for the results. MLE is further extended by using a Johnson's SU distribution, a Johnson's SB distribution, and a Weibull distribution in addition to a normal Gaussian to model parameters. Contrary to a Gaussian, Weibull, Johnson's SU and Johnson's SB can return skewed distributions, enabling it to fit the non-normal solar irradiance distribution more optimally. The LSTMs are compared against each other, a simple Multi-layer Perceptron (MLP), and a smart-persistence estimator. The proposed LSTMs are found to be more accurate than smart persistence and the MLP for a multi-horizon, day-ahead (36 hours) forecast. The deterministic LSTM showed better root mean squared error (RMSE), but worse mean absolute error (MAE) than a MLE with Johnson's SB distribution. Probabilistic uncertainty estimation is shown to fit relatively well across the distribution of observed irradiance. While QR shows better uncertainty estimation calibration, MLE with Johnson's SB, Johnson's SU, or Gaussian show better performance in the other metrics employed. Optimizing and comparing the models against each other reveals a seemingly inherent trade-off between point-prediction and uncertainty estimation calibration.