Abstract: Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustment of the learning and exploration hyperparameters, MORPHIN adapts agents both to shifts in the reward function and to run-time expansions of the action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach on a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves faster convergence and continuous adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7x.
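The abstract names three mechanisms: drift detection on the learning signal, re-opening of the learning and exploration hyperparameters when drift is detected, and in-place growth of the value table when new actions appear. The sketch below illustrates one way these pieces could fit together in tabular Q-learning; the windowed TD-error drift heuristic, the boost factors, and all names (drift_window, drift_threshold, expand_actions) are illustrative assumptions, not MORPHIN's actual design.

```python
import numpy as np

class AdaptiveQLearner:
    """Minimal sketch: Q-learning with a drift heuristic and action-space growth."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(seed)
        self.recent_errors = []      # sliding window of absolute TD errors
        self.drift_window = 200      # assumed window length
        self.drift_threshold = 2.0   # assumed ratio of recent/old error that signals drift

    def act(self, state):
        # Epsilon-greedy exploration over the *current* action set.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[state]))

    def update(self, s, a, r, s_next):
        # Standard tabular Q-learning update, followed by a drift check.
        td_error = r + self.gamma * np.max(self.Q[s_next]) - self.Q[s, a]
        self.Q[s, a] += self.alpha * td_error
        self._check_drift(abs(td_error))

    def _check_drift(self, err):
        # Stand-in for a concept drift detector: if the mean TD error over the
        # recent window grows well beyond the preceding window, assume the reward
        # function has shifted and re-open learning and exploration.
        self.recent_errors.append(err)
        if len(self.recent_errors) >= 2 * self.drift_window:
            old = np.mean(self.recent_errors[:-self.drift_window])
            new = np.mean(self.recent_errors[-self.drift_window:])
            if old > 0 and new / old > self.drift_threshold:
                self.alpha = min(0.5, self.alpha * 2.0)      # re-accelerate learning
                self.epsilon = min(0.5, self.epsilon * 3.0)  # re-explore
            self.recent_errors = self.recent_errors[-self.drift_window:]

    def expand_actions(self, n_new):
        # Grow the Q-table when new actions become available; existing columns
        # are left untouched so prior policy knowledge is preserved.
        new_cols = np.zeros((self.Q.shape[0], n_new))
        self.Q = np.hstack([self.Q, new_cols])
```

Preserving the old Q-values while appending zero-initialised columns is what keeps previously learned behaviour intact; only the exploration boost encourages the agent to try the newly added actions.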
Abstract: Self-adaptive systems continuously adapt to changes in their execution environment. Capturing all possible changes to define suitable behaviour beforehand is infeasible, or even impossible in the case of unknown changes, hence human intervention may be required. We argue that adapting to unknown situations is the ultimate challenge for self-adaptive systems. Learning-based approaches are used to learn the behaviour to exhibit in unknown situations, in order to minimize or fully remove human intervention. While such approaches can, to a certain extent, generalize existing adaptations to new situations, a number of breakthroughs need to be achieved before systems can adapt to general unknown and unforeseen situations. We posit the research directions that need to be explored to achieve unanticipated adaptation from the perspective of learning-based self-adaptive systems. At a minimum, systems need to define internal representations of previously unseen situations on-the-fly, extrapolate their relationship to previously encountered situations in order to evolve existing adaptations, and reason about the feasibility of achieving their intrinsic goals under the new set of conditions. We close by discussing whether, even when we can, we should indeed build systems that define their own behaviour and adapt their goals without involving a human supervisor.