Abstract: We propose a method that enables large language models (LLMs) to control embodied agents by directly mapping continuous observation vectors to continuous action vectors. The LLM first generates a control strategy from a textual description of the agent, its environment, and the intended goal. The strategy is then refined iteratively: the LLM is repeatedly prompted to improve the current strategy, using performance feedback and sensory-motor data collected while that strategy is evaluated. The method is validated on classic control tasks from the Gymnasium library and on the inverted pendulum task from the MuJoCo library. In most cases, it identifies optimal or high-performing solutions by integrating symbolic knowledge derived through reasoning with sub-symbolic sensory-motor data gathered as the agent interacts with its environment.
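The generate-evaluate-refine loop described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `evaluate` is a toy 1-D task standing in for a Gymnasium or MuJoCo environment, and `make_mock_proposer` replaces the LLM call with a fixed rule so that the example runs without any model or external library.

```python
from typing import Callable, List, Optional, Tuple

Policy = Callable[[List[float]], List[float]]
# Feedback = (total reward, logged (observation, action) pairs)
Feedback = Tuple[float, List[Tuple[List[float], List[float]]]]

def evaluate(policy: Policy, steps: int = 50) -> Feedback:
    """Toy stand-in task (an assumption): drive a 1-D position x toward 0."""
    x, total, log = 1.0, 0.0, []
    for _ in range(steps):
        obs = [x]
        action = max(-1.0, min(1.0, policy(obs)[0]))  # clip action to [-1, 1]
        x += 0.1 * action
        total += -abs(x)  # reward is higher the closer x is to 0
        log.append((obs, [action]))
    return total, log

def make_mock_proposer() -> Callable[[str, Optional[Feedback]], Policy]:
    """Hypothetical placeholder for the LLM: returns a new policy on each call.

    A real proposer would build a prompt from `description` and `feedback`;
    this mock ignores both and simply doubles a proportional gain.
    """
    gain = 0.25
    def propose(description: str, feedback: Optional[Feedback]) -> Policy:
        nonlocal gain
        gain *= 2.0
        g = gain  # freeze the current gain inside the returned policy
        return lambda obs: [-g * obs[0]]
    return propose

def refine(propose, description: str, iterations: int = 4):
    """Repeatedly request a strategy, evaluate it, and feed the results back."""
    best_policy, best_score = None, float("-inf")
    feedback: Optional[Feedback] = None
    for _ in range(iterations):
        policy = propose(description, feedback)
        score, log = evaluate(policy)
        feedback = (score, log)  # performance plus sensory-motor data
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score

best_policy, best_score = refine(make_mock_proposer(), "1-D point; goal: reach x = 0")
```

In this sketch the feedback passed to the proposer bundles the scalar score with the logged observation-action pairs, mirroring the two information sources the abstract names; how the actual method formats them into a prompt is not specified here.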
Abstract: Previous evolutionary studies demonstrated that evaluating evolving agents in variable environmental conditions enables them to develop solutions that are robust to environmental variation. We demonstrate that the robustness of the agents can be further improved by also exposing them to environmental variation across generations. These two types of environmental variation play partially distinct roles: agents evolved in environments that do not vary across generations display lower performance than agents evolved in varying environments, regardless of the amount of environmental variation experienced during evaluation. Moreover, our results demonstrate that performance increases when both the amount of variation introduced during the agents' evaluation and the rate at which the environment varies across generations are moderate. This is explained by the fact that the probability of retaining genetic variations, including non-neutral variations that alter the behavior of the agents, increases when the environment varies across generations, but also when new environmental conditions persist long enough to enable genetic accommodation.
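The two sources of variation distinguished in this abstract can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the authors' algorithm: the genome is a single number, the environment is a single target value, and the truncation-selection scheme and all parameter names (`noise`, `change_every`, `episodes`) are hypothetical. `noise` controls variation within an agent's evaluation, while `change_every` controls how often the environment itself changes across generations.

```python
import random

def fitness(genome: float, env_center: float, episodes: int,
            noise: float, rng: random.Random) -> float:
    """Average score over several episodes with perturbed conditions.

    `noise` sets the amount of variation experienced during evaluation.
    """
    total = 0.0
    for _ in range(episodes):
        target = env_center + rng.uniform(-noise, noise)
        total += -(genome - target) ** 2
    return total / episodes

def evolve(generations: int = 60, pop_size: int = 20, episodes: int = 5,
           noise: float = 0.2, change_every: int = 10, seed: int = 0):
    """Toy evolutionary loop; `pop_size` is assumed to be even.

    `change_every` sets how many generations a condition persists before the
    environment changes; 0 keeps the environment fixed across generations.
    """
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
    env_center = 0.0
    history = []  # environment condition used at each generation
    for gen in range(generations):
        if change_every and gen > 0 and gen % change_every == 0:
            env_center = rng.uniform(-1.0, 1.0)  # variation across generations
        history.append(env_center)
        ranked = sorted(
            population,
            key=lambda g: fitness(g, env_center, episodes, noise, rng),
            reverse=True,
        )
        parents = ranked[: pop_size // 2]  # truncation selection
        offspring = [p + rng.gauss(0.0, 0.1) for p in parents]  # mutation
        population = parents + offspring
    return population, history

fixed_pop, fixed_env = evolve(change_every=0)      # environment never changes
varied_pop, varied_env = evolve(change_every=10)   # changes every 10 generations
```

Comparing runs across different `noise` and `change_every` settings is the kind of experiment the abstract describes; this toy task is not expected to reproduce the reported results.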