In multi-stage processes, decisions occur in an ordered sequence of stages. Early stages usually have more observations with general information (easier and cheaper to collect), while later stages have fewer observations but more specific data. This situation can be represented by a dual-funnel structure, in which the sample size decreases from one stage to the next while the information increases. Training classifiers in this scenario is challenging: the information in the early stages may lack distinct patterns to learn from (underfitting), while the small sample size in later stages can cause overfitting. We address both cases by introducing a framework that combines adversarial autoencoders (AAE), multi-task learning (MTL), and multi-label semi-supervised learning (MLSSL). We improve the decoder of the AAE with an MTL component so it can jointly reconstruct the original input and use feature nets to predict the features of the next stages. We also introduce a sequence constraint on the output of an MLSSL classifier to guarantee the sequential pattern in the predictions. Using real-world data from different domains (selection processes, medical diagnosis), we show that our approach outperforms other state-of-the-art methods.
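The sequential pattern the constraint enforces, that a sample cannot pass stage k without passing stage k-1, can be illustrated with a simple post-processing rule. A minimal sketch, assuming per-stage pass probabilities; the paper builds the constraint into the classifier itself, so this running-minimum rule is only illustrative:

```python
def enforce_stage_monotonicity(stage_probs):
    """Post-process per-stage pass probabilities so predictions respect
    the funnel: the probability of passing stage k can never exceed the
    probability of passing stage k-1. One simple way to enforce this
    (an illustrative choice, not the paper's exact mechanism) is a
    running minimum over stages."""
    out, running = [], 1.0
    for p in stage_probs:
        running = min(running, p)
        out.append(running)
    return out
```

Any downstream thresholding of the adjusted probabilities then automatically yields label vectors of the form 1...10...0, i.e. valid funnel trajectories.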
We propose Interactive Constrained MAP-Elites, a quality-diversity solution for game content generation, implemented as a new feature of the Evolutionary Dungeon Designer, a mixed-initiative co-creativity tool for designing dungeons. The feature uses the MAP-Elites algorithm, an illumination algorithm that segregates the population among several cells depending on their scores with respect to different behavioral dimensions. Users can flexibly and dynamically alternate between these dimensions at any time, thus guiding the evolutionary process in an intuitive way, and then incorporate suggestions produced by the algorithm into their room designs. At the same time, any modifications performed by the human user feed back into MAP-Elites, closing a circular workflow of constant mutual inspiration. This paper presents the algorithm, followed by an in-depth analysis of its behaviour, with the aims of evaluating the expressive range of all possible dimension combinations in several scenarios and discussing their influence on the fitness landscape and on the overall performance of the mixed-initiative procedural content generation.
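The core MAP-Elites loop underlying this work can be sketched in a few lines: an archive keeps one elite per discretized behavior cell, and each iteration mutates a random elite and inserts the child only if it beats the incumbent of its cell. This is a generic sketch, not the interactive or constrained variant described above; all names and the toy problem are illustrative:

```python
import random

def map_elites(evaluate, descriptor, mutate, init, bins, iters, seed=0):
    """Minimal MAP-Elites: keep the best solution found per behavior cell.
    `descriptor` maps a solution to behavior values in [0, 1]."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)
    for _ in range(iters):
        if archive:
            # Select a random elite and mutate it.
            _, parent = rng.choice(list(archive.values()))
            child = mutate(parent, rng)
        else:
            child = init(rng)
        cell = tuple(min(int(d * b), b - 1)
                     for d, b in zip(descriptor(child), bins))
        fit = evaluate(child)
        # Replace the cell's elite only if the child is fitter.
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, child)
    return archive

# Toy use: points in [0, 1]^2; the descriptor is the point itself, and
# fitness rewards the two coordinates being close to each other.
archive = map_elites(
    evaluate=lambda x: -abs(x[0] - x[1]),
    descriptor=lambda x: x,
    mutate=lambda x, rng: [min(1.0, max(0.0, v + rng.gauss(0, 0.1))) for v in x],
    init=lambda rng: [rng.random(), rng.random()],
    bins=(5, 5), iters=2000)
```

Letting the user switch behavioral dimensions at runtime, as the tool does, amounts to re-binning the archive with a different `descriptor` while keeping the evaluated solutions.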
We investigate how reinforcement learning can be used to train level-designing agents. This represents a new approach to procedural content generation in games, where level design is framed as a game, and the content generator itself is learned. By seeing the design problem as a sequential task, we can use reinforcement learning to learn how to take the next action so that the expected final level quality is maximized. This approach can be used when few or no examples exist to train from, and the trained generator is very fast. We investigate three different ways of transforming two-dimensional level design problems into Markov decision processes and apply these to three game environments.
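The framing of level design as a sequential decision process can be sketched as a toy environment: one tile is placed per step, and the reward is the change in a level-quality score, so the return telescopes to the final level quality. Everything here (the grid, the density-based quality score, the left-to-right tile ordering) is an illustrative assumption, not one of the paper's three representations:

```python
class SequentialLevelDesignEnv:
    """Toy MDP for level design: place one tile per step, left to right.
    Reward is the change in a simple quality score (here: how close the
    floor-tile density is to a target). All specifics are illustrative."""
    FLOOR, WALL = 0, 1

    def __init__(self, width=8, height=8, target_density=0.5):
        self.width, self.height = width, height
        self.target = target_density
        self.reset()

    def reset(self):
        self.level = [[self.WALL] * self.width for _ in range(self.height)]
        self.pos = 0
        return self._obs()

    def _obs(self):
        return [row[:] for row in self.level], self.pos

    def _quality(self):
        floor = sum(row.count(self.FLOOR) for row in self.level)
        density = floor / (self.width * self.height)
        return -abs(density - self.target)

    def step(self, action):  # action: tile type for the current position
        before = self._quality()
        y, x = divmod(self.pos, self.width)
        self.level[y][x] = action
        self.pos += 1
        done = self.pos == self.width * self.height
        # Reward is the improvement in quality, so the episode return
        # telescopes to final quality minus initial quality.
        return self._obs(), self._quality() - before, done
```

Because the reward telescopes, a policy that maximizes expected return is exactly one that maximizes expected final level quality, which is the property the paper exploits.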
To procedurally create interactive content such as environments or game levels, we need agents that can evaluate the content; but to train such agents, we need content they can train on. Generative Playing Networks is a framework that learns agent policies and generates environments in tandem through a symbiotic process. Policies are learned using an actor-critic reinforcement learning algorithm so as to master the environment, and environments are created by a generator network which tries to provide an appropriate level of challenge for the agent. This is accomplished by the generator learning to make content based on estimates by the critic. Thus, this process provides an implicit curriculum for the agent, creating more complex environments over time. Unlike previous approaches to procedural content generation, Generative Playing Networks is end-to-end differentiable and does not require human-designed examples or domain knowledge. We demonstrate the capability of this framework by training an agent and level generator for a 2D dungeon crawler game.
This paper introduces a fully automatic method of mechanic illumination for general video game level generation. Using the Constrained MAP-Elites algorithm and the GVG-AI framework, this system generates the simplest tile-based levels that contain specific sets of game mechanics and also satisfy playability constraints. We apply this method to illuminate the mechanic space for four different games in GVG-AI: Zelda, Solarfox, Plants, and RealPortals.
This paper presents a level generation method for Super Mario by stitching together pre-generated "scenes" that contain specific mechanics, using mechanic sequences from agent playthroughs as input specifications. Given a sequence of mechanics, our system uses an FI-2Pop algorithm and a corpus of scenes to perform automated level authoring. The system outputs levels that have a mechanical sequence similar to the target one but with a different playthrough experience. We compare our system to a greedy method that selects scenes that maximize the target mechanics. Compared to the greedy approach, the stitching process allows our system to maximize the number of matched mechanics while reducing emergent mechanics.
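The FI-2Pop idea, evolving a feasible population on fitness and an infeasible population on constraint violation, with offspring migrating between the two, can be sketched generically. The paper's system searches over scene sequences; the representation below is left abstract and the toy problem is purely illustrative:

```python
import random

def fi2pop(init, mutate, fitness, violation, pop_size=20, gens=50, seed=0):
    """Minimal FI-2Pop sketch: the feasible population maximizes fitness,
    the infeasible population minimizes constraint violation, and every
    offspring is routed to the population matching its feasibility."""
    rng = random.Random(seed)
    feasible, infeasible = [], []
    for _ in range(pop_size):
        x = init(rng)
        (feasible if violation(x) == 0 else infeasible).append(x)
    for _ in range(gens):
        offspring = []
        for pop, key in ((feasible, fitness),
                         (infeasible, lambda x: -violation(x))):
            pop.sort(key=key, reverse=True)
            # Truncation selection: mutate the better half of each population.
            offspring += [mutate(p, rng) for p in pop[: max(1, len(pop) // 2)]]
        for child in offspring:
            (feasible if violation(child) == 0 else infeasible).append(child)
        feasible = sorted(feasible, key=fitness, reverse=True)[:pop_size]
        infeasible = sorted(infeasible, key=violation)[:pop_size]
    return feasible, infeasible

# Toy use: maximize an integer in [0, 100], constrained to even values.
feas, _ = fi2pop(
    init=lambda rng: rng.randint(0, 100),
    mutate=lambda x, rng: max(0, min(100, x + rng.randint(-3, 3))),
    fitness=lambda x: x,
    violation=lambda x: x % 2)
```

The appeal for level authoring is that infeasible individuals (e.g. unplayable stitchings) are not discarded but kept under selection pressure toward feasibility, preserving mechanic material that may later re-enter the feasible population.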
Deep Reinforcement Learning (DRL) has shown impressive performance in domains with visual inputs, in particular various games. However, the agent is usually trained on a fixed environment, e.g. a fixed number of levels. A growing body of evidence suggests that these trained models fail to generalize to even slight variations of the environments they were trained on. This paper advances the hypothesis that the lack of generalization is partly due to the input representation, and explores how rotation, cropping and translation could increase generality. We show that a cropped, translated and rotated observation can yield better generalization on unseen levels of a two-dimensional arcade game. The generality of the agent is evaluated on a set of human-designed levels.
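The kind of observation transformation studied here can be sketched for a 2D tile grid; the function below applies a random translation (via edge padding and offset cropping) and a 90-degree rotation. This is a generic sketch of such transforms, not the paper's exact pipeline or parameters:

```python
import numpy as np

def augment(obs, rng, pad=2):
    """Apply a random translation and 90-degree rotation to a square 2D
    observation grid. Illustrative only: the actual transforms and
    parameters used in the paper may differ."""
    h, w = obs.shape
    # Random translation: pad with edge values, then crop back at an offset.
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)
    padded = np.pad(obs, pad, mode="edge")
    shifted = padded[dy:dy + h, dx:dx + w]
    # Random rotation by a multiple of 90 degrees.
    return np.rot90(shifted, k=int(rng.integers(0, 4)))
```

Because the output keeps the input's shape, such a transform can be dropped in front of an unchanged policy network, which is what makes representation-level interventions like this cheap to test.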
Quality Diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are a new class of population-based stochastic algorithms designed to generate a diverse collection of quality solutions. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the dynamic self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments with standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall quality and broader diversity of strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles the performance of MAP-Elites using standard QD performance metrics. These results suggest that QD algorithms augmented by operators from state-of-the-art optimization algorithms can yield high-performing methods for simultaneously exploring and optimizing continuous search spaces, with significant applications to design, testing, and reinforcement learning among other domains. Code is available for both the continuous optimization benchmark (https://github.com/tehqin/QualDivBenchmark) and Hearthstone (https://github.com/tehqin/EvoStone) domains.
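The key idea of CMA-ME's improvement emitter can be sketched in heavily simplified form: sample candidates from a Gaussian around a mean, rank them by whether they improve the MAP-Elites archive, and move the mean toward the improving candidates. Real CMA-ME adapts a full covariance matrix, uses rank-based weights, and restarts emitters; all of that is omitted here, and the isotropic Gaussian and toy problem are illustrative assumptions:

```python
import random

def improvement_emitter(evaluate, descriptor, dim, bins,
                        iters=200, pop=10, sigma=0.3, seed=0):
    """Simplified sketch of a CMA-ME-style improvement emitter.
    `descriptor` maps a solution to behavior values in [0, 1]."""
    rng = random.Random(seed)
    mean = [0.5] * dim
    archive = {}  # cell -> (fitness, solution)
    for _ in range(iters):
        improving = []
        for _ in range(pop):
            x = [m + rng.gauss(0, sigma) for m in mean]
            cell = tuple(min(int(max(0.0, min(1.0, d)) * b), b - 1)
                         for d, b in zip(descriptor(x), bins))
            f = evaluate(x)
            # A candidate "improves" if it fills a new cell or beats
            # the cell's current elite.
            if cell not in archive or f > archive[cell][0]:
                archive[cell] = (f, x)
                improving.append(x)
        if improving:
            # Move the search distribution toward archive improvements.
            mean = [sum(x[i] for x in improving) / len(improving)
                    for i in range(dim)]
    return archive

# Toy use: a 2D sphere-like objective whose descriptor is the point itself.
archive = improvement_emitter(
    evaluate=lambda x: -sum((v - 0.5) ** 2 for v in x),
    descriptor=lambda x: x,
    dim=2, bins=(10, 10))
```

The contrast with plain MAP-Elites is that the sampling distribution itself adapts toward regions that recently yielded archive improvements, rather than mutating uniformly chosen elites.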
The idea behind procedural content generation (PCG) in games is to create content automatically, using algorithms, instead of relying on user-designed content. While PCG approaches have traditionally focused on creating content for video games, they are now being applied to all kinds of virtual environments, thereby enabling training of machine learning systems that are significantly more general. For example, PCG's ability to generate never-ending streams of new levels has allowed DeepMind's Capture the Flag agent to reach beyond human-level performance. Additionally, PCG-inspired methods such as domain randomization enabled OpenAI's robot arm to learn to manipulate objects with unprecedented dexterity. Level generation in 2D arcade games has also illuminated some shortcomings of standard deep RL methods, suggesting potential ways to train more general policies. This Review looks at key aspects of PCG approaches, including their ability to (1) enable new video games (such as No Man's Sky), (2) create open-ended learning environments, (3) combat overfitting in supervised and reinforcement learning tasks, and (4) create better benchmarks that could ultimately spur the development of better learning algorithms. We hope this article can introduce the broader machine learning community to PCG, which we believe will be a critical tool in creating a more general machine intelligence.