Sam Earle

Video Game Level Design as a Multi-Agent Reinforcement Learning Problem

Oct 06, 2025

Sam Earle, Zehua Jiang, Eugene Vinitsky, Julian Togelius

Abstract:Procedural Content Generation via Reinforcement Learning (PCGRL) offers a method for training controllable level designer agents without the need for human datasets, using metrics that serve as proxies for level quality as rewards. Existing PCGRL research focuses on single generator agents, but is bottlenecked by the need to frequently recalculate heuristics of level quality and the agent's need to navigate around potentially large maps. By framing level generation as a multi-agent problem, we mitigate the efficiency bottleneck of single-agent PCGRL by reducing the number of reward calculations relative to the number of agent actions. We also find that multi-agent level generators are better able to generalize to out-of-distribution map shapes, which we argue is due to the generators' learning more local, modular design policies. We conclude that treating content generation as a distributed, multi-agent task is beneficial for generating functional artifacts at scale.

* 11 pages, 7 tables, 5 figures, published as full technical paper at the AAAI conference on Artificial Intelligence and Interactive Digital Entertainment 2025

ScriptDoctor: Automatic Generation of PuzzleScript Games via Large Language Models and Tree Search

Jun 06, 2025

Sam Earle, Ahmed Khalifa, Muhammad Umair Nasir, Zehua Jiang, Graham Todd, Andrzej Banburski-Fahey, Julian Togelius

Abstract:There is much interest in using large pre-trained models in Automatic Game Design (AGD), whether via the generation of code, assets, or more abstract conceptualization of design ideas. But so far this interest largely stems from the ad hoc use of such generative models under persistent human supervision. Much work remains to show how these tools can be integrated into longer-time-horizon AGD pipelines, in which systems interface with game engines to test generated content autonomously. To this end, we introduce ScriptDoctor, a Large Language Model (LLM)-driven system for automatically generating and testing games in PuzzleScript, an expressive but highly constrained description language for turn-based puzzle games over 2D gridworlds. ScriptDoctor generates and tests game design ideas in an iterative loop, where human-authored examples are used to ground the system's output, compilation errors from the PuzzleScript engine are used to elicit functional code, and search-based agents play-test generated games. ScriptDoctor serves as a concrete example of the potential of automated, open-ended LLM-based workflows in generating novel game content.

* 5 pages, 3 figures, 3 tables, submitted to IEEE Conference on Games as a Short Paper

Enhancing Player Enjoyment with a Two-Tier DRL and LLM-Based Agent System for Fighting Games

Apr 10, 2025

Shouren Wang, Zehua Jiang, Fernando Sliva, Sam Earle, Julian Togelius

Abstract:Deep reinforcement learning (DRL) has effectively enhanced gameplay experiences and game design across various game genres. However, few studies on fighting game agents have focused explicitly on enhancing player enjoyment, a critical factor for both developers and players. To address this gap and establish a practical baseline for designing enjoyability-focused agents, we propose a two-tier agent (TTA) system and conduct experiments in the classic fighting game Street Fighter II. The first tier of TTA employs a task-oriented network architecture, modularized reward functions, and hybrid training to produce diverse and skilled DRL agents. In the second tier of TTA, a Large Language Model Hyper-Agent, leveraging players' playing data and feedback, dynamically selects suitable DRL opponents. In addition, we investigate and model several key factors that affect the enjoyability of the opponent. The experiments demonstrate improvements from 64.36% to 156.36% in the execution of advanced skills over baseline methods. The trained agents also exhibit distinct game-playing styles. Additionally, we conducted a small-scale user study, and the overall enjoyment in the player's feedback validates the effectiveness of our TTA system.

* 15 pages, 8 figures. Submitted to a peer-reviewed conference, under review

Amorphous Fortress Online: Collaboratively Designing Open-Ended Multi-Agent AI and Game Environments

Feb 08, 2025

M Charity, Mayu Wilson, Steven Lee, Dipika Rajesh, Sam Earle, Julian Togelius

Abstract:This work introduces Amorphous Fortress Online -- a web-based platform where users can design petri-dish-like environments and games consisting of multi-agent AI characters. Users can play, create, and share artificial life and game environments made up of microscopic but transparent finite-state machine agents that interact with each other. The website features multiple interactive editors and accessible settings to view the multi-agent interactions directly from the browser. This system serves to provide a database of thematically diverse AI and game environments that use the emergent behaviors of simple AI agents.

DreamGarden: A Designer Assistant for Growing Games from a Single Prompt

Oct 02, 2024

Sam Earle, Samyak Parajuli, Andrzej Banburski-Fahey

Abstract:Coding assistants are increasingly leveraged in game design, both generating code and making high-level plans. To what degree can these tools align with developer workflows, and what new modes of human-computer interaction can emerge from their use? We present DreamGarden, an AI system capable of assisting with the development of diverse game environments in Unreal Engine. At the core of our method is an LLM-driven planner, capable of breaking down a single, high-level prompt -- a dream, memory, or imagined scenario provided by a human user -- into a hierarchical action plan, which is then distributed across specialized submodules facilitating concrete implementation. This system is presented to the user as a garden of plans and actions, both growing independently and responding to user intervention via seed prompts, pruning, and feedback. Through a user study, we explore design implications of this system, charting courses for future work in semi-autonomous assistants and open-ended simulation design.

* 21 pages + appendix, 11 figures

PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators

Aug 22, 2024

Sam Earle, Zehua Jiang, Julian Togelius

Abstract:Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained based only on a set of computable metrics acting as a proxy for the level's quality and key characteristics. While PCGRL offers a unique set of affordances for game designers, it is constrained by the compute-intensive process of training RL agents, and has so far been limited to generating relatively small levels. To address this issue of scale, we implement several PCGRL environments in Jax so that all aspects of learning and simulation happen in parallel on the GPU, resulting in faster environment simulation; removing the CPU-GPU transfer of information bottleneck during RL training; and ultimately resulting in significantly improved training speed. We replicate several key results from prior works in this new framework, letting models train for much longer than previously studied, and evaluating their behavior after 1 billion timesteps. Aiming for greater control for human designers, we introduce randomized level sizes and frozen "pinpoints" of pivotal game tiles as further ways of countering overfitting. To test the generalization ability of learned generators, we evaluate models on large, out-of-distribution map sizes, and find that partial observation sizes learn more robust design strategies.

* 8 pages, 7 figures, 6 tables. Published at IEEE Conference on Games, 2024

Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game

Jul 15, 2024

Tim Merino, Sam Earle, Ryan Sudhakaran, Shyam Sudhakaran, Julian Togelius

Abstract:The Connections puzzle is a word association game published daily by The New York Times (NYT). In this game, players are asked to find groups of four words that are connected by a common theme. While solving a given Connections puzzle requires both semantic knowledge and abstract reasoning, generating novel puzzles additionally requires a form of metacognition: generators must be able to accurately model the downstream reasoning of potential solvers. In this paper, we investigate the ability of the GPT family of Large Language Models (LLMs) to generate challenging and creative word games for human players. We start with an analysis of the word game Connections and the unique challenges it poses as a Procedural Content Generation (PCG) domain. We then propose a method for generating Connections puzzles using LLMs by adapting a Tree of Thoughts (ToT) prompting approach. We evaluate this method by conducting a user study, asking human players to compare AI-generated puzzles against published Connections puzzles. Our findings show that LLMs are capable puzzle creators, and can generate diverse sets of enjoyable, challenging, and creative Connections puzzles as judged by human users.

Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents

Jul 05, 2024

Sam Earle, Julian Togelius

Abstract:We introduce Autoverse, an evolvable, domain-specific language for single-player 2D grid-based games, and demonstrate its use as a scalable training ground for Open-Ended Learning (OEL) algorithms. Autoverse uses cellular-automaton-like rewrite rules to describe game mechanics, allowing it to express various game environments (e.g. mazes, dungeons, sokoban puzzles) that are popular testbeds for Reinforcement Learning (RL) agents. Each rewrite rule can be expressed as a series of simple convolutions, allowing for environments to be parallelized on the GPU, thereby drastically accelerating RL training. Using Autoverse, we propose jump-starting open-ended learning by imitation learning from search. In such an approach, we first evolve Autoverse environments (their rules and initial map topology) to maximize the number of iterations required by greedy tree search to discover a new best solution, producing a curriculum of increasingly complex environments and playtraces. We then distill these expert playtraces into a neural-network-based policy using imitation learning. Finally, we use the learned policy as a starting point for open-ended RL, where new training environments are continually evolved to maximize the RL player agent's value function error (a proxy for its regret, or the learnability of generated environments), finding that this approach improves the performance and generality of resultant player agents.

* 9 pages, 4 figures

DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft

Apr 23, 2024

Sam Earle, Filippos Kokkinos, Yuhe Nie, Julian Togelius, Roberta Raileanu

Abstract:Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they don't provide high-level control over the generated content and typically require domain expertise. In contrast, text-to-3D methods allow users to specify desired characteristics in natural language, offering a high amount of flexibility and expressivity. But unlike PCG, such approaches cannot guarantee functionality, which is crucial for certain applications like game design. In this paper, we present a method for generating functional 3D artifacts from free-form text prompts in the open-world game Minecraft. Our method, DreamCraft, trains quantized Neural Radiance Fields (NeRFs) to represent artifacts that, when viewed in-game, match given text descriptions. We find that DreamCraft produces more aligned in-game artifacts than a baseline that post-processes the output of an unconstrained NeRF. Thanks to the quantized representation of the environment, functional constraints can be integrated using specialized loss terms. We show how this can be leveraged to generate 3D structures that match a target distribution or obey certain adjacency rules over the block types. DreamCraft inherits a high degree of expressivity and controllability from the NeRF, while still being able to incorporate functional constraints through domain-specific objectives.

* 16 pages, 9 figures, accepted to Foundations of Digital Games 2024

Missed Connections: Lateral Thinking Puzzles for Large Language Models

Apr 17, 2024

Graham Todd, Tim Merino, Sam Earle, Julian Togelius

Abstract:The Connections puzzle published each day by the New York Times tasks players with dividing a bank of sixteen words into four groups of four words that each relate to a common theme. Solving the puzzle requires both common linguistic knowledge (i.e. definitions and typical usage) as well as, in many cases, lateral or abstract thinking. This is because the four categories ascend in complexity, with the most challenging category often requiring thinking about words in uncommon ways or as parts of larger phrases. We investigate the capacity for automated AI systems to play Connections and explore the game's potential as an automated benchmark for abstract reasoning and a way to measure the semantic information encoded by data-driven linguistic systems. In particular, we study both a sentence-embedding baseline and modern large language models (LLMs). We report their accuracy on the task, measure the impacts of chain-of-thought prompting, and discuss their failure modes. Overall, we find that the Connections task is challenging yet feasible, and a strong test-bed for future work.

* 8 pages, 3 figures
