Santiago Góngora

World-State Transformations for Neuro-symbolic Interactive Storytelling

May 23, 2026

Santiago Góngora, Luis Chiruzzo, Gonzalo Méndez, Pablo Gervás

Abstract: Large Language Models (LLMs) have changed the possibilities of Interactive Storytelling systems that process free-text user input. However, as more of these systems are built, evidence continues to mount regarding the story coherence problems that arise when relying solely on them. Recent research suggests that LLMs can effectively predict state changes within rule-based Interactive Storytelling systems, triggering pre-programmed world-state transformations. In this paper, we conduct an exploratory evaluation of whether such transformations can serve as a catalyst for player expression while aiming to address the incoherence issues typical of purely LLM-based approaches. Building upon a neuro-symbolic architecture, we conducted experiments using an open-source model (Llama 3 70B) and a closed-source model (Gemini 1.5 Flash), with testing conducted in both English and Spanish. Eight participants played two scenarios, carefully designed to assess different evaluation objectives. Our observations suggest that transformations offer a way to maintain world-state consistency while encouraging players to interact creatively through their written inputs.

* To be presented at the 17th International Conference on Computational Creativity (ICCC'26)

RETUYT-INCO at BEA 2025 Shared Task: How Far Can Lightweight Models Go in AI-powered Tutor Evaluation?

Jun 12, 2025

Santiago Góngora, Ignacio Sastre, Santiago Robaina, Ignacio Remersaro, Luis Chiruzzo, Aiala Rosá

Abstract: In this paper, we present the RETUYT-INCO participation at the BEA 2025 shared task. Our participation was characterized by the decision to use relatively small models, with fewer than 1B parameters. This self-imposed restriction is meant to represent the conditions faced by many research labs and institutions in the Global South, where computational power is not easily accessible due to its prohibitive cost. Even under this restrictive self-imposed setting, our models managed to stay competitive with the other teams that participated in the shared task. According to the $exact\ F_1$ scores published by the organizers, the performance gaps between our models and the winners were as follows: $6.46$ in Track 1; $10.24$ in Track 2; $7.85$ in Track 3; $9.56$ in Track 4; and $13.13$ in Track 5. Considering that the minimum difference with a winning team is $6.46$ points -- and the maximum difference is $13.13$ -- according to the $exact\ F_1$ score, we find that models with fewer than 1B parameters are competitive for these tasks, and that all of these models can be run on computers with a low-budget GPU or even without a GPU.

* This paper will be presented at the 20th BEA Workshop (Innovative Use of NLP for Building Educational Applications) at ACL 2025

A Platform for Generating Educational Activities to Teach English as a Second Language

Apr 28, 2025

Aiala Rosá, Santiago Góngora, Juan Pablo Filevich, Ignacio Sastre, Laura Musto, Brian Carpenter, Luis Chiruzzo

Abstract: We present a platform for the generation of educational activities oriented to teaching English as a foreign language. The different activities -- games and language practice exercises -- are strongly based on Natural Language Processing techniques. The platform offers the possibility of playing out-of-the-box games, generated from resources created semi-automatically and then manually curated. It can also generate games or exercises of greater complexity from texts entered by teachers, with a review and editing stage for the generated content before use. As a way of expanding the variety of activities in the platform, we are currently experimenting with image and text generation. In order to integrate them and improve the performance of other neural tools already integrated, we are working on migrating the platform to a more powerful server. In this paper we describe the development of our platform and its deployment for end users, discuss the challenges faced and how we overcame them, and detail our future work plans.

* Unpublished report written in 2023

PAYADOR: A Minimalist Approach to Grounding Language Models on Structured Data for Interactive Storytelling and Role-playing Games

Apr 09, 2025

Santiago Góngora, Luis Chiruzzo, Gonzalo Méndez, Pablo Gervás

Abstract: Every time an Interactive Storytelling (IS) system gets a player input, it faces the world-update problem. Classical approaches to this problem consist of mapping that input to known preprogrammed actions, which can severely constrain the free will of the player. When the expected experience has a strong focus on improvisation, as in Role-playing Games (RPGs), this problem is critical. In this paper we present PAYADOR, a different approach that focuses on predicting the outcomes of the actions instead of representing the actions themselves. To implement this approach, we ground a Large Language Model to a minimal representation of the fictional world, obtaining promising results. We make this contribution open-source, so it can be adapted and used for other related research on unleashing the co-creativity power of RPGs.

* Proceedings of the Fifteenth International Conference on Computational Creativity (2024) 101-106
* Presented at the 15th International Conference on Computational Creativity (ICCC'24)

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Jun 10, 2024

David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja (+65 more)

Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason over knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered in VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.

Skill Check: Some Considerations on the Evaluation of Gamemastering Models for Role-playing Games

Sep 30, 2023

Santiago Góngora, Luis Chiruzzo, Gonzalo Méndez, Pablo Gervás

Abstract: In role-playing games, a Game Master (GM) is the player in charge of the game, who must design the challenges the players face and narrate the outcomes of their actions. In this work we discuss some challenges in modeling GMs from an Interactive Storytelling and Natural Language Processing perspective. Based on those challenges, we propose three test categories to evaluate such dialogue systems, and we use them to test ChatGPT, Bard and OpenAssistant as out-of-the-box GMs.

* 11 pages. Accepted at GALA 2023 (Games and Learning Alliance 12th International Conference)

Overview of GUA-SPA at IberLEF 2023: Guarani-Spanish Code Switching Analysis

Sep 12, 2023

Luis Chiruzzo, Marvin Agüero-Torales, Gustavo Giménez-Lugo, Aldo Alvarez, Yliana Rodríguez, Santiago Góngora, Thamar Solorio

Abstract: We present the first shared task for detecting and analyzing code-switching in Guarani and Spanish, GUA-SPA at IberLEF 2023. The challenge consisted of three tasks: identifying the language of a token, NER, and a novel task of classifying the way a Spanish span is used in the code-switched context. We annotated a corpus of 1500 texts extracted from news articles and tweets, around 25 thousand tokens, with the information for the tasks. Three teams took part in the evaluation phase, obtaining generally good results for Task 1, and more mixed results for Tasks 2 and 3.

* Procesamiento del Lenguaje Natural, no. 71, September 2023, pp. 321-328
