Hongling Sheng

Facilitating Video Story Interaction with Multi-Agent Collaborative System

May 02, 2025

Yiwen Zhang, Jianing Hao, Zhan Wang, Hongling Sheng, Wei Zeng

Abstract: Video story interaction enables viewers to engage with and explore narrative content for personalized experiences. However, existing methods are limited to user selection and specially designed narratives, and they lack customization. To address this, we propose an interactive system based on user intent. Our system uses a Vision Language Model (VLM) to enable machines to understand video stories, and combines Retrieval-Augmented Generation (RAG) with a Multi-Agent System (MAS) to create evolving characters and scene experiences. It includes three stages: 1) video story processing, which uses the VLM and prior knowledge to simulate human understanding of stories across three modalities; 2) multi-space chat, which creates growth-oriented characters through MAS interactions based on user queries and story stages; and 3) scene customization, which expands and visualizes the story scenes mentioned in dialogue. Applied to the Harry Potter series, our study shows that the system effectively portrays emergent character social behavior and growth, enhancing the interactive experience in the video story world.

* Prepared and submitted in 2024
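
The abstract gives no implementation details, so the following is only a minimal sketch of one idea it mentions: character agents that answer from retrieved story content limited to the current story stage. Everything here is an assumption made for illustration: the names `Segment`, `retrieve`, `CharacterAgent`, and `multi_agent_chat` are hypothetical, word overlap stands in for a real RAG retriever, the segment texts are placeholders for VLM output, and a string template stands in for a language model reply.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One unit of processed story content (hypothetical structure)."""
    stage: int  # position of the segment on the story timeline
    text: str   # placeholder for a description a VLM might produce


def tokenize(s: str) -> set[str]:
    """Lowercase whitespace tokenization; a stand-in for real embeddings."""
    return set(s.lower().split())


def retrieve(segments: list[Segment], query: str, max_stage: int, k: int = 2) -> list[Segment]:
    """Return up to k segments at or before max_stage, ranked by word overlap with the query."""
    visible = [s for s in segments if s.stage <= max_stage]
    ranked = sorted(
        visible,
        key=lambda s: len(tokenize(s.text) & tokenize(query)),
        reverse=True,
    )
    return ranked[:k]


@dataclass
class CharacterAgent:
    """A character whose knowledge is bounded by its story stage (assumption)."""
    name: str
    stage: int

    def respond(self, segments: list[Segment], query: str) -> str:
        # A real system would pass the retrieved context to a language model;
        # here the reply is just a formatted string.
        memory = retrieve(segments, query, self.stage)
        context = " | ".join(s.text for s in memory)
        return f"{self.name} (stage {self.stage}) recalls: {context}"


def multi_agent_chat(agents: list[CharacterAgent], segments: list[Segment], query: str) -> list[str]:
    """Collect one reply per agent for the same user query."""
    return [agent.respond(segments, query) for agent in agents]


if __name__ == "__main__":
    story = [
        Segment(1, "the hero receives a mysterious letter"),
        Segment(2, "the hero meets two friends at school"),
        Segment(3, "the hero wins the final duel"),
    ]
    cast = [CharacterAgent("Hero", 1), CharacterAgent("Friend", 3)]
    for line in multi_agent_chat(cast, story, "tell me about the duel"):
        print(line)
```

In this sketch, an agent placed at an early stage cannot retrieve later segments, which is one simple way to make a character's knowledge grow as the story advances. Whether the authors' system works this way is not stated in the abstract.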