Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Humans vs Vision-Language Models: A Unified Measure of Narrative Coherence

Mar 26, 2026

Nikolai Ilinykh, Hyewon Jang, Shalom Lappin, Asad Sayeed, Sharid Loáiciga

Share this with someone who'll enjoy it:

Abstract:We study narrative coherence in visually grounded stories by comparing human-written narratives with those generated by vision-language models (VLMs) on the Visual Writing Prompts corpus. Using a set of metrics that capture different aspects of narrative coherence, including coreference, discourse relation types, topic continuity, character persistence, and multimodal character grounding, we compute a narrative coherence score. We find that VLMs show broadly similar coherence profiles that differ systematically from those of humans. In addition, differences for individual measures are often subtle, but they become clearer when considered jointly. Overall, our results indicate that, despite human-like surface fluency, model narratives exhibit systematic differences from those of humans in how they organise discourse across a visually grounded story. Our code is available at https://github.com/GU-CLASP/coherence-driven-humans.

* 9 pages of content, 1 page of appendices, 9 tables, 3 figures

View paper on

Share this with someone who'll enjoy it:

Title:Humans vs Vision-Language Models: A Unified Measure of Narrative Coherence

Paper and Code