Simeon Junker

SceneGram: Conceptualizing and Describing Tangrams in Scene Context

Jun 13, 2025

Simeon Junker, Sina Zarrieß

Abstract: Research on reference and naming suggests that humans can come up with very different ways of conceptualizing and referring to the same object, e.g. the same abstract tangram shape can be a "crab", "sink" or "space ship". Another common assumption in cognitive science is that scene context fundamentally shapes our visual perception of objects and conceptual expectations. This paper contributes SceneGram, a dataset of human references to tangram shapes placed in different scene contexts, allowing for systematic analyses of the effect of scene context on conceptualization. Based on this data, we analyze references to tangram shapes generated by multimodal LLMs, showing that these models do not account for the richness and variability of conceptualizations found in human references.

* To appear in ACL Findings 2025

Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?

Jun 13, 2025

Simeon Junker, Manar Ali, Larissa Koch, Sina Zarrieß, Hendrik Buschmeier

Abstract: We investigate the linguistic abilities of multimodal large language models in reference resolution tasks featuring simple yet abstract visual stimuli, such as color patches and color grids. Although the task may not seem challenging for today's language models, being straightforward for human dyads, we consider it to be a highly relevant probe of the pragmatic capabilities of MLLMs. Our results and analyses indeed suggest that basic pragmatic capabilities, such as context-dependent interpretation of color descriptions, still constitute major challenges for state-of-the-art MLLMs.

* To appear in ACL Findings 2025

The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems

Jun 27, 2024

Judith Sieker, Simeon Junker, Ronja Utescher, Nazia Attari, Heiko Wersing, Hendrik Buschmeier, Sina Zarrieß

Abstract: We examine how users perceive the limitations of an AI system when it encounters a task that it cannot perform perfectly and whether providing explanations alongside its answers aids users in constructing an appropriate mental model of the system's capabilities and limitations. We employ a visual question answering and explanation task where we control the AI system's limitations by manipulating the visual inputs: during inference, the system processes either full-color or grayscale images. Our goal is to determine whether participants can perceive the limitations of the system. We hypothesize that explanations will make limited AI capabilities more transparent to users. However, our results show that explanations do not have this effect. Instead of allowing users to more accurately assess the limitations of the AI system, explanations generally increase users' perceptions of the system's competence, regardless of its actual performance.

* 16 pages (including Appendix); under review

Resilience through Scene Context in Visual Referring Expression Generation

Apr 18, 2024

Simeon Junker, Sina Zarrieß

Abstract: Scene context is well known to facilitate humans' perception of visible objects. In this paper, we investigate the role of context in Referring Expression Generation (REG) for objects in images, where existing research has often focused on distractor contexts that exert pressure on the generator. We take a new perspective on scene context in REG and hypothesize that contextual information can be conceived of as a resource that makes REG models more resilient and facilitates the generation of object descriptions, and object types in particular. We train and test Transformer-based REG models with target representations that have been artificially obscured with noise to varying degrees. We evaluate how properties of the models' visual context affect their processing and performance. Our results show that even simple scene contexts make models surprisingly resilient to perturbations, to the extent that they can identify referent types even when visual information about the target is completely missing.
