Rick Nouwen

Can LLMs Detect Ambiguous Plural Reference? An Analysis of Split-Antecedent and Mereological Reference

Oct 06, 2025

Dang Anh, Rick Nouwen, Massimo Poesio

Abstract: Our goal is to study how LLMs represent and interpret plural reference in ambiguous and unambiguous contexts. We ask the following research questions: (1) Do LLMs exhibit human-like preferences in representing plural reference? (2) Are LLMs able to detect ambiguity in plural anaphoric expressions and identify possible referents? To address these questions, we design a set of experiments, examining pronoun production using next-token prediction tasks, pronoun interpretation, and ambiguity detection using different prompting strategies. We then assess how comparable LLMs are to humans in formulating and interpreting plural reference. We find that LLMs are sometimes aware of possible referents of ambiguous pronouns. However, they do not always follow human reference when choosing between interpretations, especially when the possible interpretation is not explicitly mentioned. In addition, they struggle to identify ambiguity without direct instruction. Our findings also reveal inconsistencies in the results across different types of experiments.

VAQUUM: Are Vague Quantifiers Grounded in Visual Data?

Feb 18, 2025

Hugh Mee Wong, Rick Nouwen, Albert Gatt

Abstract: Vague quantifiers such as "a few" and "many" are influenced by many contextual factors, including how many objects are present in a given context. In this work, we evaluate the extent to which vision-and-language models (VLMs) are compatible with humans when producing or judging the appropriateness of vague quantifiers in visual contexts. We release a novel dataset, VAQUUM, containing 20300 human ratings on quantified statements across a total of 1089 images. Using this dataset, we compare human judgments and VLM predictions using three different evaluation methods. Our findings show that VLMs, like humans, are influenced by object counts in vague quantifier use. However, we find significant inconsistencies across models in different evaluation settings, suggesting that judging and producing vague quantifiers rely on two different processes.

* Under review, 12 pages for main paper (5 figures), 15 pages including appendix (2 figures)
