
So Young Lee

Correct-Detect: Balancing Performance and Ambiguity Through the Lens of Coreference Resolution in LLMs

Sep 17, 2025

Amber Shore, Russell Scheinberg, Ameeta Agrawal, So Young Lee

Abstract: Large Language Models (LLMs) are intended to reflect human linguistic competencies. But humans have access to a broad and embodied context, which is key in detecting and resolving linguistic ambiguities, even in isolated text spans. A foundational case of semantic ambiguity is found in the task of coreference resolution: how is a pronoun related to an earlier person mention? This capability is implicit in nearly every downstream task, and the presence of ambiguity at this level can alter performance significantly. We show that LLMs can achieve good performance with minimal prompting in both coreference disambiguation and the detection of ambiguity in coreference; however, they cannot do both at the same time. We present the CORRECT-DETECT trade-off: though models have both capabilities and deploy them implicitly, successfully balancing these two abilities remains elusive.


Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?

Mar 13, 2025

So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal

Abstract: This study explores how recent large language models (LLMs) navigate relative clause attachment ambiguity and use world knowledge biases for disambiguation in six typologically diverse languages: English, Chinese, Japanese, Korean, Russian, and Spanish. We describe the process of creating a novel dataset, MultiWho, for fine-grained evaluation of relative clause attachment preferences in ambiguous and unambiguous contexts. Our experiments with three LLMs indicate that, contrary to humans, LLMs consistently exhibit a preference for local attachment, displaying limited responsiveness to syntactic variations or language-specific attachment patterns. Although LLMs performed well in unambiguous cases, they rigidly prioritized world knowledge biases, lacking the flexibility of human language processing. These findings highlight the need for more diverse, pragmatically nuanced multilingual training to improve LLMs' handling of complex structures and human-like comprehension.

* NAACL 2025


Multilingual Relative Clause Attachment Ambiguity Resolution in Large Language Models

Mar 04, 2025

So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal


Abstract: This study examines how large language models (LLMs) resolve relative clause (RC) attachment ambiguities and compares their performance to human sentence processing. Focusing on two linguistic factors, namely the length of RCs and the syntactic position of complex determiner phrases (DPs), we assess whether LLMs can achieve human-like interpretations amid the complexities of language. We evaluated several LLMs, including Claude, Gemini, and Llama, in multiple languages: English, Spanish, French, German, Japanese, and Korean. While these models performed well in Indo-European languages (English, Spanish, French, and German), they encountered difficulties in Asian languages (Japanese and Korean), often defaulting to incorrect English translations. The findings underscore the variability in LLMs' handling of linguistic ambiguities and highlight the need for model improvements, particularly for non-European languages. This research informs future enhancements in LLM design to improve accuracy and human-like processing in diverse linguistic environments.

* Accepted at PACLIC 2024
