Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Katrin Erk

Scene Abstraction for Lexical Semantics: Structured Representations of Situated Meaning

May 21, 2026

Yejin Cho, Katrin Erk

Abstract:Coffee and tea share many properties, yet they evoke strikingly different situations, atmospheres, and affective associations. These situated dimensions of word meaning are real and systematic, but they remain implicit in most computational representations of lexical meaning. We propose Scene Abstraction, a framework for constructing structured representations of the interpretive scenes that words participate in across usage contexts. Each scene consists of a Contextual Scene (Events, Entities, Setting) and an expression-centered Expression Profile (Engaged events, Generalizable properties, Evoked emotions), operationalized through few-shot prompting of a large language model. Our contributions are three-fold: (1) a structured representation framework for situated lexical meaning; (2) COCA-Scenes, a dataset of 520 usage instances across 26 keywords for distinct scene identification; and (3) empirical evidence from two experiments suggesting that scenes are reliably identifiable across human observers (82.4% accuracy, +11.8 pp over text-only embeddings) and that our scene profiles more closely align with human interpretation of words in context than ATOMIC-based alternatives (86.4% preference across three semantic dimensions).

Via

Access Paper or Ask Questions

Uncertainty-Aware Web-Conditioned Scientific Fact-Checking

Apr 13, 2026

Ashwin Vinod, Katrin Erk

Abstract:Scientific fact-checking is vital for assessing claims in specialized domains such as biomedicine and materials science, yet existing systems often hallucinate or apply inconsistent reasoning, especially when verifying technical, compositional claims against an evidence snippet under source and cost/latency constraints. We present a pipeline centered on atomic predicate-argument decomposition and calibrated, uncertainty-gated corroboration: atomic facts are aligned to local snippets via embeddings, verified by a compact evidence-grounded checker, and only facts with uncertain support trigger domain-restricted web search over authoritative sources. The system supports both binary and tri-valued classification where it predicts labels from Supported, Refuted, NEI for three-way tasks. We evaluate under two regimes, Context-Only (no web) and Context+Web (uncertainty-gated web corroboration); when retrieved evidence conflicts with the provided context, we abstain with NEI rather than overriding the context. On multiple benchmarks, our framework surpasses the strongest benchmarks. In our experiments, web corroboration was invoked for only a minority of atomic facts on average, indicating that external evidence is consulted selectively under calibrated uncertainty rather than routinely. Overall, coupling atomic granularity with calibrated, uncertainty-gated corroboration yields more interpretable and context-conditioned verification, making the approach well-suited to high-stakes, single-document settings that demand traceable rationales, predictable cost/latency, and conservative.

Via

Access Paper or Ask Questions

RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models

Apr 15, 2025

Juan Diego Rodriguez, Wenxuan Ding, Katrin Erk, Greg Durrett

Figure 1 for RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models

Figure 2 for RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models

Figure 3 for RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models

Figure 4 for RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models

Abstract:Although large language models (LLMs) have become generally more capable and accurate across many tasks, some fundamental sources of unreliability remain in their behavior. One key limitation is their inconsistency at reporting the the same information when prompts are changed. In this paper, we consider the discrepancy between a model's generated answer and their own verification of that answer, the generator-validator gap. We define this gap in a more stringent way than prior work: we expect correlation of scores from a generator and a validator over the entire set of candidate answers. We show that according to this measure, a large gap exists in various settings, including question answering, lexical semantics tasks, and next-word prediction. We then propose RankAlign, a ranking-based training method, and show that it significantly closes the gap by 31.8% on average, surpassing all baseline methods. Moreover, this approach generalizes well to out-of-domain tasks and lexical items.

Via

Access Paper or Ask Questions

The Emergence of Grammar through Reinforcement Learning

Mar 03, 2025

Stephen Wechsler, James W. Shearer, Katrin Erk

Abstract:The evolution of grammatical systems of syntactic and semantic composition is modeled here with a novel application of reinforcement learning theory. To test the functionalist thesis that speakers' expressive purposes shape their language, we include within the model a probability distribution over different messages that could be expressed in a given context. The proposed learning and production algorithm then breaks down language learning into a sequence of simple steps, such that each step benefits from the message probabilities. The results are presented in the form of numerical simulations of language histories and analytic proofs. The potential for applying these mathematical models to the study of natural language is illustrated with two case studies from the history of English.

* 49 pages, 8 figures

Via

Access Paper or Ask Questions

SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events

Aug 11, 2024

Sai Vallurupalli, Katrin Erk, Francis Ferraro

Figure 1 for SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events

Figure 2 for SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events

Figure 3 for SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events

Figure 4 for SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events

Abstract:Interpreting and assessing goal driven actions is vital to understanding and reasoning over complex events. It is important to be able to acquire the knowledge needed for this understanding, though doing so is challenging. We argue that such knowledge can be elicited through a participant achievement lens. We analyze a complex event in a narrative according to the intended achievements of the participants in that narrative, the likely future actions of the participants, and the likelihood of goal success. We collect 6.3K high quality goal and action annotations reflecting our proposed participant achievement lens, with an average weighted Fleiss-Kappa IAA of 80%. Our collection contains annotated alternate versions of each narrative. These alternate versions vary minimally from the "original" story, but can license drastically different inferences. Our findings suggest that while modern large language models can reflect some of the goal-based knowledge we study, they find it challenging to fully capture the design and intent behind concerted actions, even when the model pretraining included the data from which we extracted the goal knowledge. We show that smaller models fine-tuned on our dataset can achieve performance surpassing larger models.

* Accepted to Findings of the Association for Computational Linguistics 2024

Via

Access Paper or Ask Questions

Adjusting Interpretable Dimensions in Embedding Space with Human Judgments

Apr 03, 2024

Katrin Erk, Marianna Apidianaki

Abstract:Embedding spaces contain interpretable dimensions indicating gender, formality in style, or even object properties. This has been observed multiple times. Such interpretable dimensions are becoming valuable tools in different areas of study, from social science to neuroscience. The standard way to compute these dimensions uses contrasting seed words and computes difference vectors over them. This is simple but does not always work well. We combine seed-based vectors with guidance from human ratings of where words fall along a specific dimension, and evaluate on predicting both object properties like size and danger, and the stylistic properties of formality and complexity. We obtain interpretable dimensions with markedly better performance especially in cases where seed-based dimensions do not work well.

* NAACL 2024

Via

Access Paper or Ask Questions

X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs

Sep 16, 2023

Juan Diego Rodriguez, Katrin Erk, Greg Durrett

Figure 1 for X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs

Figure 2 for X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs

Figure 3 for X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs

Figure 4 for X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs

Abstract:Understanding when two pieces of text convey the same information is a goal touching many subproblems in NLP, including textual entailment and fact-checking. This problem becomes more complex when those two pieces of text are in different languages. Here, we introduce X-PARADE (Cross-lingual Paragraph-level Analysis of Divergences and Entailments), the first cross-lingual dataset of paragraph-level information divergences. Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language, indicating whether a given piece of information is the same, new, or new but can be inferred. This last notion establishes a link with cross-language NLI. Aligned paragraphs are sourced from Wikipedia pages in different languages, reflecting real information divergences observed in the wild. Armed with our dataset, we investigate a diverse set of approaches for this problem, including classic token alignment from machine translation, textual entailment methods that localize their decisions, and prompting of large language models. Our results show that these methods vary in their capability to handle inferable information, but they all fall short of human performance.

Via

Access Paper or Ask Questions

A Method for Studying Semantic Construal in Grammatical Constructions with Interpretable Contextual Embedding Spaces

May 29, 2023

Gabriella Chronis, Kyle Mahowald, Katrin Erk

Figure 1 for A Method for Studying Semantic Construal in Grammatical Constructions with Interpretable Contextual Embedding Spaces

Figure 2 for A Method for Studying Semantic Construal in Grammatical Constructions with Interpretable Contextual Embedding Spaces

Figure 3 for A Method for Studying Semantic Construal in Grammatical Constructions with Interpretable Contextual Embedding Spaces

Figure 4 for A Method for Studying Semantic Construal in Grammatical Constructions with Interpretable Contextual Embedding Spaces

Abstract:We study semantic construal in grammatical constructions using large language models. First, we project contextual word embeddings into three interpretable semantic spaces, each defined by a different set of psycholinguistic feature norms. We validate these interpretable spaces and then use them to automatically derive semantic characterizations of lexical items in two grammatical constructions: nouns in subject or object position within the same sentence, and the AANN construction (e.g., `a beautiful three days'). We show that a word in subject position is interpreted as more agentive than the very same word in object position, and that the nouns in the AANN construction are interpreted as more measurement-like than when in the canonical alternation. Our method can probe the distributional meaning of syntactic constructions at a templatic level, abstracted away from specific lexemes.

Via

Access Paper or Ask Questions

POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events

Dec 05, 2022

Sai Vallurupalli, Sayontan Ghosh, Katrin Erk, Niranjan Balasubramanian, Francis Ferraro

Figure 1 for POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events

Figure 2 for POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events

Figure 3 for POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events

Figure 4 for POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events

Abstract:Knowledge about outcomes is critical for complex event understanding but is hard to acquire. We show that by pre-identifying a participant in a complex event, crowd workers are able to (1) infer the collective impact of salient events that make up the situation, (2) annotate the volitional engagement of participants in causing the situation, and (3) ground the outcome of the situation in state changes of the participants. By creating a multi-step interface and a careful quality control strategy, we collect a high quality annotated dataset of 8K short newswire narratives and ROCStories with high inter-annotator agreement (0.74-0.96 weighted Fleiss Kappa). Our dataset, POQue (Participant Outcome Questions), enables the exploration and development of models that address multiple aspects of semantic understanding. Experimentally, we show that current language models lag behind human performance in subtle ways through our task formulations that target abstract and specific comprehension of a complex event, its outcome, and a participant's influence over the event culmination.

* Accepted to EMNLP 2022 main conference as a long paper

Via

Access Paper or Ask Questions

longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks

Jun 29, 2022

Venelin Kovatchev, Trina Chatterjee, Venkata S Govindarajan, Jifan Chen, Eunsol Choi, Gabriella Chronis, Anubrata Das, Katrin Erk, Matthew Lease, Junyi Jessy Li(+2 more)

Figure 1 for longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks

Figure 2 for longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks

Figure 3 for longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks

Figure 4 for longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks

Abstract:Developing methods to adversarially challenge NLP systems is a promising avenue for improving both model performance and interpretability. Here, we describe the approach of the team "longhorns" on Task 1 of the The First Workshop on Dynamic Adversarial Data Collection (DADC), which asked teams to manually fool a model on an Extractive Question Answering task. Our team finished first, with a model error rate of 62%. We advocate for a systematic, linguistically informed approach to formulating adversarial questions, and we describe the results of our pilot experiments, as well as our official submission.

* Accepted at DADC2022

Via

Access Paper or Ask Questions