Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Victor De Marez

Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness

Jun 04, 2026

Victor De Marez, Luna De Bruyne, Walter Daelemans

Abstract:Factual sycophancy occurs when a language model abandons a correct, verifiable answer under social pressure. Because a flip occurs only when pressure toward a false answer exceeds the model's neutral preference for the truth, flip rates conflate two mechanisms: the strength of that baseline preference (truth margin), and how far pressure shifts it (manipulation sensitivity). We decompose factual sycophancy into these channels and use them to separate the effects of size and instruction tuning across 56 open-weight models spanning 0.3B-32B parameters and 13 manipulation types. We find that vulnerability is governed mainly by size, but instruction tuning changes how size acts: small instruction-tuned models can become less robust, whereas large instruction-tuned models usually become more robust. Instruction tuning primarily increases truth margin, but its behavioral effect depends on manipulation type. Scaling also changes the two channels differently: base models gain margin but become mildly more manipulation-sensitive, whereas instruction-tuned models gain margin faster and become less sensitive. Factual sycophancy is therefore not a single scalar property. Evaluations should report channel-specific, manipulation-specific, and size-conditioned robustness rather than flip rates alone.

Via

Access Paper or Ask Questions

An Agentic AI Framework for Training General Practitioner Student Skills

Dec 20, 2025

Victor De Marez, Jens Van Nooten, Luna De Bruyne, Walter Daelemans

Figure 1 for An Agentic AI Framework for Training General Practitioner Student Skills

Figure 2 for An Agentic AI Framework for Training General Practitioner Student Skills

Figure 3 for An Agentic AI Framework for Training General Practitioner Student Skills

Figure 4 for An Agentic AI Framework for Training General Practitioner Student Skills

Abstract:Advancements in large language models offer strong potential for enhancing virtual simulated patients (VSPs) in medical education by providing scalable alternatives to resource-intensive traditional methods. However, current VSPs often struggle with medical accuracy, consistent roleplaying, scenario generation for VSP use, and educationally structured feedback. We introduce an agentic framework for training general practitioner student skills that unifies (i) configurable, evidence-based vignette generation, (ii) controlled persona-driven patient dialogue with optional retrieval grounding, and (iii) standards-based assessment and feedback for both communication and clinical reasoning. We instantiate the framework in an interactive spoken consultation setting and evaluate it with medical students ($\mathbf{N{=}14}$). Participants reported realistic and vignette-faithful dialogue, appropriate difficulty calibration, a stable personality signal, and highly useful example-rich feedback, alongside excellent overall usability. These results support agentic separation of scenario control, interaction control, and standards-based assessment as a practical pattern for building dependable and pedagogically valuable VSP training tools.

Via

Access Paper or Ask Questions

WinoWhat: A Parallel Corpus of Paraphrased WinoGrande Sentences with Common Sense Categorization

Mar 31, 2025

Ine Gevers, Victor De Marez, Luna De Bruyne, Walter Daelemans

Abstract:In this study, we take a closer look at how Winograd schema challenges can be used to evaluate common sense reasoning in LLMs. Specifically, we evaluate generative models of different sizes on the popular WinoGrande benchmark. We release WinoWhat, a new corpus, in which each instance of the WinoGrande validation set is paraphrased. Additionally, we evaluate the performance on the challenge across five common sense knowledge categories, giving more fine-grained insights on what types of knowledge are more challenging for LLMs. Surprisingly, all models perform significantly worse on WinoWhat, implying that LLM reasoning capabilities are overestimated on WinoGrande. To verify whether this is an effect of benchmark memorization, we match benchmark instances to LLM trainingdata and create two test-suites. We observe that memorization has a minimal effect on model performance on WinoGrande.

Via

Access Paper or Ask Questions

THInC: A Theory-Driven Framework for Computational Humor Detection

Sep 02, 2024

Victor De Marez, Thomas Winters, Ayla Rigouts Terryn

Figure 1 for THInC: A Theory-Driven Framework for Computational Humor Detection

Figure 2 for THInC: A Theory-Driven Framework for Computational Humor Detection

Figure 3 for THInC: A Theory-Driven Framework for Computational Humor Detection

Figure 4 for THInC: A Theory-Driven Framework for Computational Humor Detection

Abstract:Humor is a fundamental aspect of human communication and cognition, as it plays a crucial role in social engagement. Although theories about humor have evolved over centuries, there is still no agreement on a single, comprehensive humor theory. Likewise, computationally recognizing humor remains a significant challenge despite recent advances in large language models. Moreover, most computational approaches to detecting humor are not based on existing humor theories. This paper contributes to bridging this long-standing gap between humor theory research and computational humor detection by creating an interpretable framework for humor classification, grounded in multiple humor theories, called THInC (Theory-driven Humor Interpretation and Classification). THInC ensembles interpretable GA2M classifiers, each representing a different humor theory. We engineered a transparent flow to actively create proxy features that quantitatively reflect different aspects of theories. An implementation of this framework achieves an F1 score of 0.85. The associative interpretability of the framework enables analysis of proxy efficacy, alignment of joke features with theories, and identification of globally contributing features. This paper marks a pioneering effort in creating a humor detection framework that is informed by diverse humor theories and offers a foundation for future advancements in theory-driven humor classification. It also serves as a first step in automatically comparing humor theories in a quantitative manner.

* Accepted at CREAI 2024 (International Workshop on Artificial Intelligence and Creativity)

Via

Access Paper or Ask Questions