Cor Steging

Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design

May 08, 2025

Elena Musi, Nadin Kokciyan, Khalid Al-Khatib, Davide Ceolin, Emmanuelle Dietz, Klara Gutekunst, Annette Hautli-Janisz, Cristian Manuel Santibañez Yañez, Jodi Schneider, Jonas Scholz(+3 more)

Abstract: In this position paper, we advocate for the development of conversational technology that is inherently designed to support and facilitate argumentative processes. We argue that, at present, large language models (LLMs) are inadequate for this purpose, and we propose an ideal technology design aimed at enhancing argumentative skills. This involves re-framing LLMs as tools for exercising our critical thinking rather than replacing it. We introduce the concept of 'reasonable parrots' that embody the fundamental principles of relevance, responsibility, and freedom, and that interact through argumentative dialogical moves. These principles and moves arise out of millennia of work in argumentation theory and should serve as the starting point for LLM-based technology that incorporates basic principles of argumentation.

Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models

May 02, 2025

Cor Steging, Silja Renooij, Bart Verheij

Abstract: Generative large language models used as tools in the legal domain have the potential to improve the justice system. However, the reasoning behavior of current generative models is brittle and poorly understood, so they cannot yet be applied responsibly in the domains of law and evidence. In this paper, we introduce an approach for creating benchmarks that can be used to evaluate the reasoning capabilities of generative language models. These benchmarks are dynamically varied, scalable in their complexity, and have formally unambiguous interpretations. In this study, we illustrate the approach on the basis of witness testimony, focusing on the underlying argument attack structure. We dynamically generate both linear and non-linear argument attack graphs of varying complexity and translate these into reasoning puzzles about witness testimony expressed in natural language. We show that state-of-the-art large language models often fail at these reasoning puzzles, even at low complexity. The models make obvious mistakes, and their inconsistent performance indicates that their reasoning capabilities are brittle. Furthermore, at higher complexity, even state-of-the-art models presented specifically for their reasoning capabilities make mistakes. We show the viability of using a parameterized benchmark with varying complexity to evaluate the reasoning capabilities of generative language models. As such, the findings contribute to a better understanding of the limitations of the reasoning capabilities of generative models, which is essential when designing responsible AI systems in the legal domain.

* This manuscript has been accepted for presentation as a short paper at the 20th International Conference of AI & Law in Chicago, June 16 to 20, 2025
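
To make the idea of a parameterized attack-graph benchmark concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes grounded semantics as the formally unambiguous interpretation, and the names `linear_attack_chain`, `grounded_extension`, and `to_puzzle`, as well as the sentence template, are hypothetical illustrations.

```python
from typing import Iterable, List, Set, Tuple

Attack = Tuple[str, str]  # (attacker, target)


def linear_attack_chain(n: int) -> Tuple[List[str], Set[Attack]]:
    """Build witnesses w1..wn where w(i+1) attacks w(i); n is the complexity parameter."""
    args = [f"w{i}" for i in range(1, n + 1)]
    attacks = {(args[i + 1], args[i]) for i in range(n - 1)}
    return args, attacks


def grounded_extension(args: Iterable[str], attacks: Set[Attack]) -> Set[str]:
    """Compute the grounded extension (assumed semantics) by iterative labelling."""
    args = list(args)
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    accepted: Set[str] = set()
    rejected: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for a in args:
            if a in accepted or a in rejected:
                continue
            if attackers[a] <= rejected:
                # Every attacker is already defeated, so the argument stands.
                accepted.add(a)
                changed = True
            elif attackers[a] & accepted:
                # At least one accepted argument attacks it, so it falls.
                rejected.add(a)
                changed = True
    return accepted


def to_puzzle(args: List[str], attacks: Set[Attack]) -> str:
    """Translate the graph into a natural-language puzzle (hypothetical template)."""
    lines = [f"Witness {args[0]} says the suspect was at the scene."]
    for attacker, target in sorted(attacks):
        lines.append(f"Witness {attacker} says that witness {target} is lying.")
    lines.append(f"Should we believe witness {args[0]}?")
    return "\n".join(lines)


args, attacks = linear_attack_chain(4)
print(to_puzzle(args, attacks))
print("Believable witnesses:", sorted(grounded_extension(args, attacks)))
```

Increasing `n` lengthens the chain, and the correct answer for `w1` alternates with the parity of `n`, which gives a reference answer to compare against a model's response. A non-linear graph, such as two witnesses who accuse each other, yields an empty grounded extension under this assumed semantics.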

Discovering the Rationale of Decisions: Experiments on Aligning Learning and Reasoning

May 14, 2021

Cor Steging, Silja Renooij, Bart Verheij

Abstract: In AI and law, systems that are designed for decision support should be explainable when pursuing justice. In order for these systems to be fair and responsible, they should make correct decisions and make them using a sound and transparent rationale. In this paper, we introduce a knowledge-driven method for model-agnostic rationale evaluation using dedicated test cases, similar to unit testing in professional software development. We apply this new method in a set of machine learning experiments aimed at extracting known knowledge structures from artificial datasets from fictional and non-fictional legal settings. We show that our method allows us to analyze the rationale of black-box machine learning systems by assessing which rationale elements are learned or not. Furthermore, we show that the rationale can be adjusted using tailor-made training data based on the results of the rationale evaluation.

* 21 pages
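
The unit-testing analogy in the abstract can be illustrated with a small Python sketch. It is not the paper's method or data. The eligibility rule, the thresholds, the stand-in `black_box` model, and the function `evaluate_rationale` are all hypothetical. The assumed idea is that each dedicated test set varies one condition of the known rule while holding the others fixed, so a low score points to the rationale element the model failed to learn.

```python
from typing import Callable, Dict, List

Case = Dict[str, float]


def ground_truth(case: Case) -> bool:
    """Hypothetical known rule: eligible if an adult with income below a threshold."""
    return case["age"] >= 18 and case["income"] < 30000


def black_box(case: Case) -> bool:
    """Stand-in for a trained model that learned the age boundary incorrectly."""
    return case["age"] >= 21 and case["income"] < 30000


def evaluate_rationale(
    model: Callable[[Case], bool],
    tests: Dict[str, List[Case]],
) -> Dict[str, float]:
    """Return, per rationale element, the fraction of test cases the model gets right."""
    return {
        name: sum(model(c) == ground_truth(c) for c in cases) / len(cases)
        for name, cases in tests.items()
    }


# Each test set probes one condition and keeps the other condition satisfied.
tests = {
    "age_condition": [{"age": a, "income": 10000} for a in range(15, 25)],
    "income_condition": [{"age": 40, "income": i} for i in range(25000, 35000, 1000)],
}

report = evaluate_rationale(black_box, tests)
for element, accuracy in report.items():
    print(f"{element}: {accuracy:.2f}")
```

Because `evaluate_rationale` only calls the model as a function, the same test sets work for any classifier, which is one way to read "model-agnostic". Cases near a mislearned boundary could then be added to the training data, in the spirit of the tailor-made training data the abstract mentions.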
