Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Thomas Bohné

ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation

Jul 22, 2025

Roman Mayr, Michel Schimpf, Thomas Bohné

Abstract:While modern dialogue systems heavily rely on large language models (LLMs), their implementation often goes beyond pure LLM interaction. Developers integrate multiple LLMs, external tools, and databases. Therefore, assessment of the underlying LLM alone does not suffice, and the dialogue systems must be tested and evaluated as a whole. However, this remains a major challenge. With most previous work focusing on turn-level analysis, less attention has been paid to integrated dialogue-level quality assurance. To address this, we present ChatChecker, a framework for automated evaluation and testing of complex dialogue systems. ChatChecker uses LLMs to simulate diverse user interactions, identify dialogue breakdowns, and evaluate quality. Compared to previous approaches, our design reduces setup effort and is generalizable, as it does not require reference dialogues and is decoupled from the implementation of the target dialogue system. We improve breakdown detection performance over a prior LLM-based approach by including an error taxonomy in the prompt. Additionally, we propose a novel non-cooperative user simulator based on challenging personas that uncovers weaknesses in target dialogue systems more effectively. Through this, ChatChecker contributes to thorough and scalable testing. This enables both researchers and practitioners to accelerate the development of robust dialogue systems.

Via

Access Paper or Ask Questions

Stakeholder Participation for Responsible AI Development: Disconnects Between Guidance and Current Practice

Jun 11, 2025

Emma Kallina, Thomas Bohné, Jat Singh

Abstract:Responsible AI (rAI) guidance increasingly promotes stakeholder involvement (SHI) during AI development. At the same time, SHI is already common in commercial software development, but with potentially different foci. This study clarifies the extent to which established SHI practices are able to contribute to rAI efforts as well as potential disconnects -- essential insights to inform and tailor future interventions that further shift industry practice towards rAI efforts. First, we analysed 56 rAI guidance documents to identify why SHI is recommended (i.e. its expected benefits for rAI) and uncovered goals such as redistributing power, improving socio-technical understandings, anticipating risks, and enhancing public oversight. To understand why and how SHI is currently practised in commercial settings, we then conducted an online survey (n=130) and semi-structured interviews (n=10) with AI practitioners. Our findings reveal that SHI in practice is primarily driven by commercial priorities (e.g. customer value, compliance) and several factors currently discourage more rAI-aligned SHI practices. This suggests that established SHI practices are largely not contributing to rAI efforts. To address this disconnect, we propose interventions and research opportunities to advance rAI development in practice.

* Published at the 2025 ACM Conference on Fairness, Accountability, and Transparency FAccT'25

Via

Access Paper or Ask Questions

Multi-Objective Causal Bayesian Optimization

Feb 20, 2025

Shriya Bhatija, Paul-David Zuercher, Jakob Thumm, Thomas Bohné

Abstract:In decision-making problems, the outcome of an intervention often depends on the causal relationships between system components and is highly costly to evaluate. In such settings, causal Bayesian optimization (CBO) can exploit the causal relationships between the system variables and sequentially perform interventions to approach the optimum with minimal data. Extending CBO to the multi-outcome setting, we propose Multi-Objective Causal Bayesian Optimization (MO-CBO), a paradigm for identifying Pareto-optimal interventions within a known multi-target causal graph. We first derive a graphical characterization for potentially optimal sets of variables to intervene upon. Showing that any MO-CBO problem can be decomposed into several traditional multi-objective optimization tasks, we then introduce an algorithm that sequentially balances exploration across these tasks using relative hypervolume improvement. The proposed method will be validated on both synthetic and real-world causal graphs, demonstrating its superiority over traditional (non-causal) multi-objective Bayesian optimization in settings where causal information is available.

* 17 Pages, 12 Figures

Via

Access Paper or Ask Questions

MiraGe: Editable 2D Images using Gaussian Splatting

Oct 02, 2024

Joanna Waczyńska, Tomasz Szczepanik, Piotr Borycki, Sławomir Tadeja, Thomas Bohné, Przemysław Spurek

Abstract:Implicit Neural Representations (INRs) approximate discrete data through continuous functions and are commonly used for encoding 2D images. Traditional image-based INRs employ neural networks to map pixel coordinates to RGB values, capturing shapes, colors, and textures within the network's weights. Recently, GaussianImage has been proposed as an alternative, using Gaussian functions instead of neural networks to achieve comparable quality and compression. Such a solution obtains a quality and compression ratio similar to classical INR models but does not allow image modification. In contrast, our work introduces a novel method, MiraGe, which uses mirror reflections to perceive 2D images in 3D space and employs flat-controlled Gaussians for precise 2D image editing. Our approach improves the rendering quality and allows realistic image modifications, including human-inspired perception of photos in the 3D world. Thanks to modeling images in 3D space, we obtain the illusion of 3D-based modification in 2D images. We also show that our Gaussian representation can be easily combined with a physics engine to produce physics-based modification of 2D images. Consequently, MiraGe allows for better quality than the standard approach and natural modification of 2D images.

Via

Access Paper or Ask Questions