Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michael Müller

On the Reliability of User-Centric Evaluation of Conversational Recommender Systems

Feb 19, 2026

Michael Müller, Amir Reza Mohammadi, Andreas Peintner, Beatriz Barroso Gstrein, Günther Specht, Eva Zangerle

Abstract:User-centric evaluation has become a key paradigm for assessing Conversational Recommender Systems (CRS), aiming to capture subjective qualities such as satisfaction, trust, and rapport. To enable scalable evaluation, recent work increasingly relies on third-party annotations of static dialogue logs by crowd workers or large language models. However, the reliability of this practice remains largely unexamined. In this paper, we present a large-scale empirical study investigating the reliability and structure of user-centric CRS evaluation on static dialogue transcripts. We collected 1,053 annotations from 124 crowd workers on 200 ReDial dialogues using the 18-dimensional CRS-Que framework. Using random-effects reliability models and correlation analysis, we quantify the stability of individual dimensions and their interdependencies. Our results show that utilitarian and outcome-oriented dimensions such as accuracy, usefulness, and satisfaction achieve moderate reliability under aggregation, whereas socially grounded constructs such as humanness and rapport are substantially less reliable. Furthermore, many dimensions collapse into a single global quality signal, revealing a strong halo effect in third-party judgments. These findings challenge the validity of single-annotator and LLM-based evaluation protocols and motivate the need for multi-rater aggregation and dimension reduction in offline CRS evaluation.

* 5 pages, 2 figures. Submitted to UMAP 2026. Code available at https://github.com/michael-mue/reliable-crs-eval

Via

Access Paper or Ask Questions

DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids

Jan 17, 2025

Iosif Tsangko, Andreas Triantafyllopoulos, Michael Müller, Hendrik Schröter, Björn W. Schuller

Figure 1 for DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids

Figure 2 for DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids

Figure 3 for DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids

Figure 4 for DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids

Abstract:The \textbf{DeepFilterNet} (\textbf{DFN}) architecture was recently proposed as a deep learning model suited for hearing aid devices. Despite its competitive performance on numerous benchmarks, it still follows a `one-size-fits-all' approach, which aims to train a single, monolithic architecture that generalises across different noises and environments. However, its limited size and computation budget can hamper its generalisability. Recent work has shown that in-context adaptation can improve performance by conditioning the denoising process on additional information extracted from background recordings to mitigate this. These recordings can be offloaded outside the hearing aid, thus improving performance while adding minimal computational overhead. We introduce these principles to the \textbf{DFN} model, thus proposing the \textbf{DFingerNet} (\textbf{DFiN}) model, which shows superior performance on various benchmarks inspired by the DNS Challenge.

* Comments: Accepted at ICASSP 2025. 5 pages, 3 figures

Via

Access Paper or Ask Questions