Abstract: Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical-speech datasets to scale up training. A teacher model first generates pseudo-labels for unlabeled samples; weakly supervised pretraining then applies a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions; finally, the pretrained model is fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach: our Whisper-based baseline significantly outperforms state-of-the-art DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across the unseen test datasets.




Abstract: This paper investigates how large language models (LLMs) behave when information in a prompt conflicts with their parametric knowledge. Building on prior question-answering (QA) research, we extend the study of knowledge conflicts to code generation. We propose a domain-agnostic framework for constructing and interpreting such conflicts, along with a novel evaluation method and a dataset tailored to code-conflict scenarios. Our experiments indicate that sufficiently large LLMs encode the notion of a knowledge conflict in their parameters, enabling us to detect knowledge conflicts with up to \textbf{80.65\%} accuracy. Building on these insights, we show that activation-level steering can improve steering success by up to \textbf{12.6\%} over a random baseline, although effectiveness depends critically on balancing model size, task domain, and steering direction. The code and data for our experiments will be made publicly available upon acceptance.