Jiwoon Lee

DASH: Dual-View Self-Distillation with Multi-Layer Hidden Representations for Robust Speech Recognition

Jun 17, 2026

Jaeeun Baik, Ui-Hyeop Shin, Jiwoon Lee, Woocheol Jeong, Hyung-Min Park

Abstract: Automatic Speech Recognition (ASR) often degrades in real-world noisy environments, making noise robustness essential for deployment. Supervised noise-augmented fine-tuning is a common remedy, but it can introduce a robustness-clean trade-off and overfit to specific corruptions, degrading recognition in clean conditions. We propose DASH, a self-distillation framework that improves robustness by learning clean-noisy consistency from paired views. DASH distills hidden representations from multiple encoder layers to capture features from low-level acoustics to high-level semantics, and stabilizes training by minimizing KL divergence between prototype assignment distributions of clean and noisy views. Experiments on LibriSpeech show that DASH consistently improves recognition under diverse noisy conditions while preserving clean accuracy. It does so with a label-free pre-training stage that adds minimal overhead (about 4% of fine-tuning time) beyond standard fine-tuning.

* Accepted to Interspeech 2026
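
The clean-noisy consistency objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the dot-product similarity, the temperature value, the KL direction (clean as the target), the plain average over layers, and every function name below are assumptions made only for illustration.

```python
import math

def softmax(scores, temperature=1.0):
    # Numerically stable softmax over a list of scores.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_assignment(feature, prototypes, temperature=0.1):
    # Assumption: similarity to each prototype is a plain dot product.
    scores = [sum(f * p for f, p in zip(feature, proto)) for proto in prototypes]
    return softmax(scores, temperature)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def multi_layer_consistency_loss(clean_feats, noisy_feats, prototypes_per_layer,
                                 temperature=0.1):
    # One feature vector per selected encoder layer for each view.
    # Assumption: the clean view is the target and layers are averaged equally.
    losses = []
    for clean, noisy, protos in zip(clean_feats, noisy_feats, prototypes_per_layer):
        p_clean = prototype_assignment(clean, protos, temperature)
        p_noisy = prototype_assignment(noisy, protos, temperature)
        losses.append(kl_divergence(p_clean, p_noisy))
    return sum(losses) / len(losses)

# Hypothetical toy data: two layers, two prototypes per layer.
prototypes = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
clean = [[1.0, 0.0], [0.0, 1.0]]
noisy = [[0.5, 0.5], [0.2, 0.8]]
loss = multi_layer_consistency_loss(clean, noisy, prototypes)
```

In this sketch the loss is zero when the clean and noisy views produce identical prototype assignments at every layer, and grows as the assignments diverge.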

Debiased Distillation by Transplanting the Last Layer

Feb 22, 2023

Jiwoon Lee, Jaeho Lee

Abstract: Deep models are susceptible to learning spurious correlations, even during post-processing. We take a closer look at knowledge distillation, a popular post-processing technique for model compression, and find that distilling with biased training data gives rise to a biased student, even when the teacher is debiased. To address this issue, we propose a simple knowledge distillation algorithm, coined DeTT (Debiasing by Teacher Transplanting). Inspired by a recent observation that the last neural net layer plays an overwhelmingly important role in debiasing, DeTT directly transplants the teacher's last layer to the student. The remaining layers are distilled by matching the feature map outputs of the student and the teacher, where the samples are reweighted to mitigate the dataset bias. Importantly, DeTT does not rely on the availability of extensive annotations on the bias-related attribute, which are typically not available during the post-processing phase. Throughout our experiments, DeTT successfully debiases the student model, consistently outperforming the baselines in terms of worst-group accuracy.
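
The two steps named in the abstract, transplanting the teacher's last layer and distilling the remaining layers with reweighted feature matching, can be sketched roughly as follows. This is a simplified illustration, not the DeTT algorithm itself: the dictionary model layout, the squared-L2 matching loss, the function names, and the per-sample weights (taken here as given inputs) are all assumptions, since the abstract does not specify how the weights are computed.

```python
import copy

def transplant_last_layer(teacher, student):
    # Return a copy of the student whose last layer ("head") is replaced by a
    # deep copy of the teacher's. Assumes both heads take features of the
    # same dimension.
    new_student = dict(student)
    new_student["head"] = copy.deepcopy(teacher["head"])
    return new_student

def weighted_feature_matching_loss(student_feats, teacher_feats, weights):
    # Weighted mean of squared L2 distances between student and teacher
    # feature vectors. The weights are hypothetical per-sample values meant
    # to upweight samples that conflict with the dataset bias.
    total_weight = sum(weights)
    loss = 0.0
    for fs, ft, w in zip(student_feats, teacher_feats, weights):
        loss += w * sum((a - b) ** 2 for a, b in zip(fs, ft))
    return loss / total_weight

# Hypothetical toy models: nested lists stand in for weight matrices.
teacher = {"backbone": [[1.0, 2.0]], "head": [[0.5, -0.5]]}
student = {"backbone": [[0.0, 0.0]], "head": [[0.0, 0.0]]}
student = transplant_last_layer(teacher, student)

student_feats = [[0.0, 0.0], [1.0, 1.0]]
teacher_feats = [[1.0, 0.0], [1.0, 1.0]]
weights = [3.0, 1.0]  # first sample is treated as bias-conflicting
loss = weighted_feature_matching_loss(student_feats, teacher_feats, weights)
```

In a sketch like this the transplanted head would be kept fixed while only the student's backbone is trained against the feature-matching loss, so the classifier inherits the teacher's last layer unchanged.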
