Griffin Dietz Smith

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

May 29, 2025

Griffin Dietz Smith, Dianna Yee, Jennifer King Chen, Leah Findlater

Abstract: Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that incorporating reading text through prompting benefits verbatim transcription performance over fine-tuning, and second, showing that it is feasible to augment speech recognition tasks for end-to-end miscue detection. We conducted two case studies -- children's read-aloud and adult atypical speech -- and found that our proposed strategies improve verbatim transcription and miscue detection compared to current state-of-the-art.
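
To make the post-hoc baseline mentioned in the abstract concrete, here is a minimal sketch of comparing an ASR transcript against the target reading text. It is not the paper's end-to-end method. The function name `posthoc_miscues`, the miscue labels, and the lowercase/whitespace tokenization are assumptions for illustration only; word alignment uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from alignment operations to miscue labels
# (labels are an assumption for illustration, not taken from the paper).
LABELS = {"replace": "substitution", "delete": "omission", "insert": "insertion"}


def posthoc_miscues(target_text, asr_transcript):
    """Align the ASR transcript to the target text and list the differences."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    target = target_text.lower().split()
    spoken = asr_transcript.lower().split()

    matcher = SequenceMatcher(a=target, b=spoken, autojunk=False)
    miscues = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # words read as written
        # Record the label, the expected words, and the transcribed words.
        miscues.append((LABELS[tag], target[i1:i2], spoken[j1:j2]))
    return miscues


if __name__ == "__main__":
    for miscue in posthoc_miscues("the cat sat on the mat", "the cat sit on mat"):
        print(miscue)
```

Because every difference between the transcript and the target text is reported as a miscue, any word the ASR system transcribes inaccurately is indistinguishable from a genuine reading error, which is the weakness of post-hoc comparison that the abstract points out.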

* Interspeech 2025
