Abstract: Communication access real-time translation (CART) is an essential accessibility service for d/Deaf and hard of hearing (DHH) individuals, but the cost and scarcity of trained personnel limit its availability. While Automatic Speech Recognition (ASR) offers a cheap and scalable alternative, transcription errors can lead to serious accessibility issues. Real-time correction of ASR output by non-professionals presents an under-explored CART workflow that addresses these limitations. We conducted a user study with 75 participants to evaluate the feasibility and efficiency of this workflow. In a complementary study, we held focus groups with 25 DHH individuals to identify acceptable accuracy levels and the factors affecting the accessibility of real-time captioning. Results suggest that collaborative editing can improve transcription accuracy to a level that DHH users rate positively in terms of understandability. The focus groups also showed that human effort to improve captioning is highly valued, supporting a semi-automated approach as an alternative to stand-alone ASR and traditional CART services.
Abstract: Despite advances in Automatic Speech Recognition (ASR), transcription errors persist and require manual correction. Confidence scores, which indicate the certainty of ASR results, could assist users in identifying and correcting errors. This study evaluates the reliability of confidence scores for error detection through a comprehensive analysis of end-to-end ASR models and a user study with 36 participants. The results show that while confidence scores correlate with transcription accuracy, their error detection performance is limited. Classifiers frequently miss errors or generate many false positives, undermining their practical utility. Confidence-based error detection neither improved correction efficiency nor was perceived as helpful by participants. These findings highlight the limitations of confidence scores and the need for more sophisticated approaches to improve user interaction and explainability of ASR results.
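The error-detection setting summarised above can be illustrated with a small sketch: flag every word whose ASR confidence falls below a threshold and score the resulting detector against ground-truth error labels. This is not the study's implementation; the `Word` structure, the 0.8 threshold, and the example words, scores, and labels are assumptions made for demonstration only.

```python
# Minimal sketch (not the study's implementation): flagging potential ASR errors
# by thresholding word-level confidence scores and measuring detection quality.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    confidence: float   # certainty reported by the ASR engine, 0.0 - 1.0
    is_error: bool      # ground truth from aligning hypothesis with reference

def flag_errors(words: list[Word], threshold: float = 0.8) -> list[bool]:
    """Mark every word whose confidence falls below the threshold as a suspected error."""
    return [w.confidence < threshold for w in words]

def detection_scores(words: list[Word], flagged: list[bool]) -> tuple[float, float]:
    """Precision and recall of the confidence-based error detector."""
    tp = sum(1 for w, f in zip(words, flagged) if f and w.is_error)
    fp = sum(1 for w, f in zip(words, flagged) if f and not w.is_error)
    fn = sum(1 for w, f in zip(words, flagged) if not f and w.is_error)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative data: correct words can carry low confidence and errors can carry
# high confidence, which limits the usefulness of a simple threshold.
words = [
    Word("the", 0.98, False),
    Word("lecture", 0.91, False),
    Word("covers", 0.62, True),    # error despite moderate confidence
    Word("vector", 0.55, False),   # correct despite low confidence
    Word("spaces", 0.97, True),    # error with high confidence
]
print(detection_scores(words, flag_errors(words)))
```

Even in this toy example, the detector produces both misses and false positives, mirroring the limited detection performance reported above.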
Abstract: For d/Deaf and hard of hearing (DHH) people, captioning is an essential accessibility tool. Significant developments in artificial intelligence (AI) mean that Automatic Speech Recognition (ASR) is now part of many popular applications. This makes creating captions easy and broadly available, but transcripts must be highly accurate to be accessible. Scientific publications and industry report very low error rates, claiming that AI has reached human parity or even outperforms manual transcription. At the same time, the DHH community reports serious issues with the accuracy and reliability of ASR. There seems to be a mismatch between technical innovations and the real-life experience of people who depend on transcription. Independent and comprehensive data is needed to capture the state of ASR. We measured the performance of eleven common ASR services with recordings of Higher Education lectures. We evaluated the influence of technical conditions such as streaming, the use of vocabularies, and differences between languages. Our results show that accuracy varies widely across vendors and across individual audio samples. We also measured significantly lower quality for streaming ASR, which is used for live events. Our study shows that, despite recent improvements, common ASR services still lack reliable accuracy.
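A minimal sketch of the kind of vendor comparison summarised above: normalise a reference transcript and several hypothesis transcripts, then compute the Word Error Rate with the open-source `jiwer` package. The service names, transcripts, and normalisation rules below are illustrative assumptions, not the study's evaluation pipeline.

```python
# Sketch of comparing ASR services by WER on a shared reference (illustrative data).
import string
import jiwer

def normalise(text: str) -> str:
    """Common pre-processing before WER: lower-case, strip punctuation, collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

reference = "Today we discuss eigenvalues and eigenvectors of symmetric matrices."
hypotheses = {
    "vendor_a_batch":  "Today we discuss eigen values and eigenvectors of symmetric matrices.",
    "vendor_a_stream": "today we discussed eigen values and I gain vectors of symmetric mattresses",
    "vendor_b_batch":  "Today we discuss eigenvalues and eigenvectors of symmetric matrices.",
}

ref = normalise(reference)
for service, hypothesis in hypotheses.items():
    print(f"{service}: WER = {jiwer.wer(ref, normalise(hypothesis)):.2%}")
```

Running the same normalisation and metric over many recordings per vendor is what makes accuracy differences between batch and streaming conditions, and between services, directly comparable.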
Abstract: The Word Error Rate (WER) is the common measure of accuracy for Automatic Speech Recognition (ASR). Transcripts are usually pre-processed by substituting specific characters to account for non-semantic differences. As a result of this normalisation, information on the accuracy of punctuation or capitalisation is lost. We present a non-destructive, token-based approach using an extended Levenshtein distance algorithm to compute a robust WER and additional orthographic metrics. Transcription errors are also classified at a finer granularity using existing string-similarity and phonetic algorithms. An evaluation on several datasets demonstrates that our approach is practically equivalent to common WER computations. We also provide an exemplary analysis of derived use cases, such as a punctuation error rate, and a web application for interactive use and visualisation of our implementation. The code is available as open source.
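The token-based idea can be sketched as follows: align reference and hypothesis tokens with a Levenshtein dynamic program that compares only a normalised "semantic core" of each token, then classify aligned tokens whose original spelling still differs as capitalisation or punctuation errors. This is an illustrative reconstruction under simplified assumptions, not the paper's implementation; `strip_orthography` and the two error categories stand in for the richer classification described above.

```python
# Sketch of a token-based WER that keeps orthographic information (illustrative only).
import string

def strip_orthography(token: str) -> str:
    """Semantic core of a token: lower-cased, surrounding punctuation removed."""
    return token.lower().strip(string.punctuation)

def extended_wer(reference: str, hypothesis: str) -> dict[str, float]:
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # Standard Levenshtein DP over tokens, comparing only the semantic core.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if strip_orthography(ref[i - 1]) == strip_orthography(hyp[j - 1]) else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    # Backtrace to find aligned tokens whose original spelling still differs.
    i, j, cap_errors, punct_errors = n, m, 0, 0
    while i > 0 and j > 0:
        same_core = strip_orthography(ref[i - 1]) == strip_orthography(hyp[j - 1])
        if d[i][j] == d[i - 1][j - 1] + (0 if same_core else 1):
            r, h = ref[i - 1], hyp[j - 1]
            if same_core and r != h:
                if r.lower() == h.lower():
                    cap_errors += 1     # only capitalisation differs
                else:
                    punct_errors += 1   # punctuation (and possibly case) differs
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return {
        "wer": d[n][m] / n if n else 0.0,
        "capitalisation_error_rate": cap_errors / n if n else 0.0,
        "punctuation_error_rate": punct_errors / n if n else 0.0,
    }

print(extended_wer("Hello, world. This is a Test.",
                   "hello world this is a test"))
```

In this example the semantic WER is zero, yet the derived capitalisation and punctuation error rates are not, which is exactly the information a destructive normalisation would discard.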