"speech recognition": models, code, and papers

calamanCy: A Tagalog Natural Language Processing Toolkit

Nov 13, 2023
Lester James V. Miranda

Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test

May 22, 2023
Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee

HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model

Oct 06, 2023
Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe

Human Transcription Quality Improvement

Sep 24, 2023
Jian Gao, Hanbo Sun, Cheng Cao, Zheng Du

Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation

Sep 18, 2023
Danilo de Oliveira, Timo Gerkmann

Memory-augmented conformer for improved end-to-end long-form ASR

Sep 22, 2023
Carlos Carvalho, Alberto Abad

Enhancing Unsupervised Speech Recognition with Diffusion GANs

Mar 23, 2023
Xianchao Wu

MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition

Jun 18, 2023
Yuchen Hu, Chen Chen, Ruizhe Li, Heqing Zou, Eng Siong Chng

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Mar 03, 2023
Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu

AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR

Sep 30, 2023
Tobi Olatunji, Tejumade Afonja, Aditya Yadavalli, Chris Chinenye Emezue, Sahib Singh, Bonaventure F. P. Dossou, Joanne Osuchukwu, Salomey Osei, Atnafu Lambebo Tonja, Naome Etori, Clinton Mbataku
