"speech recognition": models, code, and papers

MobiVSR: A Visual Speech Recognition Solution for Mobile Devices

Jun 05, 2019
Nilay Shrivastava, Astitwa Saxena, Yaman Kumar, Rajiv Ratn Shah, Debanjan Mahata, Amanda Stent

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

Jul 29, 2022
Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition

May 25, 2020
Linhao Dong, Cheng Yi, Jianzong Wang, Shiyu Zhou, Shuang Xu, Xueli Jia, Bo Xu

Streaming End-to-end Speech Recognition For Mobile Devices

Nov 15, 2018
Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-yiin Chang, Kanishka Rao, Alexander Gruenstein

CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition

Mar 31, 2022
Chengxin Chen, Pengyuan Zhang

Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations

Jun 02, 2022
Chang Liu, Zhen-Hua Ling, Ling-Hui Chen

FAAG: Fast Adversarial Audio Generation through Interactive Attack Optimisation

Feb 11, 2022
Yuantian Miao, Chao Chen, Lei Pan, Jun Zhang, Yang Xiang

You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

May 14, 2020
Aleksandr Laptev, Roman Korostik, Aleksey Svischev, Andrei Andrusenko, Ivan Medennikov, Sergey Rybin
