"speech": models, code, and papers

Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition

May 14, 2021
Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas

Via arXiv

Non-autoregressive Transformer with Unified Bidirectional Decoder for Automatic Speech Recognition

Sep 14, 2021
Chuan-Fei Zhang, Yan Liu, Tian-Hao Zhang, Song-Lu Chen, Feng Chen, Xu-Cheng Yin

Via arXiv

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

Nov 03, 2020
Kun Zhou, Berrak Sisman, Haizhou Li

Via arXiv

Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models

Feb 26, 2022
Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang J. Kuo

Via arXiv

Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition

Sep 10, 2021
Rong Gong, Carl Quillen, Dushyant Sharma, Andrew Goderre, José Laínez, Ljubomir Milanović

Via arXiv

Finnish Parliament ASR corpus - Analysis, benchmarks and statistics

Mar 28, 2022
Anja Virkkunen, Aku Rouhe, Nhan Phan, Mikko Kurimo

Via arXiv

Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries

May 20, 2021
Sukhdeep S. Sodhi, Ellie Ka-In Chio, Ambarish Jash, Santiago Ontañón, Ajit Apte, Ankit Kumar, Ayooluwakunmi Jeje, Dima Kuzmin, Harry Fung, Heng-Tze Cheng, Jon Effrat, Tarush Bali, Nitin Jindal, Pei Cao, Sarvjeet Singh, Senqiang Zhou, Tameen Khan, Amol Wankhede, Moustafa Alzantot, Allen Wu, Tushar Chandra

Via arXiv

You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

May 14, 2020
Aleksandr Laptev, Roman Korostik, Aleksey Svischev, Andrei Andrusenko, Ivan Medennikov, Sergey Rybin

Via arXiv

Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition

Oct 23, 2020
Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman

Via arXiv

Dialogue Enhancement and Listening Effort in Broadcast Audio: A Multimodal Evaluation

Aug 03, 2022
Matteo Torcoli, Thomas Robotham, Emanuël A. P. Habets

Via arXiv