"speech recognition": models, code, and papers

Position Prediction as an Effective Pretraining Strategy

Jul 15, 2022
Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

Via arXiv

An online sequence-to-sequence model for noisy speech recognition

Jun 16, 2017
Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

Via arXiv

Contrastive Unsupervised Learning for Speech Emotion Recognition

Feb 12, 2021
Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang

Via arXiv

Scaling ASR Improves Zero and Few Shot Learning

Nov 29, 2021
Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

Via arXiv

Unsupervised Cross-Lingual Speech Emotion Recognition Using Pseudo Multilabel

Aug 19, 2021
Jin Li, Nan Yan, Lan Wang

Via arXiv

Complex Frequency Domain Linear Prediction: A Tool to Compute Modulation Spectrum of Speech

Mar 31, 2022
Samik Sadhu, Hynek Hermansky

Via arXiv

Context-based out-of-vocabulary word recovery for ASR systems in Indian languages

Jun 09, 2022
Arun Baby, Saranya Vinnaitherthan, Akhil Kerhalkar, Pranav Jawale, Sharath Adavanne, Nagaraj Adiga

Via arXiv

Memory-Efficient Training of RNN-Transducer with Sampled Softmax

Mar 31, 2022
Jaesong Lee, Lukas Lee, Shinji Watanabe

Via arXiv

Small energy masking for improved neural network training for end-to-end speech recognition

Feb 15, 2020
Chanwoo Kim, Kwangyoun Kim, Sathish Reddy Indurthi

Via arXiv

Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

Jul 03, 2022
Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma

Via arXiv