"speech recognition": models, code, and papers

Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

Oct 29, 2019
Thai-Son Nguyen, Sebastian Stueker, Jan Niehues, Alex Waibel

Via arXiv

Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading

Mar 17, 2017
Chunlin Tian, Weijun Ji

Via arXiv

A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

Apr 20, 2022
Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman

Via arXiv

End-to-end multi-talker audio-visual ASR using an active speaker attention module

Apr 01, 2022
Richard Rose, Olivier Siohan

Via arXiv

Unsupervised Cross-Lingual Speech Emotion Recognition Using Pseudo Multilabel

Aug 19, 2021
Jin Li, Nan Yan, Lan Wang

Via arXiv

Joint Encoder-Decoder Self-Supervised Pre-training for ASR

Jun 09, 2022
Arunkumar A, Umesh S

Via arXiv

Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition

Mar 22, 2019
Yao Qin, Nicholas Carlini, Ian Goodfellow, Garrison Cottrell, Colin Raffel

Via arXiv

CMGAN: Conformer-based Metric GAN for Speech Enhancement

Mar 28, 2022
Ruizhe Cao, Sherif Abdulatif, Bin Yang

Via arXiv

"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations

Sep 28, 2021
Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Behnam Hedayatnia, Dilek Hakkani-Tur

Via arXiv

Feature Normalisation for Robust Speech Recognition

Jul 14, 2015
D. S. Pavan Kumar

Via arXiv