
"speech recognition": models, code, and papers

An Empirical Analysis of Deep Audio-Visual Models for Speech Recognition

Dec 21, 2018
Devesh Walawalkar, Yihui He, Rohit Pillai

Figures 1–4 for An Empirical Analysis of Deep Audio-Visual Models for Speech Recognition

FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

Oct 01, 2021
Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Linquan Liu, Tao Qin, Xiang-Yang Li, Edward Lin, Tie-Yan Liu

Figures 1–4 for FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

Exploring RNN-Transducer for Chinese Speech Recognition

Nov 13, 2018
Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie

Figures 1–4 for Exploring RNN-Transducer for Chinese Speech Recognition

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

Jun 27, 2019
Suyoun Kim, Siddharth Dalmia, Florian Metze

Figures 1–4 for Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis

Nov 04, 2020
Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Figures 1–4 for Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis

SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

Jun 27, 2022
Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao

Figures 1–4 for SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

May 02, 2022
Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi

Figures 1–4 for Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

EdgeSpeechNets: Highly Efficient Deep Neural Networks for Speech Recognition on the Edge

Oct 18, 2018
Zhong Qiu Lin, Audrey G. Chung, Alexander Wong

Figures 1–3 for EdgeSpeechNets: Highly Efficient Deep Neural Networks for Speech Recognition on the Edge

Can we use Common Voice to train a Multi-Speaker TTS system?

Oct 12, 2022
Sewade Ogun, Vincent Colotte, Emmanuel Vincent

Figures 1–4 for Can we use Common Voice to train a Multi-Speaker TTS system?

A context-aware knowledge transferring strategy for CTC-based ASR

Oct 12, 2022
Ke-Han Lu, Kuan-Yu Chen

Figures 1–4 for A context-aware knowledge transferring strategy for CTC-based ASR