"speech recognition": models, code, and papers

A Comprehensive Overview and Comparative Analysis on Deep Learning Models: CNN, RNN, LSTM, GRU

Jun 01, 2023
Farhad Mortezapour Shiri, Thinagaran Perumal, Norwati Mustapha, Raihani Mohamed

Via arXiv

Joint Speech Recognition and Audio Captioning

Feb 03, 2022
Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe

Via arXiv

Enhancing Speech Recognition Decoding via Layer Aggregation

Apr 05, 2022
Tomer Wullach, Shlomo E. Chazan

Via arXiv

Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition

May 29, 2023
Xiaoliang Wu, Peter Bell, Ajitha Rajan

Via arXiv

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target

May 29, 2023
Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee

Via arXiv

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Nov 01, 2022
Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

Via arXiv

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

May 23, 2023
Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin

Via arXiv

Regularizing Contrastive Predictive Coding for Speech Applications

Apr 12, 2023
Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Via arXiv

Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition

Oct 27, 2022
Steven Vander Eeckt, Hugo Van hamme

Via arXiv

DWFormer: Dynamic Window transFormer for Speech Emotion Recognition

Mar 03, 2023
Shuaiqi Chen, Xiaofen Xing, Weibin Zhang, Weidong Chen, Xiangmin Xu

Via arXiv