"speech": models, code, and papers

Personalization for BERT-based Discriminative Speech Recognition Rescoring

Jul 13, 2023
Jari Kolehmainen, Yile Gu, Aditya Gourav, Prashanth Gurunath Shivakumar, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

Via arXiv

Textually Pretrained Speech Language Models

May 22, 2023
Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi

Via arXiv

Simulating room transfer functions between transducers mounted on audio devices using a modified image source method

Sep 07, 2023
Zeyu Xu, Adrian Herzog, Alexander Lodermeyer, Emanuël A. P. Habets, Albert G. Prinn

Via arXiv

An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing

Jun 24, 2023
Lester Phillip Violeta, Tomoki Toda

Via arXiv

Sparks of Large Audio Models: A Survey and Outlook

Sep 03, 2023
Siddique Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, Heriberto Cuayáhuitl, Wenwu Wang, Xulong Zhang, Roberto Togneri, Björn W. Schuller

Via arXiv

Towards training Bilingual and Code-Switched Speech Recognition models from Monolingual data sources

Jun 14, 2023
Kunal Dhawan, Dima Rekesh, Boris Ginsburg

Via arXiv

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Jun 05, 2023
Dengfeng Ke, Yayue Deng, Yukang Jia, Jinlong Xue, Qi Luo, Ya Li, Jianqing Sun, Jiaen Liang, Binghuai Lin

Via arXiv

Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition

May 19, 2023
Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

Via arXiv

Neural approaches to spoken content embedding

Aug 28, 2023
Shane Settle

Via arXiv

VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages

May 21, 2023
Shivam Mhaskar, Vineet Bhat, Akshay Batheja, Sourabh Deoghare, Paramveer Choudhary, Pushpak Bhattacharyya

Via arXiv