"speech": models, code, and papers

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

May 31, 2023
Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

Evaluation of Speech Representations for MOS prediction

Jun 16, 2023
Frederico S. Oliveira, Edresson Casanova, Arnaldo Cândido Júnior, Lucas R. S. Gris, Anderson S. Soares, Arlindo R. Galvão Filho

Speech Intelligibility Classifiers from 550k Disordered Speech Samples

Mar 15, 2023
Subhashini Venugopalan, Jimmy Tobin, Samuel J. Yang, Katie Seaver, Richard J. N. Cave, Pan-Pan Jiang, Neil Zeghidour, Rus Heywood, Jordan Green, Michael P. Brenner

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

Jun 28, 2023
Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

Employing Hybrid Deep Neural Networks on Dari Speech

May 04, 2023
Jawid Ahmad Baktash, Mursal Dawodi

MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization

May 30, 2023
Victoria Y. H. Chua, Hexin Liu, Leibny Paola Garcia Perera, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles

Robust Open-Set Spoken Language Identification and the CU MultiLang Dataset

Aug 29, 2023
Mustafa Eyceoz, Justin Lee, Siddharth Pittie, Homayoon Beigi

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

Aug 11, 2023
Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao

Toward Connecting Speech Acts and Search Actions in Conversational Search Tasks

May 08, 2023
Souvick Ghosh, Satanu Ghosh, Chirag Shah

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

May 04, 2023
Yun Tang, Anna Y. Sun, Hirofumi Inaguma, Xinyue Chen, Ning Dong, Xutai Ma, Paden D. Tomasello, Juan Pino
