"speech": models, code, and papers

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Jun 06, 2023
Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Yike Guo, Jie Fu

Via arXiv

PLCMOS -- a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms

May 24, 2023
Lorenz Diener, Marju Purin, Sten Sootla, Ando Saabas, Robert Aichner, Ross Cutler

Via arXiv

Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech

Feb 11, 2023
Fan Huang, Haewoon Kwak, Jisun An

Via arXiv

Skit-S2I: An Indian Accented Speech to Intent dataset

Dec 26, 2022
Shangeth Rajaa, Swaraj Dalmia, Kumarmanas Nethil

Via arXiv

Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

Jan 06, 2023
David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

Via arXiv

HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models

Jun 12, 2023
Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee

Via arXiv

The Ethical Implications of Generative Audio Models: A Systematic Literature Review

Jul 07, 2023
Julia Barnett

Via arXiv

EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels

May 22, 2023
Kari Ali Noriy, Xiaosong Yang, Jian Jun Zhang

Via arXiv

Joint Acoustic Echo Cancellation and Speech Dereverberation Using Kalman filters

Feb 09, 2023
Ziteng Wang, Yueyue Na, Biao Tian, Qiang Fu

Via arXiv

Analysing the Masked predictive coding training criterion for pre-training a Speech Representation Model

Mar 13, 2023
Hemant Yadav, Sunayana Sitaram, Rajiv Ratn Shah

Via arXiv