"speech": models, code, and papers

HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

Sep 15, 2023
Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu

Considerations for Ethical Speech Recognition Datasets

May 03, 2023
Orestis Papakyriakopoulos, Alice Xiang

Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody

Jun 16, 2023
Sofoklis Kakouros, Juraj Šimko, Martti Vainio, Antti Suni

Better speech synthesis through scaling

May 12, 2023
James Betker

MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting

May 19, 2023
Neil Shah, Vishal Tambrahalli, Saiteja Kosgi, Niranjan Pedanekar, Vineet Gandhi

HTEC: Human Transcription Error Correction

Sep 18, 2023
Hanbo Sun, Jian Gao, Xiaomin Wu, Anjie Fang, Cheng Cao, Zheng Du

OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment

Jun 10, 2023
Xize Cheng, Tao Jin, Linjun Li, Wang Lin, Xinyu Duan, Zhou Zhao

A Review of Deep Learning Techniques for Speech Processing

May 02, 2023
Ambuj Mehrish, Navonil Majumder, Rishabh Bhardwaj, Rada Mihalcea, Soujanya Poria

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

Jun 15, 2023
Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen

Recovering implicit pitch contours from formants in whispered speech

Jul 06, 2023
Pablo Pérez Zarazaga, Zofia Malisz
