"speech": models, code, and papers

A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition

Apr 22, 2023
Orchid Chetia Phukan, Arun Balaji Buduru, Rajesh Sharma

Distillation Strategies for Discriminative Speech Recognition Rescoring

Jun 15, 2023
Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

A Unified Front-End Framework for English Text-to-Speech Synthesis

May 18, 2023
Zelin Ying, Chen Li, Yu Dong, Qiuqiang Kong, YuanYuan Huo, Yuping Wang, Yuxuan Wang

Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge

Jun 07, 2023
Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li

CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center

May 23, 2023
Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

FonMTL: Towards Multitask Learning for the Fon Language

Sep 11, 2023
Bonaventure F. P. Dossou, Iffanice Houndayi, Pamely Zantou, Gilles Hacheme

Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili

Jun 01, 2023
Christiaan Jacobs, Nathanaël Carraz Rakotonirina, Everlyn Asiko Chimoto, Bruce A. Bassett, Herman Kamper

NeuroHeed: Neuro-Steered Speaker Extraction using EEG Signals

Jul 26, 2023
Zexu Pan, Marvin Borsdorf, Siqi Cai, Tanja Schultz, Haizhou Li

Fusion-S2iGan: An Efficient and Effective Single-Stage Framework for Speech-to-Image Generation

May 17, 2023
Zhenxing Zhang, Lambert Schomaker

Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

Aug 16, 2023
Franziska Braun, Sebastian P. Bayerl, Paula A. Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer
