"speech": models, code, and papers

Efficient neural speech synthesis for low-resource languages through multilingual modeling

Aug 20, 2020
Marcel de Korte, Jaebok Kim, Esther Klabbers

A Practical Guide to Logical Access Voice Presentation Attack Detection

Jan 10, 2022
Xin Wang, Junichi Yamagishi

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

Oct 08, 2020
Hieu-Thi Luong, Junichi Yamagishi

Key-Sparse Transformer with Cascaded Cross-Attention Block for Multimodal Speech Emotion Recognition

Jun 22, 2021
Weidong Chen, Xiaofeng Xing, Xiangmin Xu, Jichen Yang, Jianxin Pang

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Jun 20, 2020
Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli

Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

Jul 25, 2020
Amrith Setlur, Barnabas Poczos, Alan W Black

Two-stage Textual Knowledge Distillation to Speech Encoder for Spoken Language Understanding

Oct 25, 2020
Seongbin Kim, Gyuwan Kim, Seongjin Shin, Sangmin Lee

Deep F-measure Maximization for End-to-End Speech Understanding

Aug 08, 2020
Leda Sarı, Mark Hasegawa-Johnson

CRAB: Class Representation Attentive BERT for Hate Speech Identification in Social Media

Oct 25, 2020
Sayyed M. Zahiri, Ali Ahmadvand

Audio Self-supervised Learning: A Survey

Mar 02, 2022
Shuo Liu, Adria Mallol-Ragolta, Emilia Parada-Cabeleiro, Kun Qian, Xin Jing, Alexander Kathan, Bin Hu, Bjoern W. Schuller
