"speech": models, code, and papers

Connecting Humanities and Social Sciences: Applying Language and Speech Technology to Online Panel Surveys

Feb 21, 2023
Henk van den Heuvel, Martijn Bentum, Simone Wills, Judith C. Koops

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Feb 27, 2023
Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari

Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision

Feb 07, 2023
Eugene Kharitonov, Damien Vincent, Zalán Borsos, Raphaël Marinier, Sertan Girgin, Olivier Pietquin, Matt Sharifi, Marco Tagliasacchi, Neil Zeghidour

I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Mar 14, 2023
Yifan Peng, Jaesong Lee, Shinji Watanabe

Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies

Jun 22, 2023
Yuya Yamamoto

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

Jan 10, 2023
Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Jianhua Tao

Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems

Jun 13, 2023
Michele Panariello, Wanying Ge, Hemlata Tak, Massimiliano Todisco, Nicholas Evans

Detecting and Characterizing Political Incivility on Social Media

May 24, 2023
Sagi Penzel, Nir Lotan, Alon Zoizner, Einat Minkov

A Comprehensive Review of Data-Driven Co-Speech Gesture Generation

Jan 13, 2023
Simbarashe Nyatsanga, Taras Kucherenko, Chaitanya Ahuja, Gustav Eje Henter, Michael Neff

Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data

Dec 04, 2022
Yuhao Zhang, Chen Xu, Bojie Hu, Chunliang Zhang, Tong Xiao, Jingbo Zhu
