"speech": models, code, and papers

Continuous descriptor-based control for deep audio synthesis

Feb 27, 2023
Ninon Devis, Nils Demerlé, Sarah Nabi, David Genova, Philippe Esling

NatiQ: An End-to-end Text-to-Speech System for Arabic

Jun 15, 2022
Ahmed Abdelali, Nadir Durrani, Cenk Demiroglu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish

Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints

Mar 03, 2023
Paul Magron, Tuomas Virtanen

Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation

Mar 30, 2022
Kuan Po Huang, Yu-Kuan Fu, Yu Zhang, Hung-yi Lee

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation

Jul 29, 2022
Giulia Comini, Goeric Huybrechts, Manuel Sam Ribeiro, Adam Gabrys, Jaime Lorenzo-Trueba

PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis

Jun 24, 2022
Kubilay Can Demir, Matthias May, Axel Schmid, Michael Uder, Katharina Breininger, Tobias Weise, Andreas Maier, Seung Hee Yang

Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition

Oct 26, 2022
Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang

Wavebender GAN: An architecture for phonetically meaningful speech manipulation

Feb 22, 2022
Gustavo Teodoro Döhler Beck, Ulme Wennberg, Zofia Malisz, Gustav Eje Henter

On incorporating social speaker characteristics in synthetic speech

Apr 03, 2022
Sai Sirisha Rallabandi, Sebastian Möller

iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre

Jun 29, 2022
Guangyan Zhang, Ying Qin, Wenjie Zhang, Jialun Wu, Mei Li, Yutao Gai, Feijun Jiang, Tan Lee
