
Manuel Sam Ribeiro

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Jul 31, 2023

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

Jul 31, 2023

Multilingual context-based pronunciation learning for Text-to-Speech

Jul 31, 2023

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks

Sep 22, 2022

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation

Jul 29, 2022

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

Feb 16, 2022

Cross-speaker style transfer for text-to-speech using data augmentation

Feb 10, 2022

Automatic audiovisual synchronisation for ultrasound tongue imaging

May 31, 2021

Silent versus modal multi-speaker speech recognition from ultrasound and video

Feb 27, 2021

Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors

Feb 27, 2021