Aimilios Chalamandaris

MambaRate: Speech Quality Assessment Across Different Sampling Rates

Jul 16, 2025
Via arXiv

Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification

Apr 02, 2024
Via arXiv

Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations

Feb 02, 2024
Via arXiv

Controllable speech synthesis by learning discrete phoneme-level prosodic representations

Nov 29, 2022
Via arXiv

Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis

Nov 02, 2022
Via arXiv

Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features

Nov 01, 2022
Via arXiv

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

Nov 01, 2022
Via arXiv

Generating Gender-Ambiguous Text-to-Speech Voices

Nov 01, 2022
Via arXiv

Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation

Oct 31, 2022
Via arXiv

Fine-grained Noise Control for Multispeaker Speech Synthesis

Apr 11, 2022
Via arXiv