Kenichi Fujita

Lightweight Zero-shot Text-to-Speech with Mixture of Adapters

Jul 01, 2024

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis

Feb 11, 2024

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters

Jan 10, 2024

Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model

Apr 24, 2023