
Korin Richmond

Centre for Speech Technology Research (CSTR), University of Edinburgh

Rethinking Discrete Speech Representation Tokens for Accent Generation
Jan 27, 2026

Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information
May 21, 2025

Pairwise Evaluation of Accent Similarity in Speech Synthesis
May 20, 2025

Revisiting Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations
Sep 26, 2024

Cross-lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models
Sep 25, 2024

Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning
Sep 15, 2024

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
Sep 13, 2024

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Jun 13, 2024

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Dec 22, 2023

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks
Sep 22, 2022