Yuma Shirahata

Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations

Jun 18, 2026

PASQA: Pitch-Accent-Focused Speech Quality Assessment Model Trained on Synthetic Speech with Accent Errors

Jun 18, 2026

CC-G2PnP: Streaming Grapheme-to-Phoneme and prosody with Conformer-CTC for unsegmented languages

Feb 19, 2026

Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech Generation from SSL features

Feb 05, 2026

SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch

Jul 23, 2025

Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning

Jun 05, 2025

Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control

Sep 26, 2024

Universal Score-based Speech Enhancement with High Content Preservation

Jun 18, 2024

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

Jun 12, 2024

Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data

Jun 12, 2024