Hiroshi Saruwatari

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Apr 06, 2024

Building speech corpus with diverse voice characteristics for its prompt-based representation

Mar 20, 2024

Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation

Mar 19, 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Jan 30, 2024

Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression

Jan 11, 2024

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions

Oct 09, 2023

Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Sep 24, 2023

Do learned speech symbols follow Zipf's law?

Sep 18, 2023

Diversity-based core-set selection for text-to-speech with linguistic and acoustic features

Sep 15, 2023

Kernel Interpolation of Incident Sound Field in Region Including Scattering Objects

Sep 11, 2023