Hiroshi Saruwatari

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals

Jun 25, 2024
Via arXiv

Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment

Jun 11, 2024
Via arXiv

SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark

Jun 11, 2024
Via arXiv

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Apr 06, 2024
Via arXiv

Building speech corpus with diverse voice characteristics for its prompt-based representation

Mar 20, 2024
Via arXiv

Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation

Mar 19, 2024
Via arXiv

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Jan 30, 2024
Via arXiv

Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression

Jan 11, 2024
Via arXiv

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions

Oct 09, 2023
Via arXiv

Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Sep 24, 2023
Via arXiv