
Hiroshi Saruwatari

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

Apr 05, 2022

STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent

Mar 28, 2022

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Mar 24, 2022

Personalized filled-pause generation with group-wise prediction models

Mar 18, 2022

Spatial active noise control based on individual kernel interpolation of primary and secondary sound fields

Feb 10, 2022

Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds

Feb 01, 2022

J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis

Jan 26, 2022

Mean-square-error-based secondary source placement in sound field synthesis with prior information on desired field

Dec 10, 2021

Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations

Oct 12, 2021

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network

Sep 22, 2021