Yusuke Yasuda

SAVe: Self-Supervised Audio-visual Deepfake Detection Exploiting Visual Artifacts and Audio-visual Misalignment

Mar 26, 2026

The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion To Singing Style Conversion

Sep 19, 2025

Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment

Mar 10, 2024

Preference-based training framework for automatic speech quality assessment using deep neural network

Aug 29, 2023

Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language

Dec 16, 2022

Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder

Dec 16, 2022

ESPnet2-TTS: Extending the Edge of TTS Research

Oct 15, 2021

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

Nov 10, 2020

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

Oct 20, 2020

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

May 20, 2020