Yuki Saito

Geneses: Unified Generative Speech Enhancement and Separation

Jan 26, 2026

Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement

Oct 02, 2025

Static Word Embeddings for Sentence Semantic Representation

Jun 05, 2025

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis

May 18, 2025

Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features

Dec 26, 2024

An Environment-Adaptive Position/Force Control Based on Physical Property Estimation

Dec 19, 2024

An Empirical Analysis of GPT-4V's Performance on Fashion Aesthetic Evaluation

Oct 31, 2024

Construction and Analysis of Impression Caption Dataset for Environmental Sounds

Oct 20, 2024

Disentangling Likes and Dislikes in Personalized Generative Explainable Recommendation

Oct 17, 2024

The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech

Sep 14, 2024