Yan Deng

Fine-Tuning Large Multimodal Models for Automatic Pronunciation Assessment

Sep 19, 2025
Via arXiv

Exploring the Potential of Large Multimodal Models as Effective Alternatives for Pronunciation Assessment

Mar 14, 2025
Via arXiv

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Mar 13, 2025
Via arXiv

Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models

Jun 08, 2023
Via arXiv

Speech BERT Embedding For Improving Prosody in Neural TTS

Jun 15, 2021
Via arXiv

Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation

Apr 08, 2021
Via arXiv

Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

Jun 03, 2019
Via arXiv

Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice

Dec 18, 2018
Via arXiv