Yichen Han

Comprehend and Talk: Text to Speech Synthesis via Dual Language Modeling

Sep 26, 2025

Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations

Jul 16, 2025

Frame-level emotional state alignment method for speech emotion recognition

Dec 27, 2023

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

Dec 16, 2023

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis

Oct 07, 2022

ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

Mar 26, 2022