Zhixian Zhao

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Mar 03, 2025
Via arXiv

Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought

Feb 25, 2025
Via arXiv

Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment

Sep 10, 2024
Via arXiv