Wenjie Tian

YingMusic-Singer: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance

Mar 25, 2026

EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs

Feb 25, 2026

Integrating Fine-Grained Audio-Visual Evidence for Robust Multimodal Emotion Reasoning

Jan 26, 2026

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition

Jan 25, 2026

VoiceSculptor: Your Voice, Designed By You

Jan 15, 2026

PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation

Oct 01, 2025

Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis

Aug 08, 2025

Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought

Feb 25, 2025

CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions

Jan 28, 2025

Autoregressive Speech Synthesis with Next-Distribution Prediction

Dec 22, 2024