Haibin Wu

JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions

May 06, 2026
Via arXiv

Enhancing Conversational TTS with Cascaded Prompting and ICL-Based Online Reinforcement Learning

Apr 09, 2026
Via arXiv

Joint Fullband-Subband Modeling for High-Resolution SingFake Detection

Apr 06, 2026
Via arXiv

Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning

Mar 16, 2026
Via arXiv

T-Mimi: A Transformer-based Mimi Decoder for Real-Time On-Phone TTS

Jan 27, 2026
Via arXiv

Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks

Jan 18, 2026
Via arXiv

How Does Instrumental Music Help SingFake Detection?

Sep 18, 2025
Via arXiv

Discrete Audio Tokens: More Than a Survey!

Jun 12, 2025
Via arXiv

Towards Generalized Source Tracing for Codec-Based Deepfake Speech

Jun 08, 2025
Via arXiv

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

Jun 04, 2025
Via arXiv