Haibin Wu

Discrete Audio Tokens: More Than a Survey!

Jun 12, 2025
Via arXiv

Towards Generalized Source Tracing for Codec-Based Deepfake Speech

Jun 08, 2025
Via arXiv

Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation

Jun 04, 2025
Via arXiv

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

Jun 04, 2025
Via arXiv

Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy

May 19, 2025
Via arXiv

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Apr 14, 2025
Via arXiv

On The Landscape of Spoken Language Models: A Comprehensive Survey

Apr 11, 2025
Via arXiv

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Mar 03, 2025
Via arXiv

CodecFake-Omni: A Large-Scale Codec-based Deepfake Speech Dataset

Jan 14, 2025
Via arXiv

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

Dec 23, 2024
Via arXiv