Jingbin Hu

FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Jun 09, 2026
Via arXiv

MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice Conversion

Jun 08, 2026
Via arXiv

Beyond Semantic Dominance: Cognitive Affective Reasoning and Empathetic Response Alignment in Audio Language Models

Jun 05, 2026
Via arXiv

Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation

Jun 05, 2026
Via arXiv

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model

May 12, 2026
Via arXiv

MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech

Apr 20, 2026
Via arXiv

FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection

Apr 07, 2026
Via arXiv

OmniCodec: Low Frame Rate Universal Audio Codec with Semantic-Acoustic Disentanglement

Mar 21, 2026
Via arXiv

EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs

Feb 25, 2026
Via arXiv

WenetSpeech-Wu: Datasets, Benchmarks, and Models for a Unified Chinese Wu Dialect Speech Processing Ecosystem

Jan 16, 2026
Via arXiv