Zhikang Niu

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

Jan 20, 2026

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis

Dec 21, 2025

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Oct 06, 2025

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis

Sep 26, 2025

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Aug 08, 2025

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling

May 26, 2025

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

May 26, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025

Towards Flow-Matching-based TTS without Classifier-Free Guidance

Apr 29, 2025