Picture for Zhikang Niu

Zhikang Niu

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Add code
Oct 06, 2025
Viaarxiv icon

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis

Add code
Sep 26, 2025
Viaarxiv icon

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Add code
Aug 08, 2025
Viaarxiv icon

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

Add code
May 26, 2025
Viaarxiv icon

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling

Add code
May 26, 2025
Viaarxiv icon

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Add code
May 19, 2025
Figure 1 for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Figure 2 for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Figure 3 for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Figure 4 for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Viaarxiv icon

Towards Flow-Matching-based TTS without Classifier-Free Guidance

Add code
Apr 29, 2025
Viaarxiv icon

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Add code
Apr 22, 2025
Viaarxiv icon

URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models

Add code
Feb 25, 2025
Figure 1 for URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models
Figure 2 for URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models
Figure 3 for URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models
Figure 4 for URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models
Viaarxiv icon

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Add code
Dec 20, 2024
Figure 1 for SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
Figure 2 for SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
Figure 3 for SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
Figure 4 for SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
Viaarxiv icon