Qiuqiang Kong

One-Step Token-to-Waveform Generation with MeanFlow in Latent Space

Jun 16, 2026

Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation

Jun 10, 2026

Voice Timbre Attribute Detection with Compact and Interpretable Training-Free Acoustic Parameters

Mar 05, 2026

Voices of Civilizations: A Multilingual QA Benchmark for Global Music Understanding

Feb 28, 2026

ARCHI-TTS: A flow-matching-based Text-to-Speech Model with Self-supervised Semantic Aligner and Accelerated Inference

Feb 05, 2026

SemanticAudio: Audio Generation and Editing in Semantic Space

Jan 29, 2026

ImmersiveFlow: Stereo-to-7.1.4 spatial audio generation with flow matching

Jan 19, 2026

Summary of The Inaugural Music Source Restoration Challenge

Jan 07, 2026

MelCap: A Unified Single-Codebook Neural Codec for High-Fidelity Audio Compression

Oct 02, 2025

PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation

Oct 01, 2025