Qiuqiang Kong

MelCap: A Unified Single-Codebook Neural Codec for High-Fidelity Audio Compression

Oct 02, 2025
Via arXiv

PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation

Oct 01, 2025
Via arXiv

Region-Specific Audio Tagging for Spatial Sound

Sep 11, 2025
Via arXiv

CLEAR: Continuous Latent Autoregressive Modeling for High-quality and Low-latency Speech Synthesis

Aug 26, 2025
Via arXiv

Music Source Restoration

May 27, 2025
Via arXiv

Training-Free Multi-Step Audio Source Separation

May 26, 2025
Via arXiv

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Feb 06, 2025
Via arXiv

Piano Transcription by Hierarchical Language Modeling with Pretrained Roll-based Encoders

Jan 07, 2025
Via arXiv

DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning

Oct 12, 2024
Via arXiv

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement

Sep 15, 2024
Via arXiv