Xie Chen

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception

Oct 14, 2025
Via arXiv

Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models

Oct 10, 2025
Via arXiv

Towards Responsible Evaluation for Text-to-Speech

Oct 08, 2025
Via arXiv

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Oct 06, 2025
Via arXiv

AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook

Sep 26, 2025
Via arXiv

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis

Sep 26, 2025
Via arXiv

Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video

Sep 10, 2025
Via arXiv

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Aug 08, 2025
Via arXiv

FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

Jul 22, 2025
Via arXiv

NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025

Jun 16, 2025
Via arXiv