Qian Chen

KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model

Jun 26, 2025
Via arXiv

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

Jun 26, 2025
Via arXiv

OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment

Jun 11, 2025
Via arXiv

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation

May 30, 2025
Via arXiv

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

May 23, 2025
Via arXiv

Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization

May 20, 2025
Via arXiv

Novel Extraction of Discriminative Fine-Grained Feature to Improve Retinal Vessel Segmentation

May 06, 2025
Via arXiv

DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models

Apr 22, 2025
Via arXiv

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Apr 22, 2025
Via arXiv

OmniAudio: Generating Spatial Audio from 360-Degree Video

Apr 21, 2025
Via arXiv