Qian Chen

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models

Aug 08, 2025
Via arXiv

Cardiac-CLIP: A Vision-Language Foundation Model for 3D Cardiac CT Images

Jul 29, 2025
Via arXiv

SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference

Jul 09, 2025
Via arXiv

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

Jun 26, 2025
Via arXiv

KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model

Jun 26, 2025
Via arXiv

OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment

Jun 11, 2025
Via arXiv

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation

May 30, 2025
Via arXiv

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

May 23, 2025
Via arXiv

Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization

May 20, 2025
Via arXiv

Novel Extraction of Discriminative Fine-Grained Feature to Improve Retinal Vessel Segmentation

May 06, 2025
Via arXiv