Yong Qin

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling

Jun 14, 2025

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval

May 26, 2025

Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides

Apr 21, 2025

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

Mar 20, 2025

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition

Feb 26, 2025

MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation

Jan 18, 2025

SDPO: Segment-Level Direct Preference Optimization for Social Agents

Jan 03, 2025

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment

Dec 30, 2024

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5

Sep 27, 2024

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper

Sep 18, 2024