Yabin Li

MELA-TTS: Joint transformer-diffusion model with representation alignment for speech synthesis

Sep 18, 2025

FunAudio-ASR Technical Report

Sep 15, 2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

May 23, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Jan 10, 2025

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

May 18, 2023