Zhifu Gao

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Jul 09, 2024
Via arXiv

MaLa-ASR: Multimedia-Assisted LLM-Based ASR

Jun 09, 2024
Via arXiv

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

Feb 13, 2024
Via arXiv

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Dec 23, 2023
Via arXiv

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

Oct 11, 2023
Via arXiv

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

May 25, 2023
Via arXiv

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

May 18, 2023
Via arXiv

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

Jun 20, 2022
Via arXiv

Extremely Low Footprint End-to-End ASR System for Smart Device

Apr 26, 2021
Via arXiv