Zhihao Du

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Jul 09, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

Feb 13, 2024

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

Oct 11, 2023

SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR

Oct 07, 2023

The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

Sep 24, 2023

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

Sep 14, 2023

CASA-ASR: Context-Aware Speaker-Attributed ASR

May 21, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

May 18, 2023

TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization

Mar 08, 2023

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

Nov 18, 2022