Chao Weng

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Nov 04, 2022

Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks

Oct 14, 2022

The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022

Oct 11, 2022

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Jul 20, 2022

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Jul 13, 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

Jun 05, 2022

Improving Target Sound Extraction with Timestamp Information

Apr 02, 2022

Integrate Lattice-Free MMI into End-to-End Speech Recognition

Apr 02, 2022

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

Feb 04, 2022

Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model

Jan 06, 2022