
Pengyuan Zhang

Decoupled Federated Learning for ASR with Non-IID Data

Jun 18, 2022

Streaming non-autoregressive model for any-to-many voice conversion

Jun 15, 2022

Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy

Apr 25, 2022

Back-ends Selection for Deep Speaker Embeddings

Apr 25, 2022

CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition

Mar 31, 2022

Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational Speech Dataset

Mar 31, 2022

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

Feb 22, 2022

The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge

Jan 29, 2022

Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models

Jan 26, 2022

Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition

Dec 23, 2021