
Jinyu Li

Beijing Institute of Technology, China

Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision

Dec 16, 2021

Sequence-level self-learning with multiple hypotheses

Dec 10, 2021

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Nov 17, 2021

Recent Advances in End-to-End Automatic Speech Recognition

Nov 02, 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Oct 29, 2021

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

Oct 28, 2021

Continuous Speech Separation with Recurrent Selective Attention Network

Oct 28, 2021

Factorized Neural Transducer for Efficient Language Model Adaptation

Oct 18, 2021

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

Oct 14, 2021

SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing

Oct 14, 2021