Xunying Liu

Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

Jul 13, 2024

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Jul 08, 2024

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

Jun 17, 2024

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

Jun 14, 2024

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

Jun 14, 2024

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

Jun 14, 2024

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

Jun 14, 2024

WavLLM: Towards Robust and Adaptive Speech Large Language Model

Mar 31, 2024

Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction

Jan 31, 2024

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

Jan 08, 2024