Zengrui Jin

Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models

May 27, 2025

Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision

May 27, 2025

Effective and Efficient Mixed Precision Quantization of Speech Foundation Models

Jan 07, 2025

Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition

Dec 25, 2024

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

Nov 26, 2024

CR-CTC: Consistency regularization on CTC for improved speech recognition

Oct 07, 2024

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Sep 01, 2024

Advancing Multi-talker ASR Performance with Large Language Models

Aug 30, 2024

Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

Jul 13, 2024

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Jul 08, 2024