Tianzi Wang

MiDashengLM: Efficient Audio Understanding with General Audio Captions

Aug 06, 2025

GLAP: General contrastive audio-text pretraining across domains and languages

Jun 12, 2025

On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition

May 28, 2025

Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision

May 27, 2025

Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models

May 27, 2025

Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition

Dec 25, 2024

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

Sep 13, 2024

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Jul 08, 2024

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

Jun 14, 2024

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

Jun 14, 2024