Zhehuai Chen

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

Apr 04, 2024
Via arXiv

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Feb 10, 2024
Via arXiv

High-precision Voice Search Query Correction via Retrievable Speech-text Embedings

Jan 08, 2024
Via arXiv

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

Oct 13, 2023
Via arXiv

Using Text Injection to Improve Recognition of Personal Identifiers in Speech

Aug 14, 2023
Via arXiv

Understanding Shared Speech-Text Representations

Apr 27, 2023
Via arXiv

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Mar 03, 2023
Via arXiv

Accelerating RNN-T Training and Inference Using CTC guidance

Oct 29, 2022
Via arXiv

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

Oct 27, 2022
Via arXiv

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

Oct 18, 2022
Via arXiv