Picture for Brian Yan

Brian Yan

Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking

Add code
Sep 27, 2024
Figure 1 for Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
Figure 2 for Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
Figure 3 for Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
Figure 4 for Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
Viaarxiv icon

CMU's IWSLT 2024 Simultaneous Speech Translation System

Add code
Aug 14, 2024
Viaarxiv icon

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

Add code
Jun 05, 2024
Viaarxiv icon

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Add code
Jan 30, 2024
Figure 1 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 2 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 3 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Figure 4 for OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Viaarxiv icon

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Add code
Oct 02, 2023
Viaarxiv icon

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Add code
Sep 28, 2023
Viaarxiv icon

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

Add code
Sep 27, 2023
Figure 1 for Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
Figure 2 for Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
Figure 3 for Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
Figure 4 for Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
Viaarxiv icon

Speech collage: code-switched audio generation by collaging monolingual corpora

Add code
Sep 27, 2023
Figure 1 for Speech collage: code-switched audio generation by collaging monolingual corpora
Figure 2 for Speech collage: code-switched audio generation by collaging monolingual corpora
Figure 3 for Speech collage: code-switched audio generation by collaging monolingual corpora
Figure 4 for Speech collage: code-switched audio generation by collaging monolingual corpora
Viaarxiv icon

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

Add code
Sep 27, 2023
Figure 1 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 2 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 3 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 4 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Viaarxiv icon

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

Add code
Sep 27, 2023
Viaarxiv icon