Brian Yan

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

Jun 05, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Jan 30, 2024

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Oct 02, 2023

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Sep 28, 2023

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

Sep 27, 2023

Speech collage: code-switched audio generation by collaging monolingual corpora

Sep 27, 2023

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

Sep 27, 2023

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

Sep 27, 2023

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Sep 20, 2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Aug 19, 2023