Brian Yan


OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Jan 30, 2024
Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Oct 02, 2023
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Sep 28, 2023
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

Sep 27, 2023
Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

Sep 27, 2023
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

Sep 27, 2023
Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur

Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora

Sep 27, 2023
Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Sep 20, 2023
Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Aug 19, 2023
Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

Jul 20, 2023
Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
