Alert button
Picture for Shinji Watanabe

Shinji Watanabe

Alert button

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Add code
Bookmark button
Alert button
Oct 02, 2023
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

Figure 1 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 2 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 3 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Figure 4 for Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Viaarxiv icon

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

Add code
Bookmark button
Alert button
Oct 01, 2023
Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe

Viaarxiv icon

Toward Universal Speech Enhancement for Diverse Input Conditions

Add code
Bookmark button
Alert button
Sep 29, 2023
Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian

Figure 1 for Toward Universal Speech Enhancement for Diverse Input Conditions
Figure 2 for Toward Universal Speech Enhancement for Diverse Input Conditions
Figure 3 for Toward Universal Speech Enhancement for Diverse Input Conditions
Figure 4 for Toward Universal Speech Enhancement for Diverse Input Conditions
Viaarxiv icon

Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation

Add code
Bookmark button
Alert button
Sep 29, 2023
Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, Shinji Watanabe

Figure 1 for Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Figure 2 for Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Figure 3 for Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Figure 4 for Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Viaarxiv icon

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Add code
Bookmark button
Alert button
Sep 28, 2023
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe

Figure 1 for Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Figure 2 for Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Figure 3 for Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Figure 4 for Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
Viaarxiv icon

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

Add code
Bookmark button
Alert button
Sep 27, 2023
Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe

Figure 1 for Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
Figure 2 for Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
Figure 3 for Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
Figure 4 for Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
Viaarxiv icon

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

Add code
Bookmark button
Alert button
Sep 27, 2023
Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang

Figure 1 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 2 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 3 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Figure 4 for Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Viaarxiv icon

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

Add code
Bookmark button
Alert button
Sep 27, 2023
Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur

Figure 1 for Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
Figure 2 for Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
Figure 3 for Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
Figure 4 for Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
Viaarxiv icon

Speech collage: code-switched audio generation by collaging monolingual corpora

Add code
Bookmark button
Alert button
Sep 27, 2023
Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur

Figure 1 for Speech collage: code-switched audio generation by collaging monolingual corpora
Figure 2 for Speech collage: code-switched audio generation by collaging monolingual corpora
Figure 3 for Speech collage: code-switched audio generation by collaging monolingual corpora
Figure 4 for Speech collage: code-switched audio generation by collaging monolingual corpora
Viaarxiv icon

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Add code
Bookmark button
Alert button
Sep 20, 2023
Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar

Figure 1 for Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Figure 2 for Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Figure 3 for Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Figure 4 for Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Viaarxiv icon