Shinji Watanabe

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

Dec 21, 2022
Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe

Via arXiv

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

Dec 20, 2022
Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-yi Lee, Karen Livescu, Shinji Watanabe

Via arXiv

Context-aware Fine-tuning of Self-supervised Speech Models

Dec 16, 2022
Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe

Via arXiv

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Dec 15, 2022
Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino

Via arXiv

SpeechLMScore: Evaluating speech generation using speech language model

Dec 08, 2022
Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe

Via arXiv

EURO: ESPnet Unsupervised ASR Open-source Toolkit

Dec 01, 2022
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

Via arXiv

TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

Nov 22, 2022
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

Via arXiv

Streaming Joint Speech Recognition and Disfluency Detection

Nov 16, 2022
Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe

Via arXiv

A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units

Nov 12, 2022
Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky

Via arXiv

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

Nov 11, 2022
Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe

Via arXiv