Shinji Watanabe

E-Branchformer: Branchformer with Enhanced merging for speech recognition
Sep 30, 2022
Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

ESPnet-ONNX: Bridging a Gap Between Research and Production
Sep 20, 2022
Masao Someki, Yosuke Higuchi, Tomoki Hayashi, Shinji Watanabe

Deep Speech Synthesis from Articulatory Representations
Sep 13, 2022
Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
Sep 08, 2022
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

ASR2K: Speech Recognition for Around 2000 Languages without Audio
Sep 06, 2022
Xinjian Li, Florian Metze, David R Mortensen, Alan W Black, Shinji Watanabe

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
Aug 03, 2022
Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury

When Is TTS Augmentation Through a Pivot Language Useful?
Jul 20, 2022
Nathaniel Robinson, Perez Ogayo, Swetha Gangu, David R. Mortensen, Shinji Watanabe

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
Jul 19, 2022
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe

Two-Pass Low Latency End-to-End Spoken Language Understanding
Jul 14, 2022
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe

Improving Speech Enhancement through Fine-Grained Speech Characteristics
Jul 11, 2022
Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj
