Shinji Watanabe

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling
Apr 18, 2023
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Apr 13, 2023
Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Apr 11, 2023
Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe

Enhancing Speech-to-Speech Translation with Multiple TTS Targets
Apr 10, 2023
Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe

I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Mar 14, 2023
Yifan Peng, Jaesong Lee, Shinji Watanabe

End-to-End Speech Recognition: A Survey
Mar 03, 2023
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe

Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding
Feb 27, 2023
Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe

Improving Massively Multilingual ASR With Auxiliary CTC Objectives
Feb 27, 2023
William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement
Feb 16, 2023
Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement
Feb 16, 2023
Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj