Wangyou Zhang

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

Jan 31, 2024
Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

Jan 30, 2024
Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe

Improving Design of Input Condition Invariant Speech Enhancement

Jan 25, 2024
Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, Yanmin Qian

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

Oct 12, 2023
Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Oct 02, 2023
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

Toward Universal Speech Enhancement for Diverse Input Conditions

Sep 29, 2023
Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Sep 28, 2023
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Jul 23, 2023
Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

May 25, 2023
Wangyou Zhang, Yanmin Qian

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

Jul 19, 2022
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe
