Shinji Watanabe

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

Jul 06, 2022
Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe

Improving Speech Enhancement through Fine-Grained Speech Characteristics

Jul 01, 2022
Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models

Jul 01, 2022
Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Paola García, Yohei Kawaguchi

Residual Language Model for End-to-end Speech Recognition

Jun 15, 2022
Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Narisetty, Shinji Watanabe

LegoNN: Building Modular Encoder-Decoder Models

Jun 07, 2022
Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed

Online Neural Diarization of Unlimited Numbers of Speakers

Jun 06, 2022
Shota Horiguchi, Shinji Watanabe, Paola García, Yuki Takashima, Yohei Kawaguchi

Self-Supervised Speech Representation Learning: A Review

May 21, 2022
Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

May 09, 2022
Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin

Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

May 02, 2022
Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi

STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency

Apr 21, 2022
Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux
