"speech": models, code, and papers

Automatic Spoken Language Identification using a Time-Delay Neural Network

May 19, 2022
Benjamin Kepecs, Homayoon Beigi

STYLER: Style Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

Mar 17, 2021
Keon Lee, Kyumin Park, Daeyoung Kim

3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces

Apr 23, 2021
László Tóth, Amin Honarmandi Shandiz

Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

Nov 03, 2020
Man-Ling Sung, Siyuan Feng, Tan Lee

Almost Unsupervised Text to Speech and Automatic Speech Recognition

May 22, 2019
Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

ESPnet-ST: All-in-One Speech Translation Toolkit

Apr 21, 2020
Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, Shinji Watanabe

SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network

Apr 05, 2021
William Chan, Daniel Park, Chris Lee, Yu Zhang, Quoc Le, Mohammad Norouzi

Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis

Aug 04, 2021
Xudong Dai, Cheng Gong, Longbiao Wang, Kaili Zhang

Amortized Neural Networks for Low-Latency Speech Recognition

Aug 03, 2021
Jonathan Macoskey, Grant P. Strimel, Jinru Su, Ariya Rastrow

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction

Apr 19, 2021
Stanislav Beliaev, Boris Ginsburg
