
"speech": models, code, and papers

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

Jun 20, 2022
Zhifu Gao, Shiliang Zhang, Ian McLoughlin, Zhijie Yan


Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

Nov 06, 2020
Ron J. Weiss, RJ Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, Diederik P. Kingma


Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion

Oct 16, 2020
Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma


Deep Learning-Based Joint Control of Acoustic Echo Cancellation, Beamforming and Postfiltering

Mar 03, 2022
Thomas Haubner, Walter Kellermann


The Interspeech Zero Resource Speech Challenge 2021: Spoken language modelling

Apr 29, 2021
Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux


Low-Memory End-to-End Training for Iterative Joint Speech Dereverberation and Separation with A Neural Source Model

Oct 13, 2021
Kohei Saijo, Robin Scheibler


KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos

Mar 01, 2019
Egor Lakomkin, Sven Magg, Cornelius Weber, Stefan Wermter


VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer

Mar 08, 2022
Juan F. Montesinos, Venkatesh S. Kadandale, Gloria Haro


Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

Oct 06, 2021
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong


Emotional Speaker Identification using a Novel Capsule Nets Model

Jan 09, 2022
Ali Bou Nassif, Ismail Shahin, Ashraf Elnagar, Divya Velayudhan, Adi Alhudhaif, Kemal Polat
