"speech": models, code, and papers

How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

Mar 31, 2022
Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Saeed Sarfjoo, Petr Motlicek, Matthias Kleinert, Hartmut Helmke, Oliver Ohneiser, Qingran Zhan

Knowledge Authoring with Factual English

Aug 05, 2022
Yuheng Wang, Giorgian Borca-Tasciuc, Nikhil Goel, Paul Fodor, Michael Kifer

On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments

Nov 11, 2020
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

Dialog+ in Broadcasting: First Field Tests Using Deep-Learning-Based Dialogue Enhancement

Dec 17, 2021
Matteo Torcoli, Christian Simon, Jouni Paulus, Davide Straninger, Alfred Riedel, Volker Koch, Stefan Wits, Daniela Rieger, Harald Fuchs, Christian Uhle, Stefan Meltzer, Adrian Murtaza

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

Jun 16, 2021
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

Mar 31, 2021
Cong-Thanh Do, Rama Doddipatla, Thomas Hain

Detecting Hate Speech in Multi-modal Memes

Dec 29, 2020
Abhishek Das, Japsimar Singh Wahi, Siyao Li

Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end

Jan 24, 2022
Rem Hida, Masaki Hamada, Chie Kamada, Emiru Tsunoo, Toshiyuki Sekiya, Toshiyuki Kumakura

End-to-End Adversarial Text-to-Speech

Jun 05, 2020
Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan

RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks

Jun 15, 2022
Shanghua Gao, Zhong-Yu Li, Qi Han, Ming-Ming Cheng, Liang Wang
