"speech": models, code, and papers

Efficient Training of Neural Transducer for Speech Recognition

Apr 22, 2022
Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney

Speech Enhancement-assisted Stargan Voice Conversion in Noisy Environments

Oct 19, 2021
Yun-Ju Chan, Chiang-Jen Peng, Syu-Siang Wang, Hsin-Min Wang, Yu Tsao, Tai-Shih Chi

Self-Supervised Learning based Monaural Speech Enhancement with Multi-Task Pre-Training

Dec 21, 2021
Yi Li, Yang Sun, Syed Mohsen Naqvi

Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

Apr 13, 2022
Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw

MAST: Multiscale Audio Spectrogram Transformers

Nov 02, 2022
Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha

Lisan: Yemeni, Iraqi, Libyan, and Sudanese Arabic Dialect Corpora with Morphological Annotations

Dec 13, 2022
Mustafa Jarrar, Fadi A Zaraket, Tymaa Hammouda, Daanish Masood Alavi, Martin Waahlisch

How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

Jan 18, 2022
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

Foundation Transformers

Oct 19, 2022
Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech

Nov 19, 2021
Michael Hassid, Michelle Tadmor Ramanovich, Brendan Shillingford, Miaosen Wang, Ye Jia, Tal Remez

Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks

Dec 27, 2022
Erdong Guo, David Draper, Maria De Iorio
