"speech": models, code, and papers

Unsupervised word-level prosody tagging for controllable speech synthesis

Feb 16, 2022
Yiwei Guo, Chenpeng Du, Kai Yu

Robust Self-Supervised Audio-Visual Speech Recognition

Jan 05, 2022
Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed

JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech

Mar 31, 2022
Dan Lim, Sunghee Jung, Eesung Kim

Low-latency Monaural Speech Enhancement with Deep Filter-bank Equalizer

Feb 14, 2022
Chengshi Zheng, Wenzhe Liu, Andong Li, Yuxuan Ke, Xiaodong Li

On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems

Jun 15, 2022
Kai Li, Yi Luo

ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

May 04, 2022
Marcely Zanon Boito, John Ortega, Hugo Riguidel, Antoine Laurent, Loïc Barrault, Fethi Bougares, Firas Chaabani, Ha Nguyen, Florentin Barbier, Souhir Gahbiche, Yannick Estève

A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

Nov 18, 2021
Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard

Learning a Dual-Mode Speech Recognition Model via Self-Pruning

Jul 25, 2022
Chunxi Liu, Yuan Shangguan, Haichuan Yang, Yangyang Shi, Raghuraman Krishnamoorthi, Ozlem Kalinli

DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

Nov 02, 2022
Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie, Gang He, Jinfeng Bai

Knowledge-driven Subword Grammar Modeling for Automatic Speech Recognition in Tamil and Kannada

Jul 27, 2022
Madhavaraj A, Bharathi Pilar, Ramakrishnan A G
