"speech": models, code, and papers

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

Jun 27, 2022
Sri Karlapati, Penny Karanasou, Mateusz Lajszczak, Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman

Via arXiv

OCD: Learning to Overfit with Conditional Diffusion Models

Oct 10, 2022
Shahar Lutati, Lior Wolf

Via arXiv

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

Apr 01, 2021
Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux

Via arXiv

End-to-End Speech Recognition With Joint Dereverberation Of Sub-Band Autoregressive Envelopes

Aug 09, 2021
Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy

Via arXiv

A Flow-Based Neural Network for Time Domain Speech Enhancement

Jun 16, 2021
Martin Strauss, Bernd Edler

Via arXiv

Neural-FST Class Language Model for End-to-End Speech Recognition

Jan 31, 2022
Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer

Via arXiv

PoeticTTS -- Controllable Poetry Reading for Literary Studies

Jul 11, 2022
Julia Koch, Florian Lux, Nadja Schauffler, Toni Bernhart, Felix Dieterle, Jonas Kuhn, Sandra Richter, Gabriel Viehhauser, Ngoc Thang Vu

Via arXiv

On TasNet for Low-Latency Single-Speaker Speech Enhancement

Mar 27, 2021
Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen

Via arXiv

Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Jun 15, 2021
Devang S Ram Mohan, Vivian Hu, Tian Huey Teh, Alexandra Torresquintero, Christopher G. R. Wallis, Marlene Staib, Lorenzo Foglianti, Jiameng Gao, Simon King

Via arXiv

Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts

Sep 15, 2022
Vincent Karas, Andreas Triantafyllopoulos, Meishu Song, Björn W. Schuller

Via arXiv