Sefik Emre Eskimez

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

Jul 17, 2024
Via arXiv

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Jun 26, 2024
Via arXiv

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

Jun 09, 2024
Via arXiv

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Jun 06, 2024
Via arXiv

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Feb 12, 2024
Via arXiv

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Aug 14, 2023
Via arXiv

Real-Time Audio-Visual End-to-End Speech Enhancement

Mar 13, 2023
Via arXiv

Speech separation with large-scale self-supervised learning

Nov 09, 2022
Via arXiv

Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation

Nov 05, 2022
Via arXiv

Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation with E3Net

Nov 04, 2022
Via arXiv