"speech": models, code, and papers

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Sep 22, 2023
Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng

Via arXiv

Deep Neural Networks for Automatic Speaker Recognition Do Not Learn Supra-Segmental Temporal Features

Nov 02, 2023
Daniel Neururer, Volker Dellwo, Thilo Stadelmann

Via arXiv

VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired

Aug 27, 2023
Jia-Jyu Su, Pang-Chen Liao, Yen-Ting Lin, Wu-Hao Li, Guan-Ting Liou, Cheng-Che Kao, Wei-Cheng Chen, Jen-Chieh Chiang, Wen-Yang Chang, Pin-Han Lin, Chen-Yu Chiang

Via arXiv

Regularized Conventions: Equilibrium Computation as a Model of Pragmatic Reasoning

Nov 16, 2023
Athul Paul Jacob, Gabriele Farina, Jacob Andreas

Via arXiv

EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data

Sep 14, 2023
Navin Raj Prabhu, Bunlong Lay, Simon Welker, Nale Lehmann-Willenbrock, Timo Gerkmann

Via arXiv

DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis

Sep 22, 2023
Yu Gu, Yianrao Bian, Guangzhi Lei, Chao Weng, Dan Su

Via arXiv

Investigating Weight-Perturbed Deep Neural Networks With Application in Iris Presentation Attack Detection

Nov 22, 2023
Renu Sharma, Redwan Sony, Arun Ross

Via arXiv

Causal Signal-Based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement

Sep 07, 2023
Julitta Bartolewska, Stanisław Kacprzak, Konrad Kowalczyk

Via arXiv

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

Aug 31, 2023
Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng

Via arXiv

Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer

Sep 18, 2023
Peter Ochieng

Via arXiv