"speech": models, code, and papers

Speaking rate attention-based duration prediction for speed control TTS

Oct 13, 2023
Jesuraj Bandekar, Sathvik Udupa, Abhayjeet Singh, Anjali Jayakumar, Deekshitha G, Sandhya Badiger, Saurabh Kumar, Pooja VH, Prasanta Kumar Ghosh

GASCOM: Graph-based Attentive Semantic Context Modeling for Online Conversation Understanding

Oct 21, 2023
Vibhor Agarwal, Yu Chen, Nishanth Sastry

Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech

Sep 15, 2023
Dariusz Piotrowski, Renard Korzeniowski, Alessio Falai, Sebastian Cygert, Kamil Pokora, Georgi Tinchev, Ziyao Zhang, Kayoko Yanagisawa

Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition

Sep 05, 2023
Minh Tran, Yufeng Yin, Mohammad Soleymani

A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023

Oct 08, 2023
Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda

LaughTalk: Expressive 3D Talking Head Generation with Laughter

Nov 02, 2023
Kim Sung-Bin, Lee Hyun, Da Hye Hong, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

Sep 16, 2023
Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Computational analyses of linguistic features with schizophrenic and autistic traits along with formal thought disorders

Oct 14, 2023
Takeshi Saga, Hiroki Tanaka, Satoshi Nakamura

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

Oct 14, 2023
Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

Oct 08, 2023
Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie
