"speech": models, code, and papers

KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis

Apr 09, 2024
Adal Abilbekov, Saida Mussakhojayeva, Rustem Yeshpanov, Huseyin Atakan Varol

Personalized Neural Speech Codec

Mar 31, 2024
Inseon Jang, Haici Yang, Wootaek Lim, Seungkwon Beack, Minje Kim

HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks

Apr 06, 2024
Yingting Li, Rishabh Bhardwaj, Ambuj Mehrish, Bo Cheng, Soujanya Poria

VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots

Apr 05, 2024
Akhil Padmanabha, Jessie Yuan, Janavi Gupta, Zulekha Karachiwalla, Carmel Majidi, Henny Admoni, Zackory Erickson

Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context

Apr 05, 2024
Antoine Caubrière, Elodie Gauthier

PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders

Apr 03, 2024
Yu Pan, Lei Ma, Jianjun Zhao

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Apr 02, 2024
Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu

Noise Masking Attacks and Defenses for Pretrained Speech Models

Apr 02, 2024
Matthew Jagielski, Om Thakkar, Lun Wang

Scaling Properties of Speech Language Models

Mar 31, 2024
Santiago Cuervo, Ricard Marxer

DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition

Apr 11, 2024
Yi-Cheng Wang, Hsin-Wei Wang, Bi-Cheng Yan, Chi-Han Lin, Berlin Chen
