"speech": models, code, and papers

Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech

Jan 19, 2024
Abhinav Garg, Jiyeon Kim, Sushil Khyalia, Chanwoo Kim, Dhananjaya Gowda

Via arXiv

The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models

Feb 12, 2024
Ayo Adedeji, Sarita Joshi, Brendan Doohan

Via arXiv

Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues

Feb 13, 2024
Maneesh Bilalpur, Mert Inan, Dorsa Zeinali, Jeffrey F. Cohn, Malihe Alikhani

Via arXiv

A Refining Underlying Information Framework for Monaural Speech Enhancement

Dec 24, 2023
Rui Cao, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

Via arXiv

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

Jan 05, 2024
Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang

Via arXiv

Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks

Dec 10, 2023
Seo-Hyun Lee, Young-Eun Lee, Soowon Kim, Byung-Kwan Ko, Jun-Young Kim, Seong-Whan Lee

Via arXiv

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

Jan 13, 2024
A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen

Via arXiv

AugSumm: towards generalizable speech summarization using synthetic labels from large language model

Jan 10, 2024
Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, Shinji Watanabe

Via arXiv

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

Jan 01, 2024
Huimeng Wang, Zengrui Jin, Mengzhe Geng, Shujie Hu, Guinan Li, Tianzi Wang, Haoning Xu, Xunying Liu

Via arXiv

R-BI: Regularized Batched Inputs enhance Incremental Decoding Framework for Low-Latency Simultaneous Speech Translation

Jan 11, 2024
Jiaxin Guo, Zhanglin Wu, Zongyao Li, Hengchao Shang, Daimeng Wei, Xiaoyu Chen, Zhiqiang Rao, Shaojun Li, Hao Yang

Via arXiv