"speech": models, code, and papers

LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework

Aug 22, 2023
Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang

Decoupled Structure for Improved Adaptability of End-to-End Models

Aug 25, 2023
Keqi Deng, Philip C. Woodland

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

May 18, 2023
Zhifu Gao, Zerui Li, Jiaming Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody

Jun 16, 2023
Sofoklis Kakouros, Juraj Šimko, Martti Vainio, Antti Suni

All Information is Necessary: Integrating Speech Positive and Negative Information by Contrastive Learning for Speech Enhancement

Apr 26, 2023
Xinmeng Xu, Weiping Tu, Chang Han, Yuhong Yang

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Aug 19, 2023
Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe

EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

Jun 01, 2023
Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

DUB: Discrete Unit Back-translation for Speech Translation

May 19, 2023
Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou

Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts

Jul 17, 2023
Rebekka Hubert, Artem Sokolov, Stefan Riezler

Large-Scale Automatic Audiobook Creation

Sep 07, 2023
Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer
