"speech": models, code, and papers

BUT CHiME-7 system description

Oct 18, 2023
Martin Karafiát, Karel Veselý, Igor Szöke, Ladislav Mošner, Karel Beneš, Marcin Witkowski, Germán Barchi, Leonardo Pepino

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

Aug 28, 2023
Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Sep 04, 2023
Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng

Investigating Speaker Embedding Disentanglement on Natural Read Speech

Aug 08, 2023
Michael Kuhlmann, Adrian Meise, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

Towards Matching Phones and Speech Representations

Oct 26, 2023
Gene-Ping Yang, Hao Tang

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

Sep 16, 2023
Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model

Aug 22, 2023
Yuezhou Zhang, Amos A Folarin, Judith Dineley, Pauline Conde, Valeria de Angel, Shaoxiong Sun, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Petroula Laiou, Heet Sankesara, Linglong Qian, Faith Matcham, Katie M White, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Björn W. Schuller, Srinivasan Vairavan, Til Wykes, Josep Maria Haro, Brenda WJH Penninx, Vaibhav A Narayan, Matthew Hotopf, Richard JB Dobson, Nicholas Cummins, RADAR-CNS consortium

Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

Aug 28, 2023
Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing

Oct 28, 2023
Quoc-Nam Nguyen, Thang Chau Phan, Duc-Vu Nguyen, Kiet Van Nguyen

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Nov 14, 2023
Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, Jingren Zhou
