"speech recognition": models, code, and papers

X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

May 10, 2023
Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu

Via arXiv

DST: Deformable Speech Transformer for Emotion Recognition

Feb 27, 2023
Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du

Via arXiv

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Mar 11, 2023
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind

Via arXiv

HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition

Apr 13, 2022
Ji Won Yoon, Beom Jun Woo, Nam Soo Kim

Via arXiv

End-to-end speech recognition modeling from de-identified data

Jul 12, 2022
Martin Flechl, Shou-Chun Yin, Junho Park, Peter Skala

Via arXiv

Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

Mar 17, 2022
Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng

Via arXiv

Supervised Attention in Sequence-to-Sequence Models for Speech Recognition

Apr 25, 2022
Gene-Ping Yang, Hao Tang

Via arXiv

Investigating the effect of domain selection on automatic speech recognition performance: a case study on Bangladeshi Bangla

Oct 24, 2022
Ahnaf Mozib Samin, M. Humayan Kobir, Md. Mushtaq Shahriyar Rafee, M. Firoz Ahmed, Shafkat Kibria, M. Shahidur Rahman

Via arXiv

OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

Jan 16, 2023
Jeongkyun Park, Jung-Wook Hwang, Kwanghee Choi, Seung-Hyun Lee, Jun Hwan Ahn, Rae-Hong Park, Hyung-Min Park

Via arXiv

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

Aug 05, 2023
Pierre Champion

Via arXiv