"speech recognition": models, code, and papers

Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition

Oct 26, 2022
Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang

Hi,KIA: A Speech Emotion Recognition Dataset for Wake-Up Words

Nov 07, 2022
Taesu Kim, SeungHeon Doh, Gyunpyo Lee, Hyungseok Jeon, Juhan Nam, Hyeon-Jeong Suk

Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations

May 14, 2023
Weiwei Lin, Chenhang He, Man-Wai Mak, Youzhi Tu

OLISIA: a Cascade System for Spoken Dialogue State Tracking

Apr 20, 2023
Léo Jacqmin, Lucas Druart, Valentin Vielzeuf, Lina Maria Rojas-Barahona, Yannick Estève, Benoît Favre

Towards Representative Subset Selection for Self-Supervised Speech Recognition

Mar 18, 2022
Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

Feb 10, 2023
Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli

VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages

May 21, 2023
Shivam Mhaskar, Vineet Bhat, Akshay Batheja, Sourabh Deoghare, Paramveer Choudhary, Pushpak Bhattacharyya

Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training

Jun 27, 2022
Bowen Zhang, Songjun Cao, Xiaoming Zhang, Yike Zhang, Long Ma, Takahiro Shinozaki

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition

Sep 13, 2022
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno

Exploring Attention Mechanisms for Multimodal Emotion Recognition in an Emergency Call Center Corpus

Jun 12, 2023
Théo Deschamps-Berger, Lori Lamel, Laurence Devillers
