"speech recognition": models, code, and papers

Key-Sparse Transformer with Cascaded Cross-Attention Block for Multimodal Speech Emotion Recognition

Jun 22, 2021
Weidong Chen, Xiaofeng Xing, Xiangmin Xu, Jichen Yang, Jianxin Pang


Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Apr 25, 2022
Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han


Universal speaker recognition encoders for different speech segments duration

Oct 28, 2022
Sergey Novoselov, Vladimir Volokhov, Galina Lavrentyeva


data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

Feb 07, 2022
Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli


RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition

May 24, 2018
Albert Zeyer, Tamer Alkhouli, Hermann Ney


QASR: QCRI Aljazeera Speech Resource -- A Large Scale Annotated Arabic Speech Corpus

Jun 24, 2021
Hamdy Mubarak, Amir Hussein, Shammur Absar Chowdhury, Ahmed Ali


Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

May 29, 2022
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji


Audio-visual Multi-channel Recognition of Overlapped Speech

May 18, 2020
Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng


Audio Visual Speech Recognition using Deep Recurrent Neural Networks

Nov 09, 2016
Abhinav Thanda, Shankar M Venkatesan


LegoNN: Building Modular Encoder-Decoder Models

Jun 07, 2022
Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed
