"speech recognition": models, code, and papers

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

May 24, 2022
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

Via arXiv

Cloud-Based Face and Speech Recognition for Access Control Applications

May 08, 2020
Nathalie Tkauc, Thao Tran, Kevin Hernandez-Diaz, Fernando Alonso-Fernandez

Via arXiv

Multilingual Speech Recognition With A Single End-To-End Model

Feb 15, 2018
Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

Via arXiv

Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

May 09, 2019
Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney

Via arXiv

Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition

May 11, 2015
Xiangang Li, Xihong Wu

Via arXiv

Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models

Jul 01, 2019
Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak

Via arXiv

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

Feb 07, 2022
Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli

Via arXiv

Quantitative phase and absorption contrast imaging

Mar 23, 2022
Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka

Via arXiv

End-to-End Neural Segmental Models for Speech Recognition

Aug 15, 2017
Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals

Via arXiv

Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition

Oct 24, 2019
Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang

Via arXiv