Alert button

"speech recognition": models, code, and papers
Alert button

Investigating Weight-Perturbed Deep Neural Networks With Application in Iris Presentation Attack Detection

Nov 21, 2023
Renu Sharma, Redwan Sony, Arun Ross

Viaarxiv icon

Analysis of Visual Features for Continuous Lipreading in Spanish

Nov 21, 2023
David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

Viaarxiv icon

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder

Aug 14, 2023
Yusheng Dai, Hang Chen, Jun Du, Xiaofei Ding, Ning Ding, Feijun Jiang, Chin-Hui Lee

Figure 1 for Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Figure 2 for Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Figure 3 for Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Figure 4 for Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Viaarxiv icon

Enhanced Generative Adversarial Networks for Unseen Word Generation from EEG Signals

Nov 14, 2023
Young-Eun Lee, Seo-Hyun Lee, Soowon Kim, Jung-Sun Lee, Deok-Seon Kim, Seong-Whan Lee

Figure 1 for Enhanced Generative Adversarial Networks for Unseen Word Generation from EEG Signals
Figure 2 for Enhanced Generative Adversarial Networks for Unseen Word Generation from EEG Signals
Viaarxiv icon

Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

Sep 09, 2023
Huaibo Zhao, Yosuke Higuchi, Yusuke Kida, Tetsuji Ogawa, Tetsunori Kobayashi

Figure 1 for Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition
Figure 2 for Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition
Figure 3 for Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition
Figure 4 for Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition
Viaarxiv icon

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

Sep 16, 2023
Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Viaarxiv icon

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

Aug 15, 2023
Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro

Figure 1 for AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Figure 2 for AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Figure 3 for AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Figure 4 for AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Viaarxiv icon

ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding

Nov 19, 2023
Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Viaarxiv icon

Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition

Jul 12, 2023
Titouan Parcollet, Rogier van Dalen, Shucong Zhang, Sourav Bhattacharya

Figure 1 for Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition
Figure 2 for Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition
Figure 3 for Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition
Figure 4 for Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition
Viaarxiv icon

Optimized Tokenization for Transcribed Error Correction

Oct 16, 2023
Tomer Wullach, Shlomo E. Chazan

Figure 1 for Optimized Tokenization for Transcribed Error Correction
Figure 2 for Optimized Tokenization for Transcribed Error Correction
Figure 3 for Optimized Tokenization for Transcribed Error Correction
Figure 4 for Optimized Tokenization for Transcribed Error Correction
Viaarxiv icon