"speech recognition": models, code, and papers

Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation Restoration

Feb 05, 2024
Xiliang Zhu, Chia-Tien Chang, Shayna Gardiner, David Rossouw, Jonas Robertson

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

Dec 22, 2023
Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

Progressive unsupervised domain adaptation for ASR using ensemble models and multi-stage training

Feb 07, 2024
Rehan Ahmad, Muhammad Umar Farooq, Thomas Hain

VNLP: Turkish NLP Package

Mar 02, 2024
Meliksah Turker, Mehmet Erdi Ari, Aydin Han

DeepCover: Advancing RNN Test Coverage and Online Error Prediction using State Machine Extraction

Feb 10, 2024
Pouria Golshanrad, Fathiyeh Faghih

Stateful FastConformer with Cache-based Inference for Streaming Automatic Speech Recognition

Dec 27, 2023
Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Feb 06, 2024
Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

Feb 12, 2024
Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, Yuanjun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou

Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition

Dec 14, 2023
Fan Yu, Haoxu Wang, Ziyang Ma, Shiliang Zhang

Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages

Feb 27, 2024
Lucía Gómez Zaragozá, Rocío del Amor, Elena Parra Vargas, Valery Naranjo, Mariano Alcañiz Raya, Javier Marín-Morales
