Alert button

"speech recognition": models, code, and papers
Alert button

Locality enhanced dynamic biasing and sampling strategies for contextual ASR

Jan 23, 2024
Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung

Viaarxiv icon

Towards Event Extraction from Speech with Contextual Clues

Jan 27, 2024
Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

Viaarxiv icon

Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

Jan 23, 2024
W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath

Viaarxiv icon

Consistency Based Unsupervised Self-training For ASR Personalisation

Jan 22, 2024
Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

Viaarxiv icon

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Jan 16, 2024
Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe`er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

Viaarxiv icon

Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers

Jan 22, 2024
Michael Hentschel, Yuta Nishikawa, Tatsuya Komatsu, Yusuke Fujita

Viaarxiv icon

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

Jan 24, 2024
Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

Viaarxiv icon

TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion

Jan 25, 2024
Samuel Pegg, Kai Li, Xiaolin Hu

Viaarxiv icon

Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained Transformers

Feb 11, 2024
Minoo Shayaninasab, Bagher Babaali

Viaarxiv icon

Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks

Jan 18, 2024
Yichao Du, Zhirui Zhang, Linan Yue, Xu Huang, Yuqing Zhang, Tong Xu, Linli Xu, Enhong Chen

Viaarxiv icon