"speech recognition": models, code, and papers

OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking

May 15, 2023
Fazle Rabbi Rakib, Souhardya Saha Dip, Samiul Alam, Nazia Tasnim, Md. Istiak Hossain Shihab, Md. Nazmuddoha Ansary, Syed Mobassir Hossen, Marsia Haque Meghla, Mamunur Mamun, Farig Sadeque, Sayma Sultana Chowdhury, Tahsin Reasat, Asif Sushmit, Ahmed Imtiaz Humayun

Via arXiv

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

Sep 15, 2023
Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

Via arXiv

Factual Consistency Oriented Speech Recognition

Feb 24, 2023
Naoyuki Kanda, Takuya Yoshioka, Yang Liu

Via arXiv

Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition

Jun 09, 2023
Xianzhao Chen, Yist Y. Lin, Kang Wang, Yi He, Zejun Ma

Via arXiv

Contextual Biasing of Named-Entities with Large Language Models

Sep 22, 2023
Chuanneng Sun, Zeeshan Ahmed, Yingyi Ma, Zhe Liu, Lucas Kabela, Yutong Pang, Ozlem Kalinli

Via arXiv

Joint Audio and Speech Understanding

Oct 02, 2023
Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

Via arXiv

Multiscale Contextual Learning for Speech Emotion Recognition in Emergency Call Center Conversations

Aug 28, 2023
Théo Deschamps-Berger, Lori Lamel, Laurence Devillers

Via arXiv

PromptASR for contextualized ASR with controllable style

Sep 14, 2023
Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey

Via arXiv

Speech enhancement with frequency domain auto-regressive modeling

Sep 24, 2023
Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy

Via arXiv

End-to-End Speech Recognition and Disfluency Removal with Acoustic Language Model Pretraining

Sep 08, 2023
Saksham Bassi, Giulio Duregon, Siddhartha Jalagam, David Roth

Via arXiv