"speech recognition": models, code, and papers

Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos

Jul 04, 2023
Ashwin Rao

Improving Continuous Sign Language Recognition with Cross-Lingual Signs

Aug 21, 2023
Fangyun Wei, Yutong Chen

Syllable Subword Tokens for Open Vocabulary Speech Recognition in Malayalam

Jan 17, 2023
Kavya Manohar, A. R. Jayan, Rajeev Rajan

OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation

Aug 17, 2023
Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging

Aug 05, 2023
Fangyuan Wang, Ming Hao, Yuhai Shi, Bo Xu

ASR2K: Speech Recognition for Around 2000 Languages without Audio

Sep 06, 2022
Xinjian Li, Florian Metze, David R Mortensen, Alan W Black, Shinji Watanabe

Continual Learning for On-Device Speech Recognition using Disentangled Conformers

Dec 02, 2022
Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

Nov 05, 2022
Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li

Speaker Diarization of Scripted Audiovisual Content

Aug 04, 2023
Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico

Leveraging Redundancy in Multiple Audio Signals for Far-Field Speech Recognition

Mar 01, 2023
Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas
