"speech recognition": models, code, and papers

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Mar 01, 2023
Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman

Tensor decomposition for minimization of E2E SLU model toward on-device processing

Jun 02, 2023
Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe

Masked Audio Text Encoders are Effective Multi-Modal Rescorers

May 24, 2023
Jinglun Cai, Monica Sunkara, Xilai Li, Anshu Bhatia, Xiao Pan, Sravan Bodapati

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

May 30, 2023
Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

Bridging the Granularity Gap for Acoustic Modeling

May 27, 2023
Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, JingBo Zhu

wav2vec and its current potential to Automatic Speech Recognition in German for the usage in Digital History: A comparative assessment of available ASR-technologies for the use in cultural heritage contexts

Mar 06, 2023
Michael Fleck, Wolfgang Göderle

Large-scale unsupervised audio pre-training for video-to-speech synthesis

Jun 27, 2023
Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic

Self-supervised representations in speech-based depression detection

May 20, 2023
Wen Wu, Chao Zhang, Philip C. Woodland

Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings

Jun 30, 2023
Ilyass Hammouamri, Ismail Khalfaoui-Hassani, Timothée Masquelier

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

Nov 29, 2022
Xiaohuan Zhou, Jiaming Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou
