"speech recognition": models, code, and papers

Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition

Jun 05, 2023
Jisung Wang, Haram Lee, Myungwoo Oh

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

Oct 23, 2023
Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik

Distillation Strategies for Discriminative Speech Recognition Rescoring

Jun 15, 2023
Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

Jul 24, 2023
Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Unified Segment-to-Segment Framework for Simultaneous Sequence Generation

Oct 27, 2023
Shaolei Zhang, Yang Feng

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

Sep 12, 2023
Haoxu Wang, Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, Ming Li

The North System for Formosa Speech Recognition Challenge 2023

Oct 06, 2023
Li-Wei Chen, Kai-Chen Cheng, Hung-Shin Lee

Unsupervised Representations Improve Supervised Learning in Speech Emotion Recognition

Sep 22, 2023
Amirali Soltani Tehrani, Niloufar Faridani, Ramin Toosi

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

Sep 19, 2023
Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen

CL-MASR: A Continual Learning Benchmark for Multilingual ASR

Oct 25, 2023
Luca Della Libera, Pooneh Mousavi, Salah Zaiem, Cem Subakan, Mirco Ravanelli
