Alert button

"speech recognition": models, code, and papers
Alert button

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

Mar 29, 2023
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

Figure 1 for AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Figure 2 for AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Figure 3 for AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Figure 4 for AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Viaarxiv icon

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

Nov 02, 2022
Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang

Figure 1 for Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
Figure 2 for Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
Figure 3 for Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
Figure 4 for Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
Viaarxiv icon

HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation

Add code
Bookmark button
Alert button
Jun 20, 2023
Cihan Xiao, Henry Li Xinyuan, Jinyi Yang, Dongji Gao, Matthew Wiesner, Kevin Duh, Sanjeev Khudanpur

Figure 1 for HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
Figure 2 for HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
Figure 3 for HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
Figure 4 for HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
Viaarxiv icon

Structured State Space Decoder for Speech Recognition and Synthesis

Add code
Bookmark button
Alert button
Oct 31, 2022
Koichi Miyazaki, Masato Murata, Tomoki Koriyama

Figure 1 for Structured State Space Decoder for Speech Recognition and Synthesis
Figure 2 for Structured State Space Decoder for Speech Recognition and Synthesis
Figure 3 for Structured State Space Decoder for Speech Recognition and Synthesis
Figure 4 for Structured State Space Decoder for Speech Recognition and Synthesis
Viaarxiv icon

Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions

Feb 09, 2023
Nay San, Martijn Bartelds, Blaine Billings, Ella de Falco, Hendi Feriza, Johan Safri, Wawan Sahrozi, Ben Foley, Bradley McDonnell, Dan Jurafsky

Figure 1 for Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions
Viaarxiv icon

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

Add code
Bookmark button
Alert button
Jun 29, 2023
Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi LI, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenhu Chen, Wei Xue, Yike Guo

Figure 1 for LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
Figure 2 for LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
Figure 3 for LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
Figure 4 for LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
Viaarxiv icon

Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech

Aug 10, 2022
Kaitao Song, Teng Wan, Bixia Wang, Huiqiang Jiang, Luna Qiu, Jiahang Xu, Liping Jiang, Qun Lou, Yuqing Yang, Dongsheng Li, Xudong Wang, Lili Qiu

Figure 1 for Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
Figure 2 for Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
Figure 3 for Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
Figure 4 for Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
Viaarxiv icon

TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition

Add code
Bookmark button
Alert button
May 18, 2023
Tiantian Feng, Rajat Hebbar, Shrikanth Narayanan

Figure 1 for TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
Figure 2 for TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
Figure 3 for TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
Figure 4 for TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
Viaarxiv icon

Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

Add code
Bookmark button
Alert button
Jun 21, 2023
Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers

Figure 1 for Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
Figure 2 for Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
Figure 3 for Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
Figure 4 for Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
Viaarxiv icon

Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications

Add code
Bookmark button
Alert button
Jul 14, 2023
Varun Krishna, Tarun Sai, Sriram Ganapathy

Figure 1 for Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
Figure 2 for Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
Figure 3 for Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
Figure 4 for Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
Viaarxiv icon