"speech": models, code, and papers

On-the-fly Text Retrieval for End-to-End ASR Adaptation

Mar 20, 2023
Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko

Via arXiv

Exploring Representation Learning for Small-Footprint Keyword Spotting

Mar 20, 2023
Fan Cui, Liyong Guo, Quandong Wang, Peng Gao, Yujun Wang

Via arXiv

Trustera: A Live Conversation Redaction System

Mar 16, 2023
Evandro Gouvêa, Ali Dadgar, Shahab Jalalvand, Rathi Chengalvarayan, Badrinath Jayakumar, Ryan Price, Nicholas Ruiz, Jennifer McGovern, Srinivas Bangalore, Ben Stern

Via arXiv

EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models

Sep 22, 2022
Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman

Via arXiv

M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation

Jul 03, 2022
Jinming Zhao, Hao Yang, Ehsan Shareghi, Gholamreza Haffari

Via arXiv

SEM-POS: Grammatically and Semantically Correct Video Captioning

Mar 26, 2023
Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

Via arXiv

Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition

Nov 09, 2022
Yu Chen, Wen Ding, Junjie Lai

Via arXiv

Simple and Effective Unsupervised Speech Synthesis

Apr 20, 2022
Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass

Via arXiv

Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need

Jul 02, 2022
Daniel Korzekwa, Jaime Lorenzo-Trueba, Thomas Drugman, Bozena Kostek

Via arXiv

Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech

Aug 10, 2022
Jaejin Cho, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak

Via arXiv