"speech": models, code, and papers
MooseNet: A trainable metric for synthesized speech with PLDA backend

Jan 17, 2023
Ondřej Plátek, Ondřej Dušek

Adapting Pretrained ASR Models to Low-resource Clinical Speech using Epistemic Uncertainty-based Data Selection

Jun 03, 2023
Bonaventure F. P. Dossou, Atnafu Lambebo Tonja, Chris Chinenye Emezue, Tobi Olatunji, Naome A Etori, Salomey Osei, Tosin Adewumi, Sahib Singh

JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT

Oct 05, 2022
Mayumi Ohta, Julia Kreutzer, Stefan Riezler

Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition

May 29, 2023
Xiaoliang Wu, Peter Bell, Ajitha Rajan

Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information

Dec 07, 2022
Fenglin Ding, Genshun Wan, Pengcheng Li, Jia Pan, Cong Liu

SPACE: Speech-driven Portrait Animation with Controllable Expression

Dec 07, 2022
Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Rafael Valle, Ming-Yu Liu

A Versatile Diffusion-based Generative Refiner for Speech Enhancement

Oct 27, 2022
Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Modulation spectral features for speech emotion recognition using deep neural networks

Jan 14, 2023
Premjeet Singh, Md Sahidullah, Goutam Saha

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

Nov 02, 2022
Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath

AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages

Mar 22, 2023
Chris Chinenye Emezue, Sanchit Gandhi, Lewis Tunstall, Abubakar Abid, Joshua Meyer, Quentin Lhoest, Pete Allen, Patrick Von Platen, Douwe Kiela, Yacine Jernite, Julien Chaumond, Merve Noyan, Omar Sanseviero
