"speech": models, code, and papers

Speaker Verification in Multi-Speaker Environments Using Temporal Feature Fusion

Jun 28, 2022
Ahmad Aloradi, Wolfgang Mack, Mohamed Elminshawi, Emanuël A. P. Habets


Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling

May 27, 2021
Chenpeng Du, Kai Yu


General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

Feb 03, 2021
Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha


Learning from human perception to improve automatic speaker verification in style-mismatched conditions

Jun 28, 2022
Amber Afshan, Abeer Alwan


Non-native English lexicon creation for bilingual speech synthesis

Jun 21, 2021
Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga, Sharath Adavanne


Model Blending for Text Classification

Aug 05, 2022
Ramit Pahwa


Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI

Jun 16, 2021
Laxmi Pandey, Ahmed Sabbir Arif


FV2ES: A Fully End2End Multimodal System for Fast Yet Effective Video Emotion Recognition Inference

Sep 21, 2022
Qinglan Wei, Xuling Huang, Yuan Zhang


Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics

Jun 07, 2022
Aylin Caliskan, Pimparkar Parth Ajay, Tessa Charlesworth, Robert Wolfe, Mahzarin R. Banaji


Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

Jul 26, 2021
Csaba Zainkó, László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Alexandra Markó, Géza Németh, Tamás Gábor Csapó
