"speech": models, code, and papers

Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding

Apr 11, 2022
Sanjana Sankar, Denis Beautemps, Thomas Hueber

(3 figures)

Passing a Non-verbal Turing Test: Evaluating Gesture Animations Generated from Speech

Jul 01, 2021
Manuel Rebol, Christian Gütl, Krzysztof Pietroszek

(4 figures)

Fine-grained style control in Transformer-based Text-to-speech Synthesis

Oct 12, 2021
Li-Wei Chen, Alexander Rudnicky

(4 figures)

Silent versus modal multi-speaker speech recognition from ultrasound and video

Feb 27, 2021
Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

(4 figures)

FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis

Sep 27, 2021
Manh Luong, Viet Anh Tran

(4 figures)

Transformer-S2A: Robust and Efficient Speech-to-Animation

Nov 18, 2021
Liyang Chen, Zhiyong Wu, Jun Ling, Runnan Li, Xu Tan, Sheng Zhao

(4 figures)

No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration

Nov 01, 2022
Jose Vargas-Quiros, Laura Cabrera-Quiros, Hayley Hung

(4 figures)

The NTNU System for Formosa Speech Recognition Challenge 2020

Apr 09, 2021
Fu-An Chao, Tien-Hong Lo, Shi-Yan Weng, Shih-Hsuan Chiu, Yao-Ting Sung, Berlin Chen

(4 figures)

A Survey of Online Hate Speech through the Causal Lens

Sep 16, 2021
Antigoni-Maria Founta, Lucia Specia

(1 figure)

ML-Based Analysis to Identify Speech Features Relevant in Predicting Alzheimer's Disease

Oct 25, 2021
Yash Kumar, Piyush Maheshwari, Shreyansh Joshi, Veeky Baths

(4 figures)