"speech": models, code, and papers

Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts

Sep 15, 2022
Vincent Karas, Andreas Triantafyllopoulos, Meishu Song, Björn W. Schuller

Via arXiv

Streaming on-device detection of device directed speech from voice and touch-based invocation

Oct 09, 2021
Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

Via arXiv

Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition

Oct 12, 2021
Li-Wei Chen, Alexander Rudnicky

Via arXiv

Streaming Multi-talker Speech Recognition with Joint Speaker Identification

Apr 05, 2021
Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

Via arXiv

Disentanglement Learning for Variational Autoencoders Applied to Audio-Visual Speech Enhancement

May 19, 2021
Guillaume Carbajal, Julius Richter, Timo Gerkmann

Via arXiv

Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models

Jan 25, 2022
Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang

Via arXiv

Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM

Oct 26, 2022
Bagus Tris Atmaja, Masato Akagi

Via arXiv

Searching for Discriminative Words in Multidimensional Continuous Feature Space

Nov 26, 2022
Marius Sajgalik, Michal Barla, Maria Bielikova

Via arXiv

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

Mar 28, 2021
Keon Lee, Kyumin Park, Daeyoung Kim

Via arXiv

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

Jun 05, 2022
Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu

Via arXiv