
"speech": models, code, and papers

SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network

Apr 27, 2021
William Chan, Daniel Park, Chris Lee, Yu Zhang, Quoc Le, Mohammad Norouzi

Multi-Window Data Augmentation Approach for Speech Emotion Recognition

Oct 28, 2020
Sarala Padi, Dinesh Manocha, Ram D. Sriram

Which one is more toxic? Findings from Jigsaw Rate Severity of Toxic Comments

Jun 27, 2022
Millon Madhur Das, Punyajoy Saha, Mithun Das

Training end-to-end speech-to-text models on mobile phones

Dec 07, 2021
Zitha S, Raghavendra Rao Suresh, Pooja Rao, T. V. Prabhakar

Audio-visual Speech Separation with Adversarially Disentangled Visual Representation

Nov 29, 2020
Peng Zhang, Jiaming Xu, Jing Shi, Yunzhe Hao, Bo Xu

Detecting and analysing spontaneous oral cancer speech in the wild

Jul 28, 2020
Bence Mark Halpern, Rob van Son, Michiel van den Brekel, Odette Scharenborg

Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

Oct 08, 2021
Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

LSSED: a large-scale dataset and benchmark for speech emotion recognition

Jan 30, 2021
Weiquan Fan, Xiangmin Xu, Xiaofen Xing, Weidong Chen, Dongyan Huang

Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

Mar 12, 2021
Aleksandr Laptev, Andrei Andrusenko, Ivan Podluzhny, Anton Mitrofanov, Ivan Medennikov, Yuri Matveev

On the Role of Style in Parsing Speech with Neural Models

Oct 08, 2020
Trang Tran, Jiahong Yuan, Yang Liu, Mari Ostendorf
