"speech": models, code, and papers

AudioLM: a Language Modeling Approach to Audio Generation

Sep 07, 2022
Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour

Masks Fusion with Multi-Target Learning For Speech Enhancement

Sep 28, 2021
Liangchen Zhou, Wenbin Jiang, Jingyan Xu, Fei Wen, Peilin Liu

Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation

Mar 18, 2022
Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi

Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech

Dec 27, 2021
Gaoussou Youssouf Kebe, Luke E. Richards, Edward Raff, Francis Ferraro, Cynthia Matuszek

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

Oct 14, 2021
Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading

Dec 09, 2021
Leyuan Qu, Cornelius Weber, Stefan Wermter

Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems

Oct 13, 2021
Mohd Abbas Zaidi, Beomseok Lee, Nikhil Kumar Lakumarapu, Sangha Kim, Chanwoo Kim

CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

Oct 14, 2021
Arnaldo Candido Junior, Edresson Casanova, Anderson Soares, Frederico Santos de Oliveira, Lucas Oliveira, Ricardo Corso Fernandes Junior, Daniel Peixoto Pinto da Silva, Fernando Gorgulho Fayet, Bruno Baldissera Carlotto, Lucas Rafael Stefanel Gris, Sandra Maria Aluísio

Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

Oct 21, 2021
Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

L3Cube-MahaHate: A Tweet-based Marathi Hate Speech Detection Dataset and BERT models

Mar 25, 2022
Abhishek Velankar, Hrushikesh Patil, Amol Gore, Shubham Salunke, Raviraj Joshi
