"speech": models, code, and papers

ICASSP 2023 Deep Speech Enhancement Challenge

Mar 21, 2023
Harishchandra Dubey, Ashkan Aazami, Vishak Gopal, Babak Naderi, Sebastian Braun, Ross Cutler, Alex Ju, Mehdi Zohourian, Min Tang, Hannes Gamper, Mehrsa Golestaneh, Robert Aichner

A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model

May 19, 2023
Ibrahim Malik, Siddique Latif, Raja Jurdak, Björn Schuller

Parameter-Efficient Learning for Text-to-Speech Accent Adaptation

May 18, 2023
Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data

Jul 04, 2023
Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland

Refining a Deep Learning-based Formant Tracker using Linear Prediction Methods

Aug 17, 2023
Paavo Alku, Sudarsana Reddy Kadiri, Dhananjaya Gowda

ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus

Feb 28, 2023
Ajinkya Kulkarni, Atharva Kulkarni, Sara Abedalmonem Mohammad Shatnawi, Hanan Aldarmaki

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Apr 03, 2023
Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Acoustic absement in detail: Quantifying acoustic differences across time-series representations of speech data

Apr 14, 2023
Matthew C. Kelley

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Mar 03, 2023
Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu

LAST: Scalable Lattice-Based Speech Modelling in JAX

Apr 25, 2023
Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley
