"speech": models, code, and papers

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

Sep 29, 2023
Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar

Via arXiv

Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data

Jun 29, 2023
Jarod Duret, Titouan Parcollet, Yannick Estève

Via arXiv

Latent Phrase Matching for Dysarthric Speech

Jun 08, 2023
Colin Lea, Dianna Yee, Jaya Narain, Zifang Huang, Lauren Tooley, Jeffrey P. Bigham, Leah Findlater

Via arXiv

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

Aug 23, 2023
Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang

Via arXiv

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Jul 31, 2023
Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba

Via arXiv

Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features

Aug 17, 2023
Sudarsana Reddy Kadiri, Manila Kodali, Paavo Alku

Via arXiv

Deep Speech Synthesis from MRI-Based Articulatory Representations

Jul 05, 2023
Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W Black, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli

Via arXiv

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Sep 18, 2023
Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

Via arXiv

The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems

Jul 28, 2023
Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse

Via arXiv

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

Jul 20, 2023
Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu

Via arXiv