"speech": models, code, and papers

Dual Learning for Large Vocabulary On-Device ASR

Jan 11, 2023
Cal Peyser, Ronny Huang, Tara Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho

Improving And Analyzing Neural Speaker Embeddings for ASR

Jan 11, 2023
Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition

Aug 01, 2022
Z. Guo, C. Chen, E. S. Chng

Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility

Feb 05, 2022
Tianqu Kang, Anh-Dung Dinh, Binghong Wang, Tianyuan Du, Yijia Chen, Kevin Chau

Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion

Apr 27, 2022
Sen Chen, Zhilei Liu, Jiaxing Liu, Longbiao Wang

On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems

Jun 15, 2022
Kai Li, Yi Luo

Differentiable Duration Modeling for End-to-End Text-to-Speech

Mar 21, 2022
Bac Nguyen, Fabien Cardinaux, Stefan Uhlich

Revisiting Speech Content Privacy

Oct 13, 2021
Jennifer Williams, Junichi Yamagishi, Paul-Gauthier Noe, Cassia Valentini Botinhao, Jean-Francois Bonastre

Vocal Breath Sound Based Gender Classification

Nov 11, 2022
Mohammad Shaique Solanki, Ashutosh M Bharadwaj, Jeevan K, Prasanta Kumar Ghosh

Lived Experience Matters: Automatic Detection of Stigma toward People Who Use Substances on Social Media

Feb 04, 2023
Salvatore Giorgi, Douglas Bellew, Daniel Roy Sadek Habib, Joao Sedoc, Chase Smitterberg, Amanda Devoto, McKenzie Himelein-Wachowiak, Brenda Curtis
