"speech": models, code, and papers

Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion

Apr 27, 2022
Sen Chen, Zhilei Liu, Jiaxing Liu, Longbiao Wang

Via arXiv

EURO: ESPnet Unsupervised ASR Open-source Toolkit

Dec 01, 2022
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

Via arXiv

Differentiable Duration Modeling for End-to-End Text-to-Speech

Mar 21, 2022
Bac Nguyen, Fabien Cardinaux, Stefan Uhlich

Via arXiv

Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility

Feb 05, 2022
Tianqu Kang, Anh-Dung Dinh, Binghong Wang, Tianyuan Du, Yijia Chen, Kevin Chau

Via arXiv

Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS)

Jul 04, 2022
Ziyao Zhang, Alessio Falai, Ariadna Sanchez, Orazio Angelini, Kayoko Yanagisawa

Via arXiv

Accelerating RNN-based Speech Enhancement on a Multi-Core MCU with Mixed FP16-INT8 Post-Training Quantization

Oct 14, 2022
Manuele Rusci, Marco Fariselli, Martin Croome, Francesco Paci, Eric Flamand

Via arXiv

Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks

Mar 30, 2022
Fan-Lin Wang, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Via arXiv

The PCG-AIID System for L3DAS22 Challenge: MIMO and MISO convolutional recurrent Network for Multi Channel Speech Enhancement and Speech Recognition

Feb 21, 2022
Jingdong Li, Yuanyuan Zhu, Dawei Luo, Yun Liu, Guohui Cui, Zhaoxia Li

Via arXiv

SCA: Streaming Cross-attention Alignment for Echo Cancellation

Nov 01, 2022
Yang Liu, Yangyang Shi, Yun Li, Kaustubh Kalgaonkar, Sriram Srinivasan, Xin Lei

Via arXiv

DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF

Jul 22, 2022
Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii

Via arXiv