"speech": models, code, and papers

Voice Morphing: Two Identities in One Voice

Sep 05, 2023
Sushanta K. Pani, Anurag Chowdhury, Morgan Sandler, Arun Ross

Via arXiv

MPE4G: Multimodal Pretrained Encoder for Co-Speech Gesture Generation

May 25, 2023
Gwantae Kim, Seonghyeok Noh, Insung Ham, Hanseok Ko

Via arXiv

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

Jun 07, 2023
Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix

Via arXiv

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

May 25, 2023
Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei

Via arXiv

CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation

May 24, 2023
Yan Zhou, Qingkai Fang, Yang Feng

Via arXiv

Two-stage Autoencoder Neural Network for 3D Speech Enhancement

Jun 08, 2023
Han Yin, Jisheng Bai, Siwei Huang, Mou Wang, Yafei Jia, Jianfeng Chen

Via arXiv

Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation

May 26, 2023
Yuta Nishikawa, Satoshi Nakamura

Via arXiv

Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model

May 26, 2023
Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng

Via arXiv

FonMTL: Towards Multitask Learning for the Fon Language

Sep 11, 2023
Bonaventure F. P. Dossou, Iffanice Houndayi, Pamely Zantou, Gilles Hacheme

Via arXiv

Mapping EEG Signals to Visual Stimuli: A Deep Learning Approach to Match vs. Mismatch Classification

Sep 08, 2023
Yiqian Yang, Zhengqiao Zhao, Qian Wang, Yan Yang, Jingdong Chen

Via arXiv