"speech": models, code, and papers

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

Jul 26, 2023
Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

Via arXiv

A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

Apr 02, 2023
Chenshuang Zhang, Chaoning Zhang, Sheng Zheng, Mengchun Zhang, Maryam Qamar, Sung-Ho Bae, In So Kweon

Via arXiv

AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

May 02, 2023
Hendric Voß, Stefan Kopp

Via arXiv

Some voices are too common: Building fair speech recognition systems using the Common Voice dataset

Jun 01, 2023
Lucas Maison, Yannick Estève

Via arXiv

Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding

Jun 14, 2023
Sanjana Sankar, Denis Beautemps, Frédéric Elisei, Olivier Perrotin, Thomas Hueber

Via arXiv

UniFLG: Unified Facial Landmark Generator from Text or Speech

Feb 28, 2023
Kentaro Mitsui, Yukiya Hono, Kei Sawada

Via arXiv

Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss

Aug 11, 2023
Mohammad Soleymanpour, Mahmoud Al Ismail, Fahimeh Bahmaninezhad, Kshitiz Kumar, Jian Wu

Via arXiv

OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation

Aug 17, 2023
Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An

Via arXiv

Controlling Federated Learning for Covertness

Aug 17, 2023
Adit Jain, Vikram Krishnamurthy

Via arXiv

A Trustable LSTM-Autoencoder Network for Cyberbullying Detection on Social Media Using Synthetic Data

Aug 15, 2023
Mst Shapna Akter, Hossain Shahriar, Alfredo Cuzzocrea

Via arXiv