"speech": models, code, and papers

Self-Supervised Models of Speech Infer Universal Articulatory Kinematics

Oct 16, 2023
Cheol Jun Cho, Abdelrahman Mohamed, Alan W Black, Gopala K. Anumanchipalli

Realistic Speech-to-Face Generation with Speech-Conditioned Latent Diffusion Model with Face Prior

Oct 05, 2023
Jinting Wang, Li Liu, Jun Wang, Hei Victor Cheng

Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference

Dec 15, 2023
Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane

Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors

Oct 25, 2023
Marek Kubis, Paweł Skórzewski, Marcin Sowański, Tomasz Ziętkiewicz

EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

Dec 11, 2023
Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim

Learning Co-Speech Gesture for Multimodal Aphasia Type Detection

Oct 20, 2023
Daeun Lee, Sejung Son, Hyolim Jeon, Seungbae Kim, Jinyoung Han

Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training

Dec 14, 2023
Xi Chen, Chang Gao, Zuowen Wang, Longbiao Cheng, Sheng Zhou, Shih-Chii Liu, Tobi Delbruck

LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild

Nov 21, 2023
David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

Oct 04, 2023
Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-yi Lee

Crowdsourced and Automatic Speech Prominence Estimation

Oct 12, 2023
Max Morrison, Pranav Pawar, Nathan Pruyne, Jennifer Cole, Bryan Pardo
