Speech


GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

Jun 26, 2025
Via arXiv

Aligning Spoken Dialogue Models from User Interactions

Jun 26, 2025
Via arXiv

Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort

Jun 26, 2025
Via arXiv

Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings

Jun 26, 2025
Via arXiv

A Multi-Stage Framework for Multimodal Controllable Speech Synthesis

Jun 26, 2025
Via arXiv

Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

Jun 25, 2025
Via arXiv

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Jun 24, 2025
Via arXiv

Accurate, fast, cheap: Choose three. Replacing Multi-Head-Attention with Bidirectional Recurrent Attention for Long-Form ASR

Jun 24, 2025
Via arXiv

Who Does What in Deep Learning? Multidimensional Game-Theoretic Attribution of Function of Neural Units

Jun 24, 2025
Via arXiv

Social Hatred: Efficient Multimodal Detection of Hatemongers

Jun 24, 2025
Via arXiv