speech


LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation

Mar 18, 2026
Via arXiv

AURORA Model of Formant-to-Tongue Inversion for Didactic and Clinical Applications

Mar 18, 2026
Via arXiv

Beyond bouba/kiki: Multidimensional semantic signals are deeply woven into the fabric of natural language

Mar 18, 2026
Via arXiv

Impact of automatic speech recognition quality on Alzheimer's disease detection from spontaneous speech: a reproducible benchmark study with lexical modeling and statistical validation

Mar 18, 2026
Via arXiv

The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

Mar 18, 2026
Via arXiv

Modeling Overlapped Speech with Shuffles

Mar 18, 2026
Via arXiv

Neuron-Level Emotion Control in Speech-Generative Large Audio-Language Models

Mar 18, 2026
Via arXiv

Linearized Bregman Iterations for Sparse Spiking Neural Networks

Mar 17, 2026
Via arXiv

Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation

Mar 17, 2026
Via arXiv

Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech

Mar 17, 2026
Via arXiv