speech


SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Feb 13, 2026
Via arXiv

WISE: A Multimodal Search Engine for Visual Scenes, Audio, Objects, Faces, Speech, and Metadata

Feb 13, 2026
Via arXiv

A two-step approach for speech enhancement in low-SNR scenarios using cyclostationary beamforming and DNNs

Feb 13, 2026
Via arXiv

Speech to Speech Synthesis for Voice Impersonation

Feb 13, 2026
Via arXiv

When Audio-LLMs Don't Listen: A Cross-Linguistic Study of Modality Arbitration

Feb 12, 2026
Via arXiv

On the Sensitivity of Firing Rate-Based Federated Spiking Neural Networks to Differential Privacy

Feb 12, 2026
Via arXiv

Cross-Modal Robustness Transfer (CMRT): Training Robust Speech Translation Models Using Adversarial Text

Feb 12, 2026
Via arXiv

SLD-L2S: Hierarchical Subspace Latent Diffusion for High-Fidelity Lip to Speech Synthesis

Feb 12, 2026
Via arXiv

Moonshine v2: Ergodic Streaming Encoder ASR for Latency-Critical Speech Applications

Feb 12, 2026
Via arXiv

TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR

Feb 12, 2026
Via arXiv