speech


Depression diagnosis from patient interviews using multimodal machine learning

Aug 26, 2025
Via arXiv

A Framework for Robust Speaker Verification in Highly Noisy Environments Leveraging Both Noisy and Enhanced Audio

Aug 26, 2025
Via arXiv

SegReConcat: A Data Augmentation Method for Voice Anonymization Attack

Aug 26, 2025
Via arXiv

On the Application of Diffusion Models for Simultaneous Denoising and Dereverberation

Aug 26, 2025
Via arXiv

Audio-Visual Feature Synchronization for Robust Speech Enhancement in Hearing Aids

Aug 26, 2025
Via arXiv

Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System

Aug 26, 2025
Via arXiv

Interpolating Speaker Identities in Embedding Space for Data Expansion

Aug 26, 2025
Via arXiv

Improving Noise Robust Audio-Visual Speech Recognition via Router-Gated Cross-Modal Feature Fusion

Aug 26, 2025
Via arXiv

Emotion Omni: Enabling Empathetic Speech Response Generation through Large Language Models

Aug 26, 2025
Via arXiv

MDD: a Mask Diffusion Detector to Protect Speaker Verification Systems from Adversarial Perturbations

Aug 26, 2025
Via arXiv