speech


Optimality of FSQ Tokens for Continuous Diffusion for Categorical Data with Application to Text-to-Speech

Jun 08, 2026
Via arXiv

Toward Signing Activity Projection in Sign Language Interaction

Jun 08, 2026
Via arXiv

TeamHerald@CHIPSAL 2026: Hate Speech Detection and Sentiment Analysis of Nepali Memes using Transformer-based Architectures and Ensemble Learning

Jun 07, 2026
Via arXiv

Speaker-Invariant Representation Learning for Spoofing Detection via Gradient Reversal and A Variational Information Bottleneck

Jun 07, 2026
Via arXiv

Titans-as-a-Layer: Test-Time Memory for Conversational Speech Emotion Recognition

Jun 07, 2026
Via arXiv

From A to B to A: Palindromic Zero-Shot Voice Conversion with Non-Parallel Data

Jun 07, 2026
Via arXiv

G-MaP-SE: Guided Speech Enhancement via GMM-Based Prior Matching

Jun 07, 2026
Via arXiv

Fast and Robust On-Device Speaker Diarization: Relative Minimum Cluster Size for Stride-Accelerated Pipelines

Jun 07, 2026
Via arXiv

HydraQE: OSU's Submission for the IWSLT 2026 Speech Translation Metrics Shared Task

Jun 07, 2026
Via arXiv

TRADE: Transducer-Augmented Decoder for Speech LLM

Jun 07, 2026
Via arXiv