speech


Execution-State Capsules: Graph-Bound Execution-State Checkpoint and Restore for Low-Latency, Small-Batch, On-Device Physical-AI Serving

Jun 18, 2026
Via arXiv

How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech

Jun 18, 2026
Via arXiv

FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS

Jun 18, 2026
Via arXiv

Transcript-Free Flow-Matching Text-to-Speech via Speech Feature Conditioning

Jun 18, 2026
Via arXiv

PASQA: Pitch-Accent-Focused Speech Quality Assessment Model Trained on Synthetic Speech with Accent Errors

Jun 18, 2026
Via arXiv

Personalized Keyword Spotting for User-Defined Keywords Leveraging Text-Independent Speaker Verification

Jun 18, 2026
Via arXiv

Interpreting Content and Speaker Characteristics in Factorised Self-Supervised Subspaces

Jun 18, 2026
Via arXiv

Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations

Jun 18, 2026
Via arXiv

Analyzing Language and Geographical Variation in Speech Representations Across 60 Indic Languages

Jun 18, 2026
Via arXiv

PhysDrift: Bridging the Embodiment Gap in Humanoid Co-Speech Motion Generation

Jun 18, 2026
Via arXiv