speech


Users and Wizards in Conversations: How WoZ Interface Choices Define Human-Robot Interactions

Mar 30, 2026
Via arXiv

On the Role of Encoder Depth: Pruning Whisper and LoRA Fine-Tuning in SLAM-ASR

Mar 30, 2026
Via arXiv

An Empirical Recipe for Universal Phone Recognition

Mar 30, 2026
Via arXiv

MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions

Mar 30, 2026
Via arXiv

Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models

Mar 30, 2026
Via arXiv

EBuddy: a workflow orchestrator for industrial human-machine collaboration

Mar 30, 2026
Via arXiv

ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining

Mar 30, 2026
Via arXiv

A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators

Mar 29, 2026
Via arXiv

Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes

Mar 28, 2026
Via arXiv

SCOPE: Tree-based Self-Correcting Online Log Parsing via Syntactic-Semantic Collaboration

Mar 28, 2026
Via arXiv