speech


PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction

Jun 18, 2025
Via arXiv

Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper

Jun 18, 2025
Via arXiv

Factorized RVQ-GAN For Disentangled Speech Tokenization

Jun 18, 2025
Via arXiv

EmojiVoice: Towards long-term controllable expressivity in robot speech

Jun 18, 2025
Via arXiv

An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW

Jun 18, 2025
Via arXiv

A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments

Jun 17, 2025
Via arXiv

Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios

Jun 17, 2025
Via arXiv

Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition

Jun 17, 2025
Via arXiv

DETONATE: A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization

Jun 17, 2025
Via arXiv

Design an Editable Speech-to-Sign-Language Transformer System: A Human-Centered AI Approach

Jun 17, 2025
Via arXiv