speech


VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency

Sep 19, 2025
Via arXiv

Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain

Sep 19, 2025
Via arXiv

Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?

Sep 19, 2025
Via arXiv

Are Multimodal Foundation Models All That Is Needed for Emofake Detection?

Sep 19, 2025
Via arXiv

A Steered Response Power Method for Sound Source Localization With Generic Acoustic Models

Sep 19, 2025
Via arXiv

Direct Simultaneous Translation Activation for Large Audio-Language Models

Sep 19, 2025
Via arXiv

EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition

Sep 19, 2025
Via arXiv

State-of-the-Art Dysarthric Speech Recognition with MetaICL for on-the-fly Personalization

Sep 19, 2025
Via arXiv

GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition

Sep 19, 2025
Via arXiv

Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization

Sep 19, 2025
Via arXiv