Speaker


Robust Pitch Estimation and Tracking for Speakers Based on Subband Encoding and the Generalized Labeled Multi-Bernoulli Filter

Apr 02, 2026
Via arXiv

FLEURS-Kobani: Extending the FLEURS Dataset for Northern Kurdish

Mar 31, 2026
Via arXiv

Convergent Representations of Linguistic Constructions in Human and Artificial Neural Systems

Mar 31, 2026
Via arXiv

LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space

Mar 31, 2026
Via arXiv

Omni-MMSI: Toward Identity-attributed Social Interaction Understanding

Mar 31, 2026
Via arXiv

Towards Empowering Consumers through Sentence-level Readability Scoring in German ESG Reports

Mar 31, 2026
Via arXiv

Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation

Mar 31, 2026
Via arXiv

Combining Masked Language Modeling and Cross-Modal Contrastive Learning for Prosody-Aware TTS

Mar 31, 2026
Via arXiv

Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models

Mar 30, 2026
Via arXiv

MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions

Mar 30, 2026
Via arXiv