Ricard Marxer

DYNI

Depth Jitter: Seeing through the Depth

Aug 08, 2025
Via arXiv

Factorized RVQ-GAN For Disentangled Speech Tokenization

Jun 18, 2025
Via arXiv

Discrete Audio Tokens: More Than a Survey!

Jun 12, 2025
Via arXiv

Aligning Multimodal Representations through an Information Bottleneck

Jun 05, 2025
Via arXiv

Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels

Mar 08, 2025
Via arXiv

TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024

Jul 17, 2024
Via arXiv

Transfer Learning from Whisper for Microscopic Intelligibility Prediction

Apr 02, 2024
Via arXiv

Scaling Properties of Speech Language Models

Mar 31, 2024
Via arXiv

PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings

Mar 04, 2024
Via arXiv

Speech foundation models on intelligibility prediction for hearing-impaired listeners

Jan 24, 2024
Via arXiv