Mirco Ravanelli

Discrete Audio Tokens: More Than a Survey!

Jun 12, 2025

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

May 26, 2025

LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs

May 24, 2025

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down

May 19, 2025

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks

Feb 06, 2025

Adaptation Odyssey in LLMs: Why Does Additional Pretraining Sometimes Fail to Improve?

Oct 08, 2024

Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming

Oct 07, 2024

What Are They Doing? Joint Audio-Speech Co-Reasoning

Sep 22, 2024

LMAC-TD: Producing Time Domain Explanations for Audio Classifiers

Sep 13, 2024

ProGRes: Prompted Generative Rescoring on ASR n-Best

Aug 30, 2024