Marco Gaido

Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

Dec 19, 2025
Via arXiv

The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models

Sep 30, 2025
Via arXiv

The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence

May 29, 2025
Via arXiv

FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

May 28, 2025
Via arXiv

Granary: Speech Recognition and Translation Dataset in 25 European Languages

May 19, 2025
Via arXiv

NUTSHELL: A Dataset for Abstract Generation from Scientific Talks

Feb 24, 2025
Via arXiv

Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison

Jan 04, 2025
Via arXiv

Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection

Dec 16, 2024
Via arXiv

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation

Nov 03, 2024
Via arXiv

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Oct 01, 2024
Via arXiv