Sara Papi

SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation

Mar 11, 2026

Do What I Say: A Spoken Prompt Dataset for Instruction-Following

Mar 10, 2026

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

Dec 24, 2025

Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

Dec 19, 2025

The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence

May 29, 2025

FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

May 28, 2025

Granary: Speech Recognition and Translation Dataset in 25 European Languages

May 19, 2025

NUTSHELL: A Dataset for Abstract Generation from Scientific Talks

Feb 24, 2025

Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison

Jan 04, 2025

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Dec 24, 2024