Luca Soldaini

Amazon Alexa Search

Olmix: A Framework for Data Mixing Throughout LM Development

Feb 12, 2026

Overview of the TREC 2025 RAGTIME Track

Feb 10, 2026

How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs

Feb 09, 2026

NeuCLIRTech: Chinese Monolingual and Cross-Language Information Retrieval Evaluation in a Challenging Domain

Feb 05, 2026

Bolmo: Byteifying the Next Generation of Language Models

Dec 17, 2025

Olmo 3

Dec 15, 2025

olmOCR 2: Unit Test Rewards for Document OCR

Oct 22, 2025

Overview of the TREC 2024 NeuCLIR Track

Sep 17, 2025

FlexOlmo: Open Language Models for Flexible Data Use

Jul 09, 2025

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Jun 05, 2025