Luca Soldaini

Amazon Alexa Search

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Jun 05, 2025 (via arXiv)

Teaching Models to Understand (but not Generate) High-risk Data

May 05, 2025 (via arXiv)

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Apr 15, 2025 (via arXiv)

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Apr 09, 2025 (via arXiv)

Automatic Detection of Research Values from Scientific Abstracts Across Computer Science Subfields

Feb 26, 2025 (via arXiv)

olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

Feb 25, 2025 (via arXiv)

mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval

Jan 31, 2025 (via arXiv)

DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images

Jan 24, 2025 (via arXiv)

2 OLMo 2 Furious

Dec 31, 2024 (via arXiv)

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Dec 05, 2024 (via arXiv)