Luca Soldaini

Amazon Alexa Search

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Jun 26, 2024

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

Jun 10, 2024

On the Evaluation of Machine-Generated Reports

May 02, 2024

Overview of the TREC 2023 NeuCLIR Track

Apr 11, 2024

FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

Mar 22, 2024

KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions

Mar 06, 2024

OLMo: Accelerating the Science of Language Models

Feb 07, 2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Jan 31, 2024

AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

Jan 16, 2024