Anton Lozhkov

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Jun 25, 2024

StarCoder 2 and The Stack v2: The Next Generation

Feb 29, 2024

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

Dec 29, 2023

OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

Jun 21, 2023

XTREME-S: Evaluating Cross-lingual Speech Representations

Apr 13, 2022