Code for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

GitHub: bigscience-workshop/data-preparation/blob/main/preprocessing/training/clean.py

GitHub: bigscience-workshop/data_tooling/wiki/datasets-hackathon
