The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Dec 31, 2020
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy

View paper on arXiv
