Picture for Chenghao Mou

Chenghao Mou

StarCoder 2 and The Stack v2: The Next Generation

Add code
Feb 29, 2024
Viaarxiv icon

StarCoder: may the source be with you!

Add code
May 09, 2023
Figure 1 for StarCoder: may the source be with you!
Figure 2 for StarCoder: may the source be with you!
Figure 3 for StarCoder: may the source be with you!
Figure 4 for StarCoder: may the source be with you!
Viaarxiv icon

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Add code
Mar 07, 2023
Figure 1 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Figure 2 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Figure 3 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Figure 4 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Viaarxiv icon

SantaCoder: don't reach for the stars!

Add code
Jan 09, 2023
Figure 1 for SantaCoder: don't reach for the stars!
Figure 2 for SantaCoder: don't reach for the stars!
Figure 3 for SantaCoder: don't reach for the stars!
Figure 4 for SantaCoder: don't reach for the stars!
Viaarxiv icon

The Stack: 3 TB of permissively licensed source code

Add code
Nov 20, 2022
Figure 1 for The Stack: 3 TB of permissively licensed source code
Figure 2 for The Stack: 3 TB of permissively licensed source code
Figure 3 for The Stack: 3 TB of permissively licensed source code
Figure 4 for The Stack: 3 TB of permissively licensed source code
Viaarxiv icon

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Add code
Nov 09, 2022
Viaarxiv icon