Marvin Tom

BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

Sep 20, 2023
Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming Chen, Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness


Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

Apr 06, 2023
Nolan Dey, Gurpreet Gosal, Zhiming Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness
