Daniel Hesslow

The Falcon Series of Open Language Models

Nov 29, 2023

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

Jun 01, 2023

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022

What Language Model to Train if You Have One Million GPU Hours?

Nov 08, 2022

Scaling Laws Beyond Backpropagation

Oct 26, 2022

RITA: a Study on Scaling Up Generative Protein Sequence Models

May 11, 2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

Apr 12, 2022

Is the Number of Trainable Parameters All That Actually Matters?

Sep 24, 2021

Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra

May 07, 2021

Contrastive Embeddings for Neural Architectures

Feb 08, 2021