Quentin Anthony

Zyda: A 1.3T Dataset for Open Language Modeling

Jun 04, 2024

Zamba: A Compact 7B SSM Hybrid Model

May 26, 2024

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Apr 10, 2024

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Mar 26, 2024

BlackMamba: Mixture of Experts for State-Space Models

Feb 01, 2024

The Case for Co-Designing Model Architectures with Hardware

Jan 30, 2024

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

Jan 17, 2024

Continual Pre-Training of Large Language Models: How to (re)warm your model?

Aug 08, 2023

RWKV: Reinventing RNNs for the Transformer Era

May 22, 2023

Emergent and Predictable Memorization in Large Language Models

Apr 21, 2023