
Quentin Anthony



Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Apr 10, 2024
Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu


Simple and Scalable Strategies to Continually Pre-train Large Language Models

Mar 26, 2024
Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish


BlackMamba: Mixture of Experts for State-Space Models

Feb 01, 2024
Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge


The Case for Co-Designing Model Architectures with Hardware

Jan 30, 2024
Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda


Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

Jan 17, 2024
Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda


Continual Pre-Training of Large Language Models: How to (re)warm your model?

Aug 08, 2023
Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort


RWKV: Reinventing RNNs for the Transformer Era

May 22, 2023
Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Xiangru Tang, Bolun Wang, Johan S. Wind, Stanislaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu


Emergent and Predictable Memorization in Large Language Models

Apr 21, 2023
Stella Biderman, USVSN Sai Prashanth, Lintang Sutawika, Hailey Schoelkopf, Quentin Anthony, Shivanshu Purohit, Edward Raff


Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Apr 03, 2023
Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
