Kaifeng Lyu

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
Feb 29, 2024
Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
Via arXiv

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Feb 28, 2024
Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
Via arXiv

Efficient Stagewise Pretraining via Progressive Subnetworks
Feb 08, 2024
Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar
Via arXiv

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Nov 30, 2023
Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu
Via arXiv

A Quadratic Synchronization Rule for Distributed Deep Learning
Oct 22, 2023
Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang
Via arXiv

DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Oct 12, 2023
Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal
Via arXiv

The Marginal Value of Momentum for Small Learning Rate SGD
Jul 27, 2023
Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
Via arXiv

Why (and When) does Local SGD Generalize Better than SGD?
Mar 09, 2023
Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora
Via arXiv

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
Jan 27, 2023
Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee
Via arXiv