Kaifeng Lyu

AI-Assisted Generation of Difficult Math Questions

Jul 30, 2024
Via arXiv

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024
Via arXiv

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

Feb 29, 2024
Via arXiv

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

Feb 28, 2024
Via arXiv

Efficient Stagewise Pretraining via Progressive Subnetworks

Feb 08, 2024
Via arXiv

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

Nov 30, 2023
Via arXiv

A Quadratic Synchronization Rule for Distributed Deep Learning

Oct 22, 2023
Via arXiv

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

Oct 12, 2023
Via arXiv

The Marginal Value of Momentum for Small Learning Rate SGD

Jul 27, 2023
Via arXiv

Why does Local SGD Generalize Better than SGD?

Mar 09, 2023
Via arXiv