Kaiyue Wen

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
Feb 29, 2024
Kaiyue Wen, Xingyu Dang, Kaifeng Lyu

Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
Dec 03, 2023
Kaiyue Wen, Yuchen Li, Bingbin Liu, Andrej Risteski

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Jul 23, 2023
Kaiyue Wen, Zhiyuan Li, Tengyu Ma

Practically Solving LPN in High Noise Regimes Faster Using Neural Networks
Mar 14, 2023
Haozhe Jiang, Kaiyue Wen, Yilei Chen

Finding Skill Neurons in Pre-trained Transformer-based Language Models
Nov 14, 2022
Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li

How Does Sharpness-Aware Minimization Minimize Sharpness?
Nov 10, 2022
Kaiyue Wen, Tengyu Ma, Zhiyuan Li

Realistic Deep Learning May Not Fit Benignly
Jun 01, 2022
Kaiyue Wen, Jiaye Teng, Jingzhao Zhang