Rong Bao

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

Jun 17, 2024
Via arXiv

Mitigating Reward Hacking via Information-Theoretic Reward Modeling

Feb 16, 2024
Via arXiv

Orthogonal Subspace Learning for Language Model Continual Learning

Oct 22, 2023
Via arXiv

Robust Lottery Tickets for Pre-trained Language Models

Nov 06, 2022
Via arXiv