Thomas Coste

Bayesian Reward Models for LLM Alignment

Feb 20, 2024
Via arXiv

Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Dec 22, 2023
Via arXiv

Reward Model Ensembles Help Mitigate Overoptimization

Oct 04, 2023
Via arXiv