Thomas Coste

Bayesian Reward Models for LLM Alignment

Feb 20, 2024
Via arXiv

Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Dec 22, 2023
Via arXiv

Reward Model Ensembles Help Mitigate Overoptimization

Oct 04, 2023
Via arXiv