Thomas Coste

Bayesian Reward Models for LLM Alignment

Feb 20, 2024
Adam X. Yang, Maxime Robeyns, Thomas Coste, Jun Wang, Haitham Bou-Ammar, Laurence Aitchison

Via arXiv

Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Dec 22, 2023
Filippos Christianos, Georgios Papoudakis, Matthieu Zimmer, Thomas Coste, Zhihao Wu, Jingxuan Chen, Khyati Khandelwal, James Doran, Xidong Feng, Jiacheng Liu, Zheng Xiong, Yicheng Luo, Jianye Hao, Kun Shao, Haitham Bou-Ammar, Jun Wang

Via arXiv

Reward Model Ensembles Help Mitigate Overoptimization

Oct 04, 2023
Thomas Coste, Usman Anwar, Robert Kirk, David Krueger

Via arXiv