Alert button
Picture for Eric Hambro

Eric Hambro

Alert button

Teaching Large Language Models to Reason with Reinforcement Learning

Mar 07, 2024
Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

Figure 1 for Teaching Large Language Models to Reason with Reinforcement Learning
Figure 2 for Teaching Large Language Models to Reason with Reinforcement Learning
Figure 3 for Teaching Large Language Models to Reason with Reinforcement Learning
Figure 4 for Teaching Large Language Models to Reason with Reinforcement Learning
Viaarxiv icon

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Feb 26, 2024
Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

Viaarxiv icon

GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

Feb 13, 2024
Alex Havrilla, Sharath Raparthy, Christoforus Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Railneau

Viaarxiv icon

Generalization to New Sequential Decision Making Tasks with In-Context Learning

Dec 06, 2023
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

Viaarxiv icon

Understanding the Effects of RLHF on LLM Generalisation and Diversity

Oct 10, 2023
Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

Viaarxiv icon

LLaMA: Open and Efficient Foundation Language Models

Feb 27, 2023
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

Figure 1 for LLaMA: Open and Efficient Foundation Language Models
Figure 2 for LLaMA: Open and Efficient Foundation Language Models
Figure 3 for LLaMA: Open and Efficient Foundation Language Models
Figure 4 for LLaMA: Open and Efficient Foundation Language Models
Viaarxiv icon

Dungeons and Data: A Large-Scale NetHack Dataset

Nov 22, 2022
Eric Hambro, Roberta Raileanu, Danielle Rothermel, Vegard Mella, Tim Rocktäschel, Heinrich Küttler, Naila Murray

Figure 1 for Dungeons and Data: A Large-Scale NetHack Dataset
Figure 2 for Dungeons and Data: A Large-Scale NetHack Dataset
Figure 3 for Dungeons and Data: A Large-Scale NetHack Dataset
Figure 4 for Dungeons and Data: A Large-Scale NetHack Dataset
Viaarxiv icon

Insights From the NeurIPS 2021 NetHack Challenge

Mar 22, 2022
Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, Daejin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, Mikayel Samvelyan, Dmitry Sorokin, Maciej Sypetkowski, Michał Sypetkowski

Figure 1 for Insights From the NeurIPS 2021 NetHack Challenge
Figure 2 for Insights From the NeurIPS 2021 NetHack Challenge
Figure 3 for Insights From the NeurIPS 2021 NetHack Challenge
Figure 4 for Insights From the NeurIPS 2021 NetHack Challenge
Viaarxiv icon

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

Sep 27, 2021
Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel

Figure 1 for MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Figure 2 for MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Figure 3 for MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Figure 4 for MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Viaarxiv icon