Remi Munos

INRIA Lille

Efficient learning by implicit exploration in bandit problems with side observations

Apr 27, 2026

Spectral Thompson sampling

Apr 15, 2026

Efficient RL Training for LLMs with Experience Replay

Apr 09, 2026

Automatic Textbook Formalization

Apr 03, 2026

Expanding the Capabilities of Reinforcement Learning via Text Feedback

Feb 02, 2026

Outcome-based Exploration for LLM Reasoning

Sep 08, 2025

Soft Policy Optimization: Online Off-Policy RL for Sequence Models

Mar 07, 2025

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024

Super-Exponential Regret for UCT, AlphaGo and Variants

May 07, 2024

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024