Olivier Pietquin

Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

Apr 30, 2024
Via arXiv

Language Evolution with Deep Learning

Mar 18, 2024
Via arXiv

Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

Mar 06, 2024
Via arXiv

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Feb 26, 2024
Via arXiv

MusicRL: Aligning Music Generation to Human Preferences

Feb 06, 2024
Via arXiv

Learning Discrete-Time Major-Minor Mean Field Games

Dec 17, 2023
Via arXiv

On Imitation in Mean-field Games

Jun 26, 2023
Via arXiv

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

May 31, 2023
Via arXiv

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

May 22, 2023
Via arXiv

Get Back Here: Robust Imitation by Return-to-Distribution Planning

May 02, 2023
Via arXiv