Olivier Pietquin

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Jun 27, 2024

Averaging log-likelihoods in direct alignment

Jun 27, 2024

Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

Apr 30, 2024

Language Evolution with Deep Learning

Mar 18, 2024

Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

Mar 06, 2024

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Feb 26, 2024

MusicRL: Aligning Music Generation to Human Preferences

Feb 06, 2024

Learning Discrete-Time Major-Minor Mean Field Games

Dec 17, 2023

On Imitation in Mean-field Games

Jun 26, 2023

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

May 31, 2023