Matthieu Geist

INRIA Lorraine - LORIA

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Jun 27, 2024

Averaging log-likelihoods in direct alignment

Jun 27, 2024

RRLS : Robust Reinforcement Learning Suite

Jun 12, 2024

Time-Constrained Robust MDPs

Jun 12, 2024

Bootstrapping Expectiles in Reinforcement Learning

Jun 06, 2024

Self-Improving Robust Preference Optimization

Jun 03, 2024

Leveraging Procedural Generation for Learning Autonomous Peg-in-Hole Assembly in Space

May 02, 2024

Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

Mar 06, 2024

MusicRL: Aligning Music Generation to Human Preferences

Feb 06, 2024

Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View

Jan 20, 2024