Matthieu Geist

INRIA Lorraine - LORIA

Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

May 23, 2025
Via arXiv

NavBench: A Unified Robotics Benchmark for Reinforcement Learning-Based Autonomous Navigation

May 20, 2025
Via arXiv

ShiQ: Bringing back Bellman to LLMs

May 16, 2025
Via arXiv

Command A: An Enterprise-Ready Large Language Model

Apr 01, 2025
Via arXiv

Understanding Likelihood Over-optimisation in Direct Alignment Algorithms

Oct 15, 2024
Via arXiv

Solving robust MDPs as a sequence of static RL problems

Oct 08, 2024
Via arXiv

Imitating Language via Scalable Inverse Reinforcement Learning

Sep 02, 2024
Via arXiv

Averaging log-likelihoods in direct alignment

Jun 27, 2024
Via arXiv

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Jun 27, 2024
Via arXiv

Time-Constrained Robust MDPs

Jun 12, 2024
Via arXiv