Florian Strub

TSP, IP Paris, SAMOVAR

Averaging log-likelihoods in direct alignment

Jun 27, 2024
Via arXiv

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Jun 27, 2024
Via arXiv

Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

Apr 30, 2024
Via arXiv

Language Evolution with Deep Learning

Mar 18, 2024
Via arXiv

Language Model Alignment with Elastic Reset

Dec 06, 2023
Via arXiv

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick

Feb 09, 2023
Via arXiv

SemPPL: Predicting pseudo-labels for better contrastive representations

Jan 12, 2023
Via arXiv

Over-communicate no more: Situated RL agents learn concise communication protocols

Nov 02, 2022
Via arXiv

Emergent Communication: Generalization and Overfitting in Lewis Games

Sep 30, 2022
Via arXiv

Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

Sep 22, 2022
Via arXiv