Nathan Grinsztajn

CRIStAL

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Jun 27, 2024

Averaging log-likelihoods in direct alignment

Jun 27, 2024

Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization

Jun 24, 2024

Are we going MAD? Benchmarking Multi-Agent Debate between Language Models for Medical Q&A

Nov 29, 2023

Combinatorial Optimization with Policy Adaptation using Latent Space Search

Nov 13, 2023

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

Jun 16, 2023

Population-Based Reinforcement Learning for Combinatorial Optimization

Oct 07, 2022

Meta-learning from Learning Curves Challenge: Lessons learned from the First Round and Design of the Second Round

Aug 04, 2022

More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences

Nov 07, 2021

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

Jun 09, 2021