Nino Vieillard

WARP: On the Benefits of Weight Averaged Rewarded Policies

Jun 24, 2024

WARM: On the Benefits of Weight Averaged Reward Models

Jan 22, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models

Jun 23, 2023

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

May 31, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

May 22, 2023

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

May 27, 2022

Implicitly Regularized RL with Implicit Q-Values

Aug 16, 2021

Offline Reinforcement Learning as Anti-Exploration

Jun 11, 2021

Offline Reinforcement Learning with Pseudometric Learning

Mar 02, 2021