Alekh Agarwal

Robust Preference Optimization through Reward Model Distillation

May 29, 2024
Via arXiv

Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization

Mar 28, 2024
Via arXiv

Stochastic Gradient Succeeds for Bandits

Feb 27, 2024
Via arXiv

More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning

Feb 11, 2024
Via arXiv

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

Jan 08, 2024
Via arXiv

Theoretical guarantees on the best-of-n alignment policy

Jan 03, 2024
Via arXiv

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Dec 21, 2023
Via arXiv

Efficient End-to-End Visual Document Understanding with Rationale Distillation

Nov 16, 2023
Via arXiv

A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks

May 26, 2023
Via arXiv

An Empirical Evaluation of Federated Contextual Bandit Algorithms

Mar 17, 2023
Via arXiv