Alec Solway

Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models

Aug 29, 2024
Via arXiv