Alec Solway

Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models

Aug 29, 2024
Via arXiv