Songwu Lu

Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks

Jun 16, 2025, via arXiv

RLTHF: Targeted Human Feedback for LLM Alignment

Feb 19, 2025, via arXiv