Zhaojin Wen

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

Oct 10, 2023
Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao

Via arXiv