Goran Radanović

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Mar 04, 2024
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla


Corruption-Robust Offline Two-Player Zero-Sum Markov Games

Mar 04, 2024
Andi Nika, Debmalya Mandal, Adish Singla, Goran Radanović


Corruption Robust Offline Reinforcement Learning with Human Feedback

Feb 09, 2024
Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović
