Motoki Omura

Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning

Jun 06, 2025

Entropy Controllable Direct Preference Optimization

Nov 12, 2024

Stabilizing Extreme Q-learning by Maclaurin Expansion

Jun 07, 2024

Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning

Mar 12, 2024