Motoki Omura

Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning

Mar 03, 2026

Resource-Efficient Model-Free Reinforcement Learning for Board Games

Feb 11, 2026

Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning

Jun 06, 2025

Entropy Controllable Direct Preference Optimization

Nov 12, 2024

Stabilizing Extreme Q-learning by Maclaurin Expansion

Jun 07, 2024

Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning

Mar 12, 2024