Hengquan Guo

BLOCK: An Open-Source Bi-Stage MLLM Character-to-Skin Pipeline for Minecraft

Mar 04, 2026

Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization

Oct 25, 2024

Learning to Schedule Online Tasks with Bandit Feedback

Feb 26, 2024

Rectified Pessimistic-Optimistic Learning for Stochastic Continuum-armed Bandit with Constraints

Nov 29, 2022