Abstract: Humanoid robots require whole-body controllers that are both robust and precise in contact-rich environments. While deep reinforcement learning (RL) achieves robust stability, its behavior is tightly coupled to the training objective and command interface, making it difficult to add new feedback objectives without retraining. In this study, we propose an RL-guided whole-body model predictive path integral (MPPI) framework that acts as an add-on feedback controller on top of a pretrained RL policy. Instead of using the RL policy as the final controller, we use it as a sampling prior that biases MPPI rollouts toward dynamically feasible behaviors. Task objectives are specified through modular MPPI cost terms, and MPPI closes the loop by continuously correcting the RL prior online to satisfy these objectives without retraining the policy. Simulations on a 29-DoF Unitree G1 humanoid in MuJoCo demonstrate stable high-rate control (280 Hz on average). Under the same command interface, the proposed method improves task-level precision over a pure RL baseline by correcting systematic drift during straight walking and by tracking additional whole-body reference signals imposed through the cost.
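To make the idea of an RL policy used as an MPPI sampling prior concrete, the following is a minimal sketch on a toy double integrator, not the whole-body controller described in the abstract. The names `prior_policy`, `step`, `cost`, and `mppi_action`, the dynamics, and all parameter values are illustrative assumptions. In particular, `prior_policy` is a hand-written stand-in for a pretrained RL policy, with a constant bias that mimics systematic drift.

```python
import numpy as np

def prior_policy(x):
    """Hypothetical stand-in for a pretrained RL policy (assumption):
    a PD law plus a constant bias that mimics systematic drift."""
    pos, vel = x
    return -2.0 * pos - 1.0 * vel + 0.5

def step(x, u, dt=0.02):
    """Toy double-integrator dynamics (assumption, not a humanoid model)."""
    pos, vel = x
    return np.array([pos + dt * vel, vel + dt * u])

def cost(x, u):
    """Modular cost: a position-tracking term plus a small effort term."""
    return x[0] ** 2 + 1e-3 * u ** 2

def mppi_action(x0, rng, horizon=20, samples=64, sigma=1.0, lam=0.1):
    """One MPPI update that samples action sequences around the prior."""
    noise = rng.normal(0.0, sigma, size=(samples, horizon))
    costs = np.zeros(samples)
    for k in range(samples):
        x = x0.copy()
        for t in range(horizon):
            # Each rollout perturbs the prior's action instead of a zero mean.
            u = prior_policy(x) + noise[k, t]
            x = step(x, u)
            costs[k] += cost(x, u)
    # Exponentiated-cost weights; subtracting the minimum avoids overflow.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    # Correct the prior's first action with the weighted first-step noise.
    return prior_policy(x0) + weights @ noise[:, 0]

rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.0])
u0 = mppi_action(x0, rng)
```

With `sigma=0.0` the correction vanishes and the function returns the prior's action unchanged, which is one way to read "add-on": the cost terms only shift the action where sampled deviations from the prior lower the cost.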
Abstract: Articulated object manipulation is a challenging task that requires constrained motion and adaptive control to handle the unknown dynamics of the manipulated objects. While reinforcement learning (RL) has been widely employed to tackle various scenarios and types of articulated objects, the complexity of these tasks, which stems from multiple intertwined objectives, makes learning a control policy in the full task space highly difficult. To address this issue, we propose a Subspace-wise hybrid RL (SwRL) framework that divides the task space into subspaces and learns a policy for each subspace based on an independent objective. This approach enables adaptive force modulation to accommodate the unknown dynamics of objects. Additionally, it effectively leverages the previously overlooked redundant subspace, thereby maximizing the robot's dexterity. Our method enhances both learning efficiency and task execution performance, as validated through simulations and real-world experiments. A supplementary video is available at https://youtu.be/PkNxv0P8Atk
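As an illustration of what a redundant subspace can mean for a manipulator, the sketch below uses the standard null-space projector of a task Jacobian to combine a task-space command with a secondary command that does not disturb the task. This is a generic textbook construction offered as an assumption; the abstract does not state that SwRL defines its subspaces this way. The function name `compose_joint_velocity` and the numerical Jacobian are hypothetical.

```python
import numpy as np

def compose_joint_velocity(J, task_vel, redundant_vel):
    """Combine a task-space velocity with a command in the redundant subspace.

    J             : (m, n) task Jacobian with n > m (redundant arm)
    task_vel      : (m,) desired task-space velocity
    redundant_vel : (n,) secondary joint-space command
    """
    J_pinv = np.linalg.pinv(J)
    # Projector onto the null space of J: motion here leaves the task unchanged.
    N = np.eye(J.shape[1]) - J_pinv @ J
    return J_pinv @ task_vel + N @ redundant_vel

# Hypothetical 2-D task on a 3-joint arm (values chosen only for illustration).
J = np.array([[1.0, 0.5, 0.2],
              [0.0, 1.0, 0.3]])
task_vel = np.array([0.1, -0.2])
redundant_vel = np.array([0.0, 0.0, 1.0])
q_dot = compose_joint_velocity(J, task_vel, redundant_vel)
```

Because the projected component lies in the null space of `J`, the resulting joint velocity still reproduces `task_vel` in task space. This is why separate objectives can, in principle, be assigned to the task subspace and the redundant subspace.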