Hongyao Tang

RoboPIN: Grounded Embodied Reasoning via Pinned Chain-of-Thought

Jun 14, 2026

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Jun 09, 2026

Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy

Jun 09, 2026

The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning

Apr 02, 2026

Embodied Arena: A Comprehensive, Unified, and Evolving Evaluation Platform for Embodied AI

Sep 18, 2025

Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

Jul 09, 2025

Can We Optimize Deep RL Policy Weights as Trajectory Modeling?

Mar 06, 2025

Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer

Feb 04, 2025

Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn

Sep 07, 2024

MFE-ETP: A Comprehensive Evaluation Benchmark for Multi-modal Foundation Models on Embodied Task Planning

Jul 06, 2024