Yaocheng Zhang

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

Apr 15, 2026
Via arXiv

Dynamic Dual-Granularity Skill Bank for Agentic RL

Mar 30, 2026
Via arXiv

CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic

Nov 15, 2025
Via arXiv

In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning

Dec 12, 2024
Via arXiv