Yuanheng Zhu

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

Apr 15, 2026
Via arXiv

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

Mar 26, 2026
Via arXiv

CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic

Nov 15, 2025
Via arXiv

ARAC: Adaptive Regularized Multi-Agent Soft Actor-Critic in Graph-Structured Adversarial Games

Nov 11, 2025
Via arXiv

Empowering Multi-Robot Cooperation via Sequential World Models

Sep 16, 2025
Via arXiv

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Jun 24, 2025
Via arXiv

DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Jun 11, 2025
Via arXiv

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

Oct 15, 2024
Via arXiv

Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning

Aug 01, 2024
Via arXiv

FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game

Feb 01, 2024
Via arXiv