Zhengyang Tang

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

Jun 16, 2026

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Jun 12, 2026

PhoneWorld: Scaling Phone-Use Agent Environments

May 28, 2026

The Missing Piece in Pre-trained Model Evaluation: Reward-Guided Decoding Unlocks Task-Oriented Behavior Without Parameter Updates

May 27, 2026

Do Phone-Use Agents Respect Your Privacy?

Apr 02, 2026

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026

CoRT: Code-integrated Reasoning within Thinking

Jun 12, 2025

Learning from Peers in Reasoning Models

May 12, 2025

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Jan 24, 2025

Enabling Scalable Oversight via Self-Evolving Critic

Jan 10, 2025