Ping Nie

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Apr 09, 2026
Via arXiv

Watch Before You Answer: Learning from Visually Grounded Post-Training

Apr 06, 2026
Via arXiv

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

Mar 29, 2026
Via arXiv

SWE-Next: Scalable Real-World Software Engineering Tasks for Agents

Mar 21, 2026
Via arXiv

SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding

Mar 17, 2026
Via arXiv

EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning

Mar 13, 2026
Via arXiv

Beyond Closed-Pool Video Retrieval: A Benchmark and Agent Framework for Real-World Video Search and Moment Localization

Feb 10, 2026
Via arXiv

VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

Feb 09, 2026
Via arXiv

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Feb 05, 2026
Via arXiv

GraphDancer: Training LLMs to Explore and Reason over Graphs via Curriculum Reinforcement Learning

Jan 24, 2026
Via arXiv