Xiangyu Zhang

When the Specification Emerges: Benchmarking Faithfulness Loss in Long-Horizon Coding Agents

Mar 17, 2026 (via arXiv)

WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics

Mar 11, 2026 (via arXiv)

MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interatomic Potentials

Mar 05, 2026 (via arXiv)

DM0: An Embodied-Native Vision-Language-Action Model towards Physical AI

Feb 16, 2026 (via arXiv)

PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering

Feb 12, 2026 (via arXiv)

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Feb 11, 2026 (via arXiv)

GEBench: Benchmarking Image Generation Models as GUI Environments

Feb 09, 2026 (via arXiv)

R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging

Feb 06, 2026 (via arXiv)

DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder

Jan 31, 2026 (via arXiv)

UniMorphGrasp: Diffusion Model with Morphology-Awareness for Cross-Embodiment Dexterous Grasp Generation

Jan 31, 2026 (via arXiv)