Bingxiang He

May

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

Jun 23, 2026

MA-ProofBench: A Two-Tiered Evaluation of LLMs for Theorem Proving in Mathematical Analysis

Jun 11, 2026

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Apr 14, 2026

Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization

Apr 14, 2026

How Far Can Unsupervised RLVR Scale LLM Training?

Mar 09, 2026

CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning

Feb 03, 2026

Current Agents Fail to Leverage World Model as Tool for Foresight

Jan 08, 2026

CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning Under Partial Observations

Dec 30, 2025

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Dec 18, 2025

Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning

Oct 02, 2025