Bingxiang He

Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization

Apr 14, 2026

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Apr 14, 2026

How Far Can Unsupervised RLVR Scale LLM Training?

Mar 09, 2026

CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning

Feb 03, 2026

Current Agents Fail to Leverage World Model as Tool for Foresight

Jan 08, 2026

CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning Under Partial Observations

Dec 30, 2025

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Dec 18, 2025

Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning

Oct 02, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

Sep 10, 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices

Jun 09, 2025