Pengrui Lu

AcademiClaw: When Students Set Challenges for AI Agents

May 04, 2026

AlphaEval: Evaluating Agents in Production

Apr 14, 2026

Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks

Apr 03, 2026

daVinci-LLM: Towards the Science of Pretraining

Mar 28, 2026

ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development

Feb 02, 2026

InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research

Nov 03, 2025

Interaction as Intelligence Part II: Asynchronous Human-Agent Rollout for Long-Horizon Task Training

Nov 03, 2025

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

Apr 07, 2025