Weiye Si

daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
Feb 02, 2026
Via arXiv

AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
Jan 16, 2026
Via arXiv

Interaction as Intelligence Part II: Asynchronous Human-Agent Rollout for Long-Horizon Task Training
Nov 03, 2025
Via arXiv

InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research
Nov 03, 2025
Via arXiv