Jiahe Jin

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

May 25, 2025
Via arXiv

Efficient Agent Training for Computer Use

May 20, 2025
Via arXiv

Generative AI Act II: Test Time Scaling Drives Cognition Engineering

Apr 21, 2025
Via arXiv

Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?

Feb 12, 2025
Via arXiv

PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

Dec 23, 2024
Via arXiv

BeHonest: Benchmarking Honesty of Large Language Models

Jun 19, 2024
Via arXiv