Qiushi Sun

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Feb 03, 2026

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Jan 12, 2026

OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models

Dec 18, 2025

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Sep 18, 2025

CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Aug 27, 2025

Dynamic and Generalizable Process Reward Modeling

Jul 23, 2025

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Apr 15, 2025

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Apr 11, 2025

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Mar 16, 2025