Boxuan Li

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Jan 17, 2026

Enhancing Contrastive Learning for Retinal Imaging via Adjusted Augmentation Scales

Jan 05, 2025

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Dec 18, 2024

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Jul 23, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Jun 12, 2024

FasterRisk: Fast and Accurate Interpretable Risk Scores

Oct 12, 2022