
Yanqi Luo

FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration

Oct 06, 2025

WALT: Web Agents that Learn Tools

Oct 01, 2025

SCUBA: Salesforce Computer Use Benchmark

Sep 30, 2025

Diversity Enhances an LLM's Performance in RAG and Long-context Task

Feb 13, 2025