Xing Han Lù

Grounding Computer Use Agents on Human Demonstrations

Nov 10, 2025
Via arXiv

Build the web for agents, not agents for the web

Jun 12, 2025
Via arXiv

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Apr 11, 2025
Via arXiv

SafeArena: Evaluating the Safety of Autonomous Web Agents

Mar 06, 2025
Via arXiv

MMTEB: Massive Multilingual Text Embedding Benchmark

Feb 19, 2025
Via arXiv

The BrowserGym Ecosystem for Web Agent Research

Dec 10, 2024
Via arXiv

BM25S: Orders of magnitude faster lexical search via eager sparse scoring

Jul 04, 2024
Via arXiv

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Feb 08, 2024
Via arXiv