Jixuan Chen

DeliveryBench: Can Agents Earn Profit in Real World?

Dec 22, 2025

OpenCUA: Open Foundations for Computer-Use Agents

Aug 12, 2025

OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems

Jun 12, 2025

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

May 19, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Mar 26, 2025

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs

Feb 19, 2025

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Nov 12, 2024

COMMA: A Communicative Multimodal Multi-Agent Benchmark

Oct 10, 2024

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Jul 15, 2024

BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

Jul 03, 2024