Kaixin Ma

WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model

Apr 23, 2025

Enhancing Web Agents with Explicit Rollback Mechanisms

Apr 16, 2025

OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization

Oct 25, 2024

DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects

Oct 03, 2024

LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks

Oct 02, 2024

Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

Sep 16, 2024

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Sep 12, 2024

COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes

Sep 06, 2024

DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

Jul 15, 2024

MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning

Apr 24, 2024