Chan Hee Song

WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation

Oct 22, 2025
Via arXiv

Watch and Learn: Learning to Use Computers from Online Videos

Oct 06, 2025
Via arXiv

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

Jun 26, 2025
Via arXiv

An Illusion of Progress? Assessing the Current State of Web Agents

Apr 02, 2025
Via arXiv

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

Nov 25, 2024
Via arXiv

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Aug 12, 2024
Via arXiv

Dual-View Visual Contextualization for Web Navigation

Feb 06, 2024
Via arXiv

BioCLIP: A Vision Foundation Model for the Tree of Life

Dec 04, 2023
Via arXiv

LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

Dec 08, 2022
Via arXiv

One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

Feb 14, 2022
Via arXiv