Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jindan Huang

WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation

Oct 22, 2025

Yaoyao Qian, Yuanli Wang, Jinda Zhang, Yun Zong, Meixu Chen, Hanhan Zhou, Jindan Huang, Yifan Zeng, Xinyu Hu, Chan Hee Song(+1 more)

Abstract:Current evaluation of web agents largely reduces to binary success metrics or conformity to a single reference trajectory, ignoring the structural diversity present in benchmark datasets. We present WebGraphEval, a framework that abstracts trajectories from multiple agents into a unified, weighted action graph. This representation is directly compatible with benchmarks such as WebArena, leveraging leaderboard runs and newly collected trajectories without modifying environments. The framework canonically encodes actions, merges recurring behaviors, and applies structural analyses including reward propagation and success-weighted edge statistics. Evaluations across thousands of trajectories from six web agents show that the graph abstraction captures cross-model regularities, highlights redundancy and inefficiency, and identifies critical decision points overlooked by outcome-based metrics. By framing web interaction as graph-structured data, WebGraphEval establishes a general methodology for multi-path, cross-agent, and efficiency-aware evaluation of web agents.

* 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Multi-Turn Interactions in Large Language Models

Via

Access Paper or Ask Questions

CHARM: Considering Human Attributes for Reinforcement Modeling

Jun 16, 2025

Qidi Fang, Hang Yu, Shijie Fang, Jindan Huang, Qiuyu Chen, Reuben M. Aronson, Elaine S. Short

Abstract:Reinforcement Learning from Human Feedback has recently achieved significant success in various fields, and its performance is highly related to feedback quality. While much prior work acknowledged that human teachers' characteristics would affect human feedback patterns, there is little work that has closely investigated the actual effects. In this work, we designed an exploratory study investigating how human feedback patterns are associated with human characteristics. We conducted a public space study with two long horizon tasks and 46 participants. We found that feedback patterns are not only correlated with task statistics, such as rewards, but also correlated with participants' characteristics, especially robot experience and educational background. Additionally, we demonstrated that human feedback value can be more accurately predicted with human characteristics compared to only using task statistics. All human feedback and characteristics we collected, and codes for our data collection and predicting more accurate human feedback are available at https://github.com/AABL-Lab/CHARM

* ROMAN 2025

Via

Access Paper or Ask Questions

On the Effect of Robot Errors on Human Teaching Dynamics

Sep 15, 2024

Jindan Huang, Isaac Sheidlower, Reuben M. Aronson, Elaine Schaertl Short

Abstract:Human-in-the-loop learning is gaining popularity, particularly in the field of robotics, because it leverages human knowledge about real-world tasks to facilitate agent learning. When people instruct robots, they naturally adapt their teaching behavior in response to changes in robot performance. While current research predominantly focuses on integrating human teaching dynamics from an algorithmic perspective, understanding these dynamics from a human-centered standpoint is an under-explored, yet fundamental problem. Addressing this issue will enhance both robot learning and user experience. Therefore, this paper explores one potential factor contributing to the dynamic nature of human teaching: robot errors. We conducted a user study to investigate how the presence and severity of robot errors affect three dimensions of human teaching dynamics: feedback granularity, feedback richness, and teaching time, in both forced-choice and open-ended teaching contexts. The results show that people tend to spend more time teaching robots with errors, provide more detailed feedback over specific segments of a robot's trajectory, and that robot error can influence a teacher's choice of feedback modality. Our findings offer valuable insights for designing effective interfaces for interactive learning and optimizing algorithms to better understand human intentions.

* Accepted to 2024 International Conference on Human-Agent Interaction (HAI)

Via

Access Paper or Ask Questions

Reconstructing Sinus Anatomy from Endoscopic Video -- Towards a Radiation-free Approach for Quantitative Longitudinal Assessment

Mar 18, 2020

Xingtong Liu, Maia Stiber, Jindan Huang, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath

Figure 1 for Reconstructing Sinus Anatomy from Endoscopic Video -- Towards a Radiation-free Approach for Quantitative Longitudinal Assessment

Figure 2 for Reconstructing Sinus Anatomy from Endoscopic Video -- Towards a Radiation-free Approach for Quantitative Longitudinal Assessment

Figure 3 for Reconstructing Sinus Anatomy from Endoscopic Video -- Towards a Radiation-free Approach for Quantitative Longitudinal Assessment

Figure 4 for Reconstructing Sinus Anatomy from Endoscopic Video -- Towards a Radiation-free Approach for Quantitative Longitudinal Assessment

Abstract:Reconstructing accurate 3D surface models of sinus anatomy directly from an endoscopic video is a promising avenue for cross-sectional and longitudinal analysis to better understand the relationship between sinus anatomy and surgical outcomes. We present a patient-specific, learning-based method for 3D reconstruction of sinus surface anatomy directly and only from endoscopic videos. We demonstrate the effectiveness and accuracy of our method on in and ex vivo data where we compare to sparse reconstructions from Structure from Motion, dense reconstruction from COLMAP, and ground truth anatomy from CT. Our textured reconstructions are watertight and enable measurement of clinically relevant parameters in good agreement with CT. The source code will be made publicly available upon publication.

Via

Access Paper or Ask Questions