Masafumi Enomoto

Revisiting Observation Reduction for Web Agents: Comprehensive Evaluation with a Lightweight Framework

May 28, 2026

Masafumi Enomoto, Ryoma Obara, Haochen Zhang, Masafumi Oyamada

Abstract:HTML observations in LLM-based web agents are extremely long, and while many reduction methods have been proposed, it remains unclear which methods reduce overall agent latency while maintaining performance. The main obstacle is the high cost of end-to-end evaluation: in our experiments, evaluating 11 methods across 32 configurations on 33 tasks of WorkArena L1 required 232.4 cumulative hours. To address this, we propose a lightweight evaluation framework based on the Minimal Failure Set (MFS), the minimal set of HTML elements whose removal causes task failure. We define coverage as the fraction of instances in which a reduction method fully retains the MFS, which serves as a proxy metric that requires neither web access nor LLM inference. We validate that coverage strongly correlates with end-to-end success rate, with over 100× speedup in cumulative evaluation time on both benchmarks. Using this framework, we find that extractive HTML reduction methods require either high computation cost or domain-specific optimization to reduce agent latency while maintaining performance. Building on this, we optimize a pruning program on MFS training data, achieving 2.2× faster per-step latency on WorkArena L1 while retaining 84% of the original success rate, and 3.1× faster on WebLinx while retaining 89%.

* 22 pages, 8 figures, 4 tables
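
The coverage metric described in the abstract above can be illustrated with a minimal sketch. It assumes each instance is represented by two sets of element identifiers: the elements a reduction method kept, and that instance's MFS. The function name `coverage` and the string identifiers are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set


def coverage(retained: Sequence[Set[str]], mfs: Sequence[Set[str]]) -> float:
    """Fraction of instances whose MFS is fully contained in the retained elements.

    Both arguments are hypothetical representations: one set of element IDs per instance.
    """
    if len(retained) != len(mfs):
        raise ValueError("retained and mfs must describe the same instances")
    if not mfs:
        raise ValueError("need at least one instance")
    # An instance counts as covered only if every MFS element survived the reduction.
    covered = sum(1 for kept, required in zip(retained, mfs) if required <= kept)
    return covered / len(mfs)


# Instance 1 keeps its whole MFS; instance 2 drops element "c".
score = coverage(
    retained=[{"a", "b"}, {"a"}],
    mfs=[{"a"}, {"a", "c"}],
)
print(score)
```

Because the check is a plain subset test, it needs neither a browser nor an LLM call, which is consistent with the abstract's description of coverage as a cheap proxy.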

cotomi Act: Learning to Automate Work by Watching You

May 04, 2026

Masafumi Oyamada, Kunihiro Takeoka, Kosuke Akimoto, Ryoma Obara, Masafumi Enomoto, Haochen Zhang, Daichi Haraguchi, Takuya Tamura

Abstract:What if a browser agent could learn your work simply by watching you do it? We present cotomi Act, a browser-based computer-using agent that combines reliable multi-step task execution with persistent organizational knowledge learned from user behavior. For execution, an agent scaffold with adaptive lazy observation, verbal-diff-based history compression, coarse-grained actions, and test-time scaling via best-of-N action selection achieves 80.4% on the 179-task WebArena human-evaluation subset, exceeding the reported 78.2% human baseline. For organizational knowledge, a behavior-to-knowledge pipeline passively observes the user's browsing and progressively abstracts it into artifacts (task boards, wiki) exposed through a shared workspace editable by both user and agent. A controlled proxy evaluation confirms that task success improves as behavior-derived knowledge accumulates. In our live demonstration, attendees interact with the system in a real browser, issuing tasks and observing end-to-end autonomous execution and shared knowledge management.

* 7 pages, 4 figures. ACM CAIS 2026 (System Demonstrations)

Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance

Apr 28, 2025

Takuya Tamura, Taro Yano, Masafumi Enomoto, Masafumi Oyamada

Abstract:Accurately forecasting the performance of Large Language Models (LLMs) before extensive fine-tuning or merging can substantially reduce both computational expense and development time. Although prior approaches like scaling laws account for global factors such as parameter size or training tokens, they often overlook explicit lineage relationships - i.e., which models are derived or merged from which parents. In this work, we propose a novel Lineage-Regularized Matrix Factorization (LRMF) framework that encodes ancestral ties among LLMs via a graph Laplacian regularizer. By leveraging multi-hop parent-child connections, LRMF consistently outperforms conventional matrix factorization and collaborative filtering methods in both instance-level and benchmark-level performance prediction. Our large-scale study includes 2,934 publicly available Hugging Face models and 21,000+ instances across 6 major benchmarks, showing that lineage constraints yield up to 7-10 percentage points higher correlation with actual performance compared to baselines. Moreover, LRMF effectively addresses the cold-start problem, providing accurate estimates for newly derived or merged models even with minimal data. This lineage-guided strategy thus offers a resource-efficient way to inform hyperparameter tuning, data selection, and model combination in modern LLM development.
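
As a rough illustration of what a graph Laplacian regularizer looks like in matrix factorization, the following is a generic objective of that family. It is a sketch under assumptions, not necessarily the exact LRMF formulation: $X$ is a model-by-instance score matrix observed on entries $\Omega$, $u_i$ and $v_j$ are latent factors for model $i$ and instance $j$, $A$ is an assumed lineage adjacency matrix, and $\lambda$ is an assumed regularization weight.

```latex
\min_{U,V}\; \sum_{(i,j)\in\Omega} \left( X_{ij} - u_i^{\top} v_j \right)^2
  \;+\; \lambda\, \operatorname{tr}\!\left( U^{\top} L\, U \right),
\qquad L = D - A,\quad D_{ii} = \sum_{k} A_{ik},

\operatorname{tr}\!\left( U^{\top} L\, U \right)
  = \tfrac{1}{2} \sum_{i,k} A_{ik}\, \lVert u_i - u_k \rVert^2 .
```

The second identity, which holds for symmetric $A$, shows why such a term helps in a cold-start setting: a model with few observed scores is pulled toward the latent factors of its lineage neighbors.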

LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization

Jun 18, 2024

Masafumi Enomoto, Kunihiro Takeoka, Kosuke Akimoto, Kiril Gashteovski, Masafumi Oyamada

Abstract:Open-Domain Multi-Document Summarization (ODMDS), which aims to generate a summary that answers a user's query by synthesizing relevant content from multiple documents in a large collection, is crucial for addressing diverse information needs. Existing approaches that first find relevant passages and then generate a summary using a language model are inadequate for ODMDS. This is because open-ended queries often require additional context for the retrieved passages to cover the topic comprehensively, making it challenging to retrieve all relevant passages initially. While iterative retrieval methods have been explored for multi-hop question answering (MQA), they are impractical for ODMDS due to high latency from repeated large language model (LLM) inference for reasoning. To address this issue, we propose LightPAL, a lightweight passage retrieval method for ODMDS that constructs a graph representing passage relationships using an LLM during indexing and employs random walk instead of iterative reasoning and retrieval at inference time. Experiments on ODMDS benchmarks show that LightPAL outperforms baseline retrievers in summary quality while being significantly more efficient than an iterative MQA approach.

* 13 pages, 3 figures
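
To make the idea of replacing iterative LLM reasoning with a random walk at inference time concrete, here is a minimal sketch. It uses a random walk with restart (personalized-PageRank style) over a prebuilt passage graph, which may differ from the walk LightPAL actually uses. The graph layout, the function name `random_walk_scores`, and the `restart` and `steps` parameters are all assumptions for illustration.

```python
from typing import Dict, List, Set


def random_walk_scores(
    graph: Dict[str, List[str]],
    seeds: Set[str],
    restart: float = 0.15,
    steps: int = 50,
) -> Dict[str, float]:
    """Score passages by a random walk with restart from initially retrieved seeds.

    `graph` maps every passage ID to its outgoing neighbors (all neighbors must be keys).
    """
    nodes = list(graph)
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(seed_mass)
    for _ in range(steps):
        # Restart: a fixed share of probability always jumps back to the seeds.
        nxt = {n: restart * seed_mass[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if not neighbors:
                # Dangling passage: return its remaining mass to the seeds.
                for s in seeds:
                    nxt[s] += (1.0 - restart) * scores[n] / len(seeds)
                continue
            share = (1.0 - restart) * scores[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        scores = nxt
    return scores


# Hypothetical passage graph; "p1" is the passage returned by the initial retriever.
graph = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2"], "p4": []}
scores = random_walk_scores(graph, seeds={"p1"})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The walk involves only arithmetic over an index-time graph, so no LLM call is needed per query, which is the efficiency argument the abstract makes against iterative retrieval.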

DeepJoin: Joinable Table Discovery with Pre-trained Language Models

Dec 15, 2022

Yuyang Dong, Chuan Xiao, Takuma Nozawa, Masafumi Enomoto, Masafumi Oyamada

Abstract:Due to its usefulness in data enrichment for data analysis tasks, joinable table discovery has become an important operation in data lake management. Existing approaches target equi-joins, the most common way of combining tables for creating a unified view, or semantic joins, which tolerate misspellings and different formats to deliver more join results. They are either exact solutions whose running time is linear in the sizes of the query column and the target table repository, or approximate solutions lacking precision. In this paper, we propose DeepJoin, a deep learning model for accurate and efficient joinable table discovery. Our solution is an embedding-based retrieval, which employs a pre-trained language model (PLM) and is designed as one framework serving both equi- and semantic joins. We propose a set of contextualization options to transform column contents into a text sequence. The PLM reads the sequence and is fine-tuned to embed columns into vectors such that columns are expected to be joinable if they are close to each other in the vector space. Since the output of the PLM is fixed in length, the subsequent search procedure becomes independent of the column size. With a state-of-the-art approximate nearest neighbor search algorithm, the search time is logarithmic in the repository size. To train the model, we devise techniques for preparing training data as well as data augmentation. The experiments on real datasets demonstrate that by training on a small subset of a corpus, DeepJoin generalizes to large datasets and its precision consistently outperforms that of other approximate solutions. DeepJoin is even more accurate than an exact solution to semantic joins when evaluated with labels from experts. Moreover, when equipped with a GPU, DeepJoin is up to two orders of magnitude faster than existing solutions.
