Hadar Mulian

Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

Apr 12, 2026

Roi Ben-Gigi, Yuval David, Fabiana Fournier, Lior Limonad, Dany Moshkovich, Hadar Mulian, Segev Shlomov

Abstract: AI agent development relies heavily on natural language prompting to define agents' tasks, knowledge, and goals. These prompts are interpreted by Large Language Models (LLMs), which govern agent behavior. Consequently, agentic performance is susceptible to variability arising from imprecise or ambiguous prompt formulations. Identifying and correcting such issues requires examining not only the agent's code, but also the internal system prompts generated throughout its execution lifecycle, as reflected in execution logs. In this work, we introduce an analytics pipeline implemented as part of the Agent Mentor open-source library that monitors and incrementally adapts the system prompts defining another agent's behavior. The pipeline improves performance by systematically injecting corrective instructions into the agent's knowledge. We describe its underlying mechanism, with particular emphasis on identifying semantic features associated with undesired behaviors and using them to derive corrective statements. We evaluate the proposed pipeline across three exemplar agent configurations and benchmark tasks using repeated execution runs to assess effectiveness. These experiments provide an initial exploration of automating such a mentoring pipeline within future agentic governance frameworks. Overall, the approach demonstrates consistent and measurable accuracy improvements across diverse configurations, particularly in settings dominated by specification ambiguity. For reproducibility, we released our code as open source under the Agent Mentor library.

* 10 pages, 5 figures

Selecting the Right LLM for eGov Explanations

Apr 27, 2025

Lior Limonad, Fabiana Fournier, Hadar Mulian, George Manias, Spiros Borotis, Danai Kyrkou

Abstract: The perceived quality of the explanations accompanying e-government services is key to gaining trust in these institutions, consequently amplifying further usage of these services. Recent advances in generative AI, and concretely in Large Language Models (LLMs), allow the automation of such content articulations, eliciting explanations' interpretability and fidelity, and more generally, adapting content to various audiences. However, selecting the right LLM type for this has become a non-trivial task for e-government service providers. In this work, we adapted a previously developed scale to assist with this selection, providing a systematic approach for the comparative analysis of the perceived quality of explanations generated by various LLMs. We further demonstrated its applicability through the tax-return process, using it as an exemplar use case that could benefit from employing an LLM to generate explanations about tax refund decisions. This was attained through a user study with 128 survey respondents who were asked to rate different versions of LLM-generated explanations about tax refund decisions, providing a methodological basis for selecting the most appropriate LLM. Recognizing the practical challenges of conducting such a survey, we also began exploring the automation of this process by attempting to replicate human feedback using a selection of cutting-edge predictive techniques.

* 8 pages, 7 figures. ICEDEG 2025, Bern, Switzerland, June 2025

Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems

Mar 09, 2025

Dany Moshkovich, Hadar Mulian, Sergey Zeltyn, Natti Eder, Inna Skarbovsky, Roy Abitbol

Abstract: The rise of agentic AI systems, where agents collaborate to perform diverse tasks, poses new challenges in observing, analyzing, and optimizing their behavior. Traditional evaluation and benchmarking approaches struggle to handle the non-deterministic, context-sensitive, and dynamic nature of these systems. This paper explores key challenges and opportunities in analyzing and optimizing agentic systems across development, testing, and maintenance. We explore critical issues such as natural language variability and unpredictable execution flows, which hinder predictability and control, demanding adaptive strategies to manage input variability and evolving behaviors. Our user study supported these hypotheses. In particular, it showed 79% agreement that the non-deterministic flow of agentic systems is a major challenge. Finally, we validated our statements empirically, advocating the need to move beyond classical benchmarking. To bridge these gaps, we introduce taxonomies to present expected analytics outcomes and the ways to collect them by extending standard observability frameworks. Building on these foundations, we introduce and demonstrate a novel approach to benchmarking agent evaluation systems. Unlike traditional "black box" performance evaluation approaches, our benchmark is built from agent runtime logs as input and analytics outcomes, including discovered flows and issues. By addressing key limitations in existing methodologies, we aim to set the stage for more advanced and holistic evaluation strategies, which could foster the development of adaptive, interpretable, and robust agentic AI systems.

* 14 pages, 19 figures

Mimicking the Maestro: Exploring the Efficacy of a Virtual AI Teacher in Fine Motor Skill Acquisition

Oct 16, 2023

Hadar Mulian, Segev Shlomov, Lior Limonad

Abstract: Motor skills, especially fine motor skills like handwriting, play an essential role in academic pursuits and everyday life. Traditional methods of teaching these skills, although effective, can be time-consuming and inconsistent. With the rise of advanced technologies like robotics and artificial intelligence, there is increasing interest in automating such teaching processes through human-robot and human-computer interactions. In this study, we examine the potential of a virtual AI teacher in emulating the techniques of human educators for motor skill acquisition. We introduce an AI teacher model that captures the distinct characteristics of human instructors. Using a Reinforcement Learning environment tailored to mimic teacher-learner interactions, we tested our AI model against four guiding hypotheses, emphasizing improved learner performance, enhanced rate of skill acquisition, and reduced variability in learning outcomes. Our findings, validated on synthetic learners, revealed significant improvements across all tested hypotheses. Notably, our model showcased robustness across different learners and settings and demonstrated adaptability to handwriting. This research underscores the potential of integrating Reinforcement Learning and Imitation Learning models with robotics in revolutionizing the teaching of critical motor skills.

* 17 pages, 3 figures
