Mengyin Lu

SPEAR: Code-Augmented Agentic Prompt Optimization

May 25, 2026

Mengyin Lu, Cong Feng, Huimin Han, Guangming Lu, Yu Sun, Xiaonan Ding, Shihui Long, Fengyi Li, Tanvi Motwani

Abstract: Automatic prompt engineering (APE) rewrites prompts to improve downstream task performance, but existing APE loops treat the optimizer itself as a fixed pipeline. We port the code-as-action paradigm of CodeAct (Wang et al., 2024a) to APE and propose SPEAR (Sandboxed Prompt Engineer with Active Roll-back), a free-form agentic optimizer with four tools (evaluate, python, set_prompt, finish) that decides autonomously how and when to use them. The distinctive tool is the Python sandbox: the optimizer writes and executes arbitrary Python on the current evaluation DataFrame, performing structural error analysis (confusion matrices, error clustering, per-group metrics) that the agent itself authors. Two guardrails turn the long-horizon agent into a monotone-improving optimizer: auto-rollback on metric regression, and an optional guard metric floor. We evaluate on three industrial LLM-as-judge suites (13 judge tasks across recruiter-intake, conversational-memory, and query-refinement systems) plus seven BBH tasks and GSM8K. SPEAR wins every industrial task on the primary metric ($\kappa$ 0.857 vs 0.359 on tool-selection; F1-macro 0.815 vs 0.763 on filter-relevance; $\kappa$ 0.254 vs 0.218 on the hardest extraction dimension). On BBH-7 SPEAR averages 0.938 accuracy vs GEPA 0.628 and TextGrad 0.484. Ablations show the Python tool is the largest single lever on complex judge tasks ($\Delta \approx +0.79\,\kappa$ on the 5-class tool-selection judge, $\Delta \approx +0.35\,\kappa$ on the hardest extraction dimension when removed); its irreplaceable contribution is class-pair confusion aggregation that a long-context LLM cannot extract reliably from the raw eval DataFrame.

* 19 pages, 3 figures, EMNLP 2026 submission
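
The two guardrails and the class-pair confusion aggregation named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `confusion_pairs` and `optimize_with_rollback`, the use of plain lists in place of the evaluation DataFrame, and the rule that a tie on the primary metric is accepted are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple


def confusion_pairs(
    gold: Iterable[str], pred: Iterable[str]
) -> List[Tuple[Tuple[str, str], int]]:
    """Count (gold, predicted) pairs over misclassified rows, most frequent first."""
    counts = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    return counts.most_common()


def optimize_with_rollback(
    initial_prompt: str,
    candidates: Iterable[str],
    primary: Callable[[str], float],
    guard: Optional[Callable[[str], float]] = None,
    guard_floor: float = 0.0,
) -> Tuple[str, float]:
    """Keep a candidate only if the primary metric does not regress and the
    optional guard metric stays at or above its floor (hypothetical rule)."""
    best_prompt, best_score = initial_prompt, primary(initial_prompt)
    for candidate in candidates:
        score = primary(candidate)
        if score < best_score:
            continue  # auto-rollback: primary metric regressed
        if guard is not None and guard(candidate) < guard_floor:
            continue  # rollback: guard metric fell below the floor
        best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

Because a candidate is kept only when its score is at least the current best, the retained score never decreases across steps, which is one plausible reading of "monotone-improving" in the abstract.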

Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models

Sep 13, 2023

Mohamed Elaraby, Mengyin Lu, Jacob Dunn, Xueying Zhang, Yu Wang, Shizhu Liu, Pingchuan Tian, Yuping Wang, Yuxuan Wang

Abstract: Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP). Although convenient for research and practical applications, open-source LLMs with fewer parameters often suffer from severe hallucinations compared to their larger counterparts. This paper focuses on measuring and reducing hallucinations in BLOOM 7B, a representative of such weaker open-source LLMs that are publicly available for research and commercial applications. We introduce HaloCheck, a lightweight, black-box, knowledge-free framework designed to quantify the severity of hallucinations in LLMs. Additionally, we explore techniques such as knowledge injection and teacher-student approaches to alleviate hallucinations in low-parameter LLMs. Our experiments demonstrate the reduction of hallucinations in domains that are challenging for these LLMs.

Multi-Behavior Enhanced Recommendation with Cross-Interaction Collaborative Relation Modeling

Jan 07, 2022

Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Mengyin Lu, Liefeng Bo

Abstract: Many previous studies augment collaborative filtering with deep neural network techniques to achieve better recommendation performance. However, most existing deep learning-based recommender systems are designed to model a single type of user-item interaction behavior, and so can hardly distill the heterogeneous relations between users and items. In practical recommendation scenarios, users exhibit multiple types of behavior, such as browsing and purchasing. Because they overlook users' multi-behavior patterns over different items, existing recommendation methods cannot adequately capture heterogeneous collaborative signals from multi-behavior user data. Inspired by the strength of graph neural networks for structured data modeling, this work proposes a Graph Neural Multi-Behavior Enhanced Recommendation (GNMR) framework that explicitly models the dependencies between different types of user-item interactions under a graph-based message passing architecture. GNMR devises a relation aggregation network to model interaction heterogeneity, and recursively performs embedding propagation between neighboring nodes over the user-item interaction graph. Experiments on real-world recommendation datasets show that GNMR consistently outperforms state-of-the-art methods. The source code is available at https://github.com/akaxlh/GNMR.

* Published at ICDE 2021
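
As a rough illustration of behavior-aware embedding propagation, the sketch below runs one propagation step for a single user: it aggregates item embeddings separately per behavior type, then combines the per-behavior results. It is not the GNMR model. The mean aggregator, the uniform average across behaviors, and the names `mean` and `propagate_user` are assumptions chosen for simplicity; the abstract does not specify the form of the relation aggregation network.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean of the vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def propagate_user(
    user: str,
    item_emb: Dict[str, Vector],
    interactions: Dict[str, Dict[str, List[str]]],  # behavior -> user -> items
    dim: int,
) -> Vector:
    """One message-passing step for `user` (hypothetical, simplified)."""
    # Aggregate neighbor (item) embeddings separately for each behavior type.
    per_behavior = [
        mean([item_emb[item] for item in by_user.get(user, [])], dim)
        for by_user in interactions.values()
    ]
    # Combine behavior-specific messages; a learned combination would go here.
    return mean(per_behavior, dim)
```

Applying the same step repeatedly, alternating between user and item nodes, would give the recursive propagation over the interaction graph that the abstract describes.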

Knowledge-aware Coupled Graph Neural Network for Social Recommendation

Oct 08, 2021

Chao Huang, Huance Xu, Yong Xu, Peng Dai, Lianghao Xia, Mengyin Lu, Liefeng Bo, Hao Xing, Xiaoping Lai, Yanfang Ye

Abstract: The social recommendation task aims to predict users' preferences over items by incorporating social connections among users, so as to alleviate the sparsity issue of collaborative filtering. While many recent efforts show the effectiveness of neural network-based social recommender systems, several important challenges have not yet been well addressed: (i) The majority of models only consider users' social connections, while ignoring the inter-dependent knowledge across items; (ii) Most existing solutions are designed for a single type of user-item interaction, making them unable to capture the interaction heterogeneity; (iii) The dynamic nature of user-item interactions has been less explored in many social-aware recommendation techniques. To tackle the above challenges, this work proposes a Knowledge-aware Coupled Graph Neural Network (KCGN) that jointly injects the inter-dependent knowledge across items and users into the recommendation framework. KCGN enables high-order user- and item-wise relation encoding by exploiting mutual information for global graph structure awareness. We further augment KCGN with the capability of capturing dynamic multi-typed user-item interactive patterns. Experimental studies on real-world datasets show the effectiveness of our method against many strong baselines in a variety of settings. Source code is available at: https://github.com/xhcdream/KCGN.

* Published as a paper at AAAI 2021
