Ryan Chen

LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models

May 17, 2025

Ryan Chen, Youngmin Ko, Zeyu Zhang, Catherine Cho, Sunny Chung, Mauro Giuffré, Dennis L. Shung, Bradly C. Stadie

Abstract: We introduce LAMP (Linear Attribution Mapping Probe), a method that shines light onto a black-box language model's decision surface and studies how reliably a model maps its stated reasons to its predictions through a locally linear model approximating the decision surface. LAMP treats the model's own self-reported explanations as a coordinate system and fits a locally linear surrogate that links those explanation weights to the model's output. By doing so, it reveals which stated factors steer the model's decisions, and by how much. We apply LAMP to three tasks: sentiment analysis, controversial-topic detection, and safety-prompt auditing. Across these tasks, LAMP reveals that many LLMs exhibit locally linear decision landscapes. In addition, these surfaces correlate with human judgments on explanation quality and, on a clinical case-file data set, align with expert assessments. Since LAMP operates without requiring access to model gradients, logits, or internal activations, it serves as a practical and lightweight framework for auditing proprietary language models and enables assessment of whether a model behaves consistently with the explanations it provides.
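
The core idea of fitting a locally linear surrogate can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes we already have a matrix of self-reported factor weights (one row per perturbed query) and the model's scalar output for each, and it uses ordinary least squares from NumPy. The function name `fit_local_linear_surrogate` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_local_linear_surrogate(factor_weights, outputs):
    """Fit outputs ~ factor_weights @ slopes + intercept by least squares.

    factor_weights: (n_samples, n_factors) self-reported explanation weights
    outputs:        (n_samples,) scalar model predictions
    Returns (slopes, intercept, r_squared). All names here are illustrative.
    """
    X = np.asarray(factor_weights, dtype=float)
    y = np.asarray(outputs, dtype=float)
    # Append a column of ones so the fit includes an intercept term.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    ss_res = float(np.sum((y - pred) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    # R^2 measures how well a linear surface explains the outputs locally.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return coef[:-1], float(coef[-1]), r_squared

# Synthetic stand-in for (explanation weights, model output) pairs.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 3))
true_w = np.array([0.8, -0.5, 0.1])
y = X @ true_w + 0.2
slopes, intercept, r2 = fit_local_linear_surrogate(X, y)
```

Under these assumptions, the fitted slopes indicate how strongly each stated factor moves the output, and a high R^2 suggests the decision surface is close to linear in that neighborhood.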

Why does in-context learning fail sometimes? Evaluating in-context learning on open and closed questions

Jul 02, 2024

Xiang Li, Haoran Tang, Siyu Chen, Ziwei Wang, Ryan Chen, Marcin Abram

Abstract: We measure the performance of in-context learning as a function of task novelty and difficulty for open and closed questions. For that purpose, we created a novel benchmark consisting of hard scientific questions, each paired with contexts of varying relevance. We show that, counter-intuitively, a context that is more aligned with the topic does not always help more than a less relevant context. This effect is especially visible for open questions and questions of high difficulty or novelty. This result reveals a fundamental difference between the treatment of closed-form and open-form questions by large language models and shows a need for a more robust evaluation of in-context learning across a variety of question types. It also poses a new question of how to optimally select a context for large language models, especially in Retrieval Augmented Generation (RAG) systems. Our results suggest that the answer to this question can be highly application-dependent and might be contingent on factors including the format of the question, the perceived difficulty level of the questions, and the novelty or popularity of the information we seek.

* 8 pages plus references, 4 main figures, 6 pages of supplementary material

Learning the Market: Sentiment-Based Ensemble Trading Agents

Feb 02, 2024

Andrew Ye, James Xu, Yi Wang, Yifan Yu, Daniel Yan, Ryan Chen, Bosheng Dong, Vipin Chaudhary, Shuai Xu

Abstract: We propose the integration of sentiment analysis and deep reinforcement learning ensemble algorithms for stock trading, and design a strategy capable of dynamically altering its employed agent given concurrent market sentiment. In particular, we create a simple-yet-effective method for extracting news sentiment and combine this with general improvements upon existing works, resulting in automated trading agents that effectively consider both qualitative market factors and quantitative stock data. We show that our approach results in a strategy that is profitable, robust, and risk-minimal, outperforming the traditional ensemble strategy as well as single-agent algorithms and market metrics. Our findings indicate that the conventional practice of switching ensemble agents every fixed number of months is sub-optimal, and that a dynamic sentiment-based framework unlocks substantial additional performance within these agents. Furthermore, as we have designed our algorithm with simplicity and efficiency in mind, we hypothesize that the transition of our method from historical evaluation towards real-time trading with live data should be relatively simple.
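
One way to picture sentiment-triggered agent switching, as opposed to switching on a fixed calendar schedule, is the sketch below. The abstract does not specify the actual rule, so everything here is an assumption for illustration: the threshold on sentiment change, the use of a Sharpe-like ratio on validation returns to pick the agent, and the agent labels `a2c` and `ppo` are all hypothetical.

```python
from statistics import mean, stdev

def sharpe_like(returns):
    """Mean return divided by sample standard deviation (no risk-free rate)."""
    if len(returns) < 2:
        return 0.0
    sd = stdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0

def choose_agent(validation_returns):
    """Pick the agent whose recent validation returns score highest."""
    return max(validation_returns, key=lambda name: sharpe_like(validation_returns[name]))

def should_reselect(prev_sentiment, curr_sentiment, threshold=0.3):
    """Hypothetical trigger: re-select when sentiment moves by at least `threshold`."""
    return abs(curr_sentiment - prev_sentiment) >= threshold

# Illustrative validation returns for two hypothetical agents.
validation_returns = {
    "a2c": [0.01, 0.02, -0.01, 0.015],
    "ppo": [0.03, -0.04, 0.05, -0.02],
}
active_agent = "ppo"
if should_reselect(prev_sentiment=0.1, curr_sentiment=0.5):
    # Sentiment shifted enough, so re-evaluate which agent to deploy.
    active_agent = choose_agent(validation_returns)
```

The design point this illustrates is only that the switching decision is driven by a sentiment signal rather than by elapsed time; the paper's own selection criterion may differ.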
