Anyi Wang

Improving LLM Reasoning through Interpretable Role-Playing Steering

Jun 09, 2025

Anyi Wang, Dong Shu, Yifan Wang, Yunpu Ma, Mengnan Du

Abstract: Role-playing has emerged as an effective technique for enhancing the reasoning capabilities of large language models (LLMs). However, existing methods primarily rely on prompt engineering, which often lacks stability and interpretability. In this paper, we introduce Sparse Autoencoder Role-Playing Steering (SRPS), a novel framework that identifies and manipulates internal model features associated with role-playing behavior. Our approach extracts latent representations from role-play prompts, selects the most relevant features based on activation patterns, and constructs a steering vector that can be injected into the model's residual stream with controllable intensity. Our method enables fine-grained control over role-specific behavior and offers insights into how role information influences internal model activations. Extensive experiments across various reasoning benchmarks and model sizes demonstrate consistent performance gains. Notably, in the zero-shot chain-of-thought (CoT) setting, the accuracy of Llama3.1-8B on CSQA improves from 31.86% to 39.80%, while Gemma2-9B on SVAMP increases from 37.50% to 45.10%. These results highlight the potential of SRPS to enhance reasoning ability in LLMs, providing better interpretability and stability compared to traditional prompt-based role-playing.

* 21 pages, 8 figures, 8 tables
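
The three steps named in the abstract above (extract latent activations, select features, build and inject a steering vector) can be illustrated with a minimal sketch. This is not the authors' implementation: the selection rule (largest mean activation gap between role-play and neutral prompts), the weighting of decoder directions, and every name below (`select_features`, `build_steering_vector`, `steer`, `alpha`) are assumptions made for illustration, and the toy numbers stand in for real sparse-autoencoder activations.

```python
from typing import List

Vector = List[float]


def mean_columns(rows: List[Vector]) -> Vector:
    """Average each feature (column) over all prompts (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def activation_gaps(role_acts: List[Vector], base_acts: List[Vector]) -> Vector:
    """Mean activation under role-play prompts minus mean under neutral prompts.

    ASSUMPTION: the abstract only says features are chosen "based on
    activation patterns"; a mean gap is one plausible criterion.
    """
    role_mean = mean_columns(role_acts)
    base_mean = mean_columns(base_acts)
    return [r - b for r, b in zip(role_mean, base_mean)]


def select_features(role_acts: List[Vector], base_acts: List[Vector], k: int) -> List[int]:
    """Return the indices of the k features with the largest activation gap."""
    gaps = activation_gaps(role_acts, base_acts)
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return ranked[:k]


def build_steering_vector(decoder: List[Vector], feature_ids: List[int], weights: Vector) -> Vector:
    """Sum the decoder directions of the selected features, scaled by their weights.

    ASSUMPTION: decoder[f] is the residual-stream direction of SAE feature f.
    """
    dim = len(decoder[0])
    vec = [0.0] * dim
    for f in feature_ids:
        for j in range(dim):
            vec[j] += weights[f] * decoder[f][j]
    return vec


def steer(residual: Vector, steering: Vector, alpha: float) -> Vector:
    """Add the steering vector to one residual-stream state with intensity alpha."""
    return [h + alpha * s for h, s in zip(residual, steering)]


# Toy data: 2 prompts per group, 3 SAE features, residual dimension 2.
role_acts = [[0.0, 2.0, 0.1], [0.0, 4.0, 0.3]]
base_acts = [[0.0, 1.0, 0.2], [0.0, 1.0, 0.2]]
decoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

top = select_features(role_acts, base_acts, k=1)
vector = build_steering_vector(decoder, top, activation_gaps(role_acts, base_acts))
steered = steer([1.0, 1.0], vector, alpha=0.5)
```

In this sketch `alpha` plays the role of the "controllable intensity" mentioned in the abstract: setting it to 0 leaves the residual state unchanged, and larger values push it further along the selected feature directions.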

What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns

Apr 22, 2025

Michael A. Hedderich, Anyi Wang, Raoyuan Zhao, Florian Eichin, Barbara Plank

Abstract: Prompt engineering for large language models is challenging, as even small prompt perturbations or model changes can significantly impact the generated output texts. Existing evaluation methods, either automated metrics or human evaluation, have limitations, such as providing limited insights or being labor-intensive. We propose Spotlight, a new approach that combines both automation and human analysis. Based on data mining techniques, we automatically distinguish between random (decoding) variations and systematic differences in language model outputs. This process provides token patterns that describe the systematic differences and guide the user in manually analyzing the effects of their prompt and model changes efficiently. We create three benchmarks to quantitatively test the reliability of token pattern extraction methods and demonstrate that our approach provides new insights into established prompt data. From a human-centric perspective, through demonstration studies and a user study, we show that our token pattern approach helps users understand the systematic differences of language model outputs, and we are able to discover relevant differences caused by prompt and model changes (e.g., related to gender or culture), thus supporting the prompt engineering process and human-centric model behavior research.
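
The idea of separating random decoding variation from systematic differences can be illustrated with a small sketch. The abstract does not specify which data mining technique Spotlight uses, so the code below substitutes a plain two-sided Fisher exact test on per-token output counts, without multiple-comparison correction; the function names, the whitespace tokenization, and the `alpha` threshold are all assumptions made for illustration.

```python
from math import comb
from typing import List, Tuple


def doc_freq(outputs: List[str], token: str) -> int:
    """Count how many outputs contain the token (whitespace tokenization)."""
    return sum(1 for text in outputs if token in text.lower().split())


def fisher_p(a: int, n_a: int, b: int, n_b: int) -> float:
    """Two-sided Fisher exact p-value for a token seen in a of n_a outputs
    from one prompt and in b of n_b outputs from the other."""
    total = a + b
    denom = comb(n_a + n_b, total)

    def prob(x: int) -> float:
        # Hypergeometric probability of x occurrences falling in group A.
        return comb(n_a, x) * comb(n_b, total - x) / denom

    observed = prob(a)
    lo, hi = max(0, total - n_b), min(n_a, total)
    # Sum the probability of every split at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))


def systematic_tokens(
    outputs_a: List[str], outputs_b: List[str], alpha: float = 0.05
) -> List[Tuple[str, int, int, float]]:
    """Return (token, count_a, count_b, p) for tokens whose frequency differs
    between the two output sets more than chance alone would suggest."""
    vocab = {tok for text in outputs_a + outputs_b for tok in text.lower().split()}
    found = []
    for tok in sorted(vocab):
        a, b = doc_freq(outputs_a, tok), doc_freq(outputs_b, tok)
        p = fisher_p(a, len(outputs_a), b, len(outputs_b))
        if p < alpha:
            found.append((tok, a, b, p))
    return found


# Toy data: six sampled outputs per prompt variant.
outputs_a = ["she is a good nurse"] * 3 + ["she is a really good nurse"] * 3
outputs_b = ["he is a really good nurse"] * 2 + ["he is a good nurse"] * 4

patterns = systematic_tokens(outputs_a, outputs_b)
```

On the toy data, "really" appears in 3 of 6 outputs for one prompt and 2 of 6 for the other, which the test treats as sampling noise, while the pronoun swap is flagged as a systematic difference for the user to inspect.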
