Shou-Tzu Han

Fragile Reasoning: A Mechanistic Analysis of LLM Sensitivity to Meaning-Preserving Perturbations

Apr 02, 2026

Shou-Tzu Han, Rodrigue Rizk, KC Santosh

Abstract: Large language models demonstrate strong performance on mathematical reasoning benchmarks, yet remain surprisingly fragile to meaning-preserving surface perturbations. We systematically evaluate three open-weight LLMs, Mistral-7B, Llama-3-8B, and Qwen2.5-7B, on 677 GSM8K problems paired with semantically equivalent variants generated through name substitution and number format paraphrasing. All three models exhibit substantial answer-flip rates (28.8%-45.1%), with number paraphrasing consistently more disruptive than name swaps. To trace the mechanistic basis of these failures, we introduce the Mechanistic Perturbation Diagnostics (MPD) framework, combining logit lens analysis, activation patching, component ablation, and the Cascading Amplification Index (CAI) into a unified diagnostic pipeline. CAI, a novel metric quantifying layer-wise divergence amplification, outperforms first divergence layer as a failure predictor for two of three architectures (AUC up to 0.679). Logit lens reveals that flipped samples diverge from correct predictions at significantly earlier layers than stable samples. Activation patching reveals a stark architectural divide in failure localizability: Llama-3 failures are recoverable by patching at specific layers (43/60 samples), while Mistral and Qwen failures are broadly distributed (3/60 and 0/60). Based on these diagnostic signals, we propose a mechanistic failure taxonomy (localized, distributed, and entangled) and validate it through targeted repair experiments: steering vectors and layer fine-tuning recover 12.2% of localized failures (Llama-3) but only 7.2% of entangled (Qwen) and 5.2% of distributed (Mistral) failures.

* Preprint. Under review at COLM 2026
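
As a rough illustration of the answer-flip rate mentioned in the abstract, the sketch below computes the share of paired problems whose final answer changes between the original and the perturbed variant. It assumes a flip means any change in the model's final answer; the paper may define it differently (for example, counting only correct-to-incorrect changes). The function name `answer_flip_rate` and the toy answers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def answer_flip_rate(original: Sequence[str], perturbed: Sequence[str]) -> float:
    """Fraction of paired problems whose final answer differs after perturbation.

    Assumption: a "flip" is any change in the extracted final answer string.
    """
    if len(original) != len(perturbed):
        raise ValueError("original and perturbed answers must be paired one-to-one")
    if len(original) == 0:
        raise ValueError("at least one answer pair is required")
    # Compare answers pairwise after trimming surrounding whitespace.
    flips = sum(a.strip() != b.strip() for a, b in zip(original, perturbed))
    return flips / len(original)


# Hypothetical model outputs on four problems and their perturbed variants.
original_answers = ["18", "3", "70000", "540"]
perturbed_answers = ["18", "5", "70000", "520"]

rate = answer_flip_rate(original_answers, perturbed_answers)
print(f"answer-flip rate: {rate:.1%}")
```

In this toy example two of the four answers change, so the rate is one half. A real evaluation would also need a reliable answer extractor and numeric normalization (so that "1,000" and "1000" compare equal), which this sketch leaves out.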

Narrative-Centered Emotional Reflection: Scaffolding Autonomous Emotional Literacy with AI

Apr 29, 2025

Shou-Tzu Han

Abstract: Reflexion is an AI-powered platform designed to enable structured emotional self-reflection at scale. By integrating real-time emotion detection, layered reflective prompting, and metaphorical storytelling generation, Reflexion empowers users to engage in autonomous emotional exploration beyond basic sentiment categorization. Grounded in theories of expressive writing, cognitive restructuring, self-determination, and critical consciousness development, the system scaffolds a progressive journey from surface-level emotional recognition toward value-aligned action planning. Initial pilot studies with diverse participants demonstrate positive outcomes in emotional articulation, cognitive reframing, and perceived psychological resilience. Reflexion represents a promising direction for scalable, theory-informed affective computing interventions aimed at fostering emotional literacy and psychological growth across educational, therapeutic, and public health contexts.

* 10 pages, 5 figures, preliminary results, early-stage work intended for future conference submission
