Mengnan Du

AdaptiveK Sparse Autoencoders: Dynamic Sparsity Allocation for Interpretable LLM Representations

Aug 24, 2025
Via arXiv

Attribution Explanations for Deep Neural Networks: A Theoretical Perspective

Aug 11, 2025
Via arXiv

DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router

Jul 30, 2025
Via arXiv

Improving LLM Reasoning through Interpretable Role-Playing Steering

Jun 09, 2025
Via arXiv

Fine-Grained Interpretation of Political Opinions in Large Language Models

Jun 05, 2025
Via arXiv

SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models

May 22, 2025
Via arXiv

Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering

May 21, 2025
Via arXiv

Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models

May 21, 2025
Via arXiv

SAE-FiRE: Enhancing Earnings Surprise Predictions Through Sparse Autoencoder Feature Selection

May 20, 2025
Via arXiv

Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders

May 12, 2025
Via arXiv