Mengnan Du

AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling

Jan 13, 2026
Via arXiv

NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models

Jan 07, 2026
Via arXiv

Rep2Text: Decoding Full Text from a Single LLM Token Representation

Nov 09, 2025
Via arXiv

KnowThyself: An Agentic Assistant for LLM Interpretability

Nov 05, 2025
Via arXiv

AdaptiveK Sparse Autoencoders: Dynamic Sparsity Allocation for Interpretable LLM Representations

Aug 24, 2025
Via arXiv

Attribution Explanations for Deep Neural Networks: A Theoretical Perspective

Aug 11, 2025
Via arXiv

DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router

Jul 30, 2025
Via arXiv

Improving LLM Reasoning through Interpretable Role-Playing Steering

Jun 09, 2025
Via arXiv

Fine-Grained Interpretation of Political Opinions in Large Language Models

Jun 05, 2025
Via arXiv

SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models

May 22, 2025
Via arXiv