Ninghao Liu

Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering

May 21, 2025
Via arXiv

Artificial Intelligence Bias on English Language Learners in Automatic Scoring

May 15, 2025
Via arXiv

Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders

May 12, 2025
Via arXiv

Towards Trustworthy GUI Agents: A Survey

Mar 30, 2025
Via arXiv

A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models

Mar 07, 2025
Via arXiv

Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders

Feb 21, 2025
Via arXiv

Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data

Feb 19, 2025
Via arXiv

Self-Regularization with Latent Space Explanations for Controllable LLM-based Classification

Feb 19, 2025
Via arXiv

SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?

Feb 18, 2025
Via arXiv

EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification

Feb 07, 2025
Via arXiv