Maheep Chaudhary

In-Context Environments Induce Evaluation-Awareness in Language Models

Mar 04, 2026

Weight space Detection of Backdoors in LoRA Adapters

Feb 16, 2026

Broken Chains: The Cost of Incomplete Reasoning in LLMs

Feb 16, 2026

SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought

Nov 11, 2025

Alignment-Constrained Dynamic Pruning for LLMs: Identifying and Preserving Alignment-Critical Circuits

Nov 09, 2025

Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis

Nov 09, 2025

SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors

May 20, 2025

Modular Training of Neural Networks aids Interpretability

Feb 04, 2025

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small

Sep 05, 2024

Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives

Jul 31, 2023