Maheep Chaudhary

SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors

May 20, 2025
Via arXiv

Modular Training of Neural Networks aids Interpretability

Feb 04, 2025
Via arXiv

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small

Sep 05, 2024
Via arXiv

Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives

Jul 31, 2023
Via arXiv

An Intelligent Recommendation-cum-Reminder System

Aug 09, 2021
Via arXiv