Mohammad Beigi

IR$^3$: Contrastive Inverse Reinforcement Learning for Interpretable Detection and Mitigation of Reward Hacking

Feb 23, 2026
Via arXiv

Adversarial Reward Auditing for Active Detection and Mitigation of Reward Hacking

Feb 02, 2026

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models

Feb 22, 2025

Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models

Oct 26, 2024

InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States

Jun 17, 2024

Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models

Feb 16, 2024