Mor Geva

Shammie

How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?

Jun 12, 2025
Via arXiv

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Jun 12, 2025
Via arXiv

Precise In-Parameter Concept Erasure in Large Language Models

May 28, 2025
Via arXiv

Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas

Mar 04, 2025
Via arXiv

Preventing Rogue Agents Improves Multi-Agent Collaboration

Feb 09, 2025
Via arXiv

Open Problems in Mechanistic Interpretability

Jan 27, 2025
Via arXiv

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

Jan 14, 2025
Via arXiv

Open Problems in Machine Unlearning for AI Safety

Jan 09, 2025
Via arXiv

Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models

Dec 18, 2024
Via arXiv

Inferring Functionality of Attention Heads from their Parameters

Dec 16, 2024
Via arXiv