Shauli Ravfogel

The Truthfulness Spectrum Hypothesis

Feb 23, 2026

Discrete Diffusion Models Exploit Asymmetry to Solve Lookahead Planning Tasks

Feb 23, 2026

From Directions to Regions: Decomposing Activations in Language Models via Local Geometry

Feb 02, 2026

State over Tokens: Characterizing the Role of Reasoning Tokens

Dec 14, 2025

IQ Test for LLMs: An Evaluation Framework for Uncovering Core Skills in LLMs

Jul 27, 2025

The Medium Is Not the Message: Deconfounding Text Embeddings via Linear Concept Erasure

Jul 01, 2025

Preserving Task-Relevant Information Under Linear Concept Removal

Jun 12, 2025

RELIC: Evaluating Compositional Instruction Following via Language Recognition

Jun 05, 2025

Diversity Over Quantity: A Lesson From Few Shot Relation Classification

Dec 06, 2024

Counterfactual Generation from Language Models

Nov 11, 2024