Martin Tutek

ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs

Oct 01, 2025
Via arXiv

Context Parametrization with Compositional Adapters

Sep 26, 2025
Via arXiv

Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings

Jun 16, 2025
Via arXiv

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025
Via arXiv

Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps

Feb 20, 2025
Via arXiv

REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space

Jun 13, 2024
Via arXiv

Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

Jan 18, 2024
Via arXiv

Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness

Oct 04, 2023
Via arXiv

CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration

Sep 15, 2023
Via arXiv

Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods

Nov 15, 2022
Via arXiv