Martin Tutek

Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings

Jun 16, 2025

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025

Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps

Feb 20, 2025

REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space

Jun 13, 2024

Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

Jan 18, 2024

Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness

Oct 04, 2023

CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration

Sep 15, 2023

Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods

Nov 15, 2022

Staying True to Your Word: Can Attention Become Explanation?

May 19, 2020

Iterative Recursive Attention Model for Interpretable Sequence Classification

Aug 30, 2018