
Yonatan Belinkov

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Mar 31, 2024
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller

Via arXiv

Jamba: A Hybrid Transformer-Mamba Language Model

Mar 28, 2024
Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, Yoav Shoham

Via arXiv

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms

Mar 26, 2024
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov

Via arXiv

Concept-Best-Matching: Evaluating Compositionality in Emergent Communication

Mar 17, 2024
Boaz Carmeli, Yonatan Belinkov, Ron Meir

Via arXiv

Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information

Mar 14, 2024
Shadi Iskander, Kira Radinsky, Yonatan Belinkov

Via arXiv

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Mar 09, 2024
Michael Toker, Hadas Orgad, Mor Ventura, Dana Arad, Yonatan Belinkov

Via arXiv

A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry

Feb 27, 2024
Michael Toker, Oren Mishali, Ophir Münz-Manor, Benny Kimelfeld, Yonatan Belinkov

Via arXiv

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

Feb 22, 2024
Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau

Via arXiv

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space

Feb 20, 2024
Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf

Via arXiv

Accelerating the Global Aggregation of Local Explanations

Dec 23, 2023
Alon Mor, Yonatan Belinkov, Benny Kimelfeld

Via arXiv