Adi Haviv

OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks

Oct 05, 2023
Ofir Bar Tal, Adi Haviv, Amit H. Bermano

Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data to misguide the model into incorrect classifications. Creating these attacks is a challenging task, especially with the ever-increasing complexity of models and datasets. In this work, we introduce a self-supervised, computationally economical method for generating adversarial examples, designed for the unseen black-box setting. Adapting techniques from representation learning, our method generates on-manifold EAs that are encouraged to resemble the data distribution. These attacks are comparable in effectiveness to the state of the art when attacking the model they were trained on, but significantly more effective when attacking unseen models, because they are tied to the data rather than to any particular model. Our experiments consistently demonstrate that the method is effective across various models, unseen data categories, and even defended models, suggesting a significant role for on-manifold EAs when targeting unseen models.

* ICCV 2023, AROW Workshop 
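
The abstract does not include code, but the core on-manifold idea can be sketched: instead of perturbing pixels directly, search for the adversarial example in the latent space of a pretrained autoencoder so the decoded attack stays close to the data distribution. The snippet below is a minimal illustrative sketch under that assumption; `encoder`, `decoder`, and `classifier` are hypothetical modules, and this is not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def on_manifold_attack(x, y, encoder, decoder, classifier,
                       steps=10, step_size=0.05):
    """Illustrative sketch (not the paper's method): ascend the classifier
    loss in the latent space of an autoencoder, so the decoded adversarial
    example remains near the data manifold."""
    z = encoder(x).detach().requires_grad_(True)
    for _ in range(steps):
        x_adv = decoder(z)                            # decode back onto the manifold
        loss = F.cross_entropy(classifier(x_adv), y)  # push toward misclassification
        (grad,) = torch.autograd.grad(loss, z)
        z = (z + step_size * grad.sign()).detach().requires_grad_(True)
    return decoder(z).detach()
```

Because the perturbation lives in latent space, the decoded attack inherits the structure of the training data, which is one plausible reading of why such attacks transfer better to unseen models.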

Understanding Transformer Memorization Recall Through Idioms

Oct 11, 2022
Adi Haviv, Ido Cohen, Jacob Gidron, Roei Schuster, Yoav Goldberg, Mor Geva

To produce accurate predictions, language models (LMs) must balance between generalization and memorization. Yet, little is known about the mechanism by which transformer LMs employ their memorization capacity. When does a model decide to output a memorized phrase, and how is this phrase then retrieved from memory? In this work, we offer the first methodological framework for probing and characterizing recall of memorized sequences in transformer LMs. First, we lay out criteria for detecting model inputs that trigger memory recall, and propose idioms as inputs that fulfill these criteria. Next, we construct a dataset of English idioms and use it to compare model behavior on memorized vs. non-memorized inputs. Specifically, we analyze the internal prediction construction process by interpreting the model's hidden representations as a gradual refinement of the output probability distribution. We find that across different model sizes and architectures, memorized predictions are a two-step process: early layers promote the predicted token to the top of the output distribution, and upper layers increase model confidence. This suggests that memorized information is stored and retrieved in the early layers of the network. Last, we demonstrate the utility of our methodology beyond idioms, on memorized factual statements. Overall, our work takes a first step towards understanding memory recall, and provides a methodological basis for future studies of transformer memorization.
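
The layer-wise analysis described above can be approximated with a simple probe that projects each intermediate hidden state through the output head and tracks when the target token first reaches the top of the distribution. The snippet below is an illustrative sketch using GPT-2 from Hugging Face `transformers`; the idiom and the probing details are assumptions, not the authors' released code.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# An idiom whose last token is (presumably) memorized.
prompt, target = "He kicked the", " bucket"
target_id = tokenizer.encode(target)[0]

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Project every layer's last-position hidden state through the LM head
# and track the rank of the target token per layer.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    rank = (logits.argsort(descending=True)[0] == target_id).nonzero().item()
    print(f"layer {layer:2d}: rank of '{target.strip()}' = {rank}")
```

Comparing the layer at which the target first becomes top-ranked on memorized vs. non-memorized inputs mirrors the two-step pattern described in the abstract.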


Transformer Language Models without Positional Encodings Still Learn Positional Information

Mar 30, 2022
Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, Omer Levy

Transformers typically require some form of positional encoding, such as positional embeddings, to process natural language sequences. Surprisingly, we find that transformer language models without any explicit positional encoding are still competitive with standard models, and that this phenomenon is robust across different datasets, model sizes, and sequence lengths. Probing experiments reveal that such models acquire an implicit notion of absolute positions throughout the network, effectively compensating for the missing information. We conjecture that causal attention enables the model to infer the number of predecessors that each token can attend to, thereby approximating its absolute position.
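
For concreteness, the sketch below shows a causal transformer language model whose only inputs are token embeddings: with no positional embeddings of any kind, the causal attention mask is the sole source of positional information, consistent with the conjecture above. The architecture and sizes are illustrative assumptions, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

class NoPELanguageModel(nn.Module):
    """Causal transformer LM with token embeddings only; no positional
    embeddings of any kind. Illustrative sketch, not the paper's setup."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens):                  # tokens: (batch, seq)
        seq_len = tokens.size(1)
        # The causal mask is the only source of positional information.
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.blocks(self.embed(tokens), mask=causal_mask)
        return self.lm_head(h)                  # next-token logits

logits = NoPELanguageModel(vocab_size=50257)(torch.randint(0, 50257, (2, 16)))
```

A model like this can be trained with the standard next-token prediction loss; the probing question is then whether absolute positions can be decoded from its hidden states.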


SCROLLS: Standardized CompaRison Over Long Language Sequences

Jan 10, 2022
Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy

NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing information across the input. SCROLLS contains summarization, question answering, and natural language inference tasks, covering multiple domains, including literature, science, business, and entertainment. Initial baselines, including Longformer Encoder-Decoder, indicate that there is ample room for improvement on SCROLLS. We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
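
Because SCROLLS is distributed in a unified text-to-text format, each example reduces to a long input string and a target output string. The snippet below sketches how one might load a single task with Hugging Face `datasets`; the `tau/scrolls` identifier, the `gov_report` config, and the field names are assumptions to be checked against the official leaderboard and repository.

```python
from datasets import load_dataset

# Dataset name, config, and field names are assumptions; see the SCROLLS
# leaderboard and repository for the authoritative identifiers.
gov_report = load_dataset("tau/scrolls", "gov_report")

example = gov_report["train"][0]
print(example["input"][:500])   # long source document (text-to-text input)
print(example["output"])        # target summary (text-to-text output)
```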


Can Latent Alignments Improve Autoregressive Machine Translation?

Apr 19, 2021
Adi Haviv, Lior Vassertail, Omer Levy

Latent alignment objectives such as CTC and AXE significantly improve non-autoregressive machine translation models. Can they improve autoregressive models as well? We explore the possibility of training autoregressive machine translation models with latent alignment objectives, and observe that, in practice, this approach results in degenerate models. We provide a theoretical explanation for these empirical results, and prove that latent alignment objectives are incompatible with teacher forcing.

* Accepted to NAACL 2021 
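
For readers less familiar with latent alignment objectives, the snippet below shows PyTorch's standard CTC loss, which marginalizes over all monotonic alignments between the model's output positions and the target sequence instead of scoring one fixed position-by-position alignment as teacher forcing does. It is a generic illustration of the objective family discussed above, not the paper's experimental setup.

```python
import torch
import torch.nn.functional as F

# Toy shapes: T output positions, N sequences, C vocabulary (class 0 = blank).
T, N, C, S = 12, 2, 20, 6
log_probs = F.log_softmax(torch.randn(T, N, C), dim=-1)   # (T, N, C)
targets = torch.randint(1, C, (N, S))                     # (N, S)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

# CTC sums probability over every monotonic alignment of the target to the
# T output positions (blank = class 0), rather than scoring a single fixed
# alignment as standard token-by-token cross-entropy does.
ctc = torch.nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```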

BERTese: Learning to Speak to BERT

Mar 11, 2021
Adi Haviv, Jonathan Berant, Amir Globerson

Large pre-trained language models have been shown to encode large amounts of world and commonsense knowledge in their parameters, leading to substantial interest in methods for extracting that knowledge. In past work, knowledge was extracted by taking manually-authored queries and gathering paraphrases for them using a separate pipeline. In this work, we propose a method for automatically rewriting a query into "BERTese", a paraphrase of the query that is directly optimized towards better knowledge extraction. To encourage meaningful rewrites, we add auxiliary loss functions that encourage the query to correspond to actual language tokens. We empirically show our approach outperforms competing baselines, obviating the need for complex pipelines. Moreover, BERTese provides some insight into the type of language that helps language models perform knowledge extraction.

* Accepted to EACL 2021 
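
The "correspond to actual language tokens" constraint can be encouraged, for example, by penalizing the distance between each rewritten-query vector and its nearest vocabulary embedding. The sketch below illustrates that kind of auxiliary loss; the function, shapes, and nearest-neighbor formulation are assumptions for illustration, not the paper's implementation.

```python
import torch

def nearest_token_loss(query_vectors, embedding_matrix):
    """Auxiliary loss sketch (illustrative assumption, not the paper's code):
    push each continuous query vector toward its nearest vocabulary embedding,
    so the rewritten query stays close to actual language tokens.

    query_vectors:    (seq_len, hidden)  continuous rewriter outputs
    embedding_matrix: (vocab, hidden)    the LM's input embeddings
    """
    # Squared Euclidean distance from every query position to every token.
    dists = torch.cdist(query_vectors, embedding_matrix) ** 2   # (seq, vocab)
    return dists.min(dim=-1).values.mean()

# Toy usage with random tensors standing in for real model components.
loss = nearest_token_loss(torch.randn(8, 768), torch.randn(30522, 768))
```

In a full rewriter, such a term would be added to the main knowledge-extraction loss so that the optimized query remains decodable into real tokens.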