Lior Wolf

Obtaining Favorable Layouts for Multiple Object Generation

May 01, 2024

Barak Battash, Amit Rozner, Lior Wolf, Ofir Lindenbaum

Abstract: Large-scale text-to-image models that can generate high-quality and diverse images based on textual prompts have shown remarkable success. These models ultimately aim to create complex scenes, and addressing the challenge of multi-subject generation is a critical step towards this goal. However, the existing state-of-the-art diffusion models face difficulty when generating images that involve multiple subjects. When presented with a prompt containing more than one subject, these models may omit some subjects or merge them together. To address this challenge, we propose a novel approach based on a guiding principle. We allow the diffusion model to initially propose a layout, and then we rearrange the layout grid. This is achieved by enforcing cross-attention maps (XAMs) to adhere to proposed masks and by migrating pixels from latent maps to new locations determined by us. We introduce new loss terms aimed at reducing XAM entropy for clearer spatial definition of subjects, reducing the overlap between XAMs, and ensuring that XAMs align with their respective masks. We contrast our approach with several alternative methods and show that it more faithfully captures the desired concepts across a variety of text prompts.
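
The three kinds of loss terms named in the abstract (entropy, overlap, and mask alignment) can be illustrated with a small sketch. The abstract does not give the paper's formulas, so the definitions below are assumptions chosen for illustration: `entropy_loss`, `overlap_loss`, and `mask_loss` are hypothetical names, and each cross-attention map is treated as a plain 2D grid of non-negative values.

```python
import math

def normalize(xam):
    """Scale a non-negative 2D map so that its entries sum to 1."""
    total = sum(sum(row) for row in xam)
    return [[v / total for v in row] for row in xam]

def entropy_loss(xam):
    """Shannon entropy of the normalized map; lower means a sharper map."""
    p = normalize(xam)
    return -sum(v * math.log(v) for row in p for v in row if v > 0)

def overlap_loss(xam_a, xam_b):
    """Shared mass of two normalized maps (assumed form: sum of minima)."""
    pa, pb = normalize(xam_a), normalize(xam_b)
    return sum(min(a, b) for ra, rb in zip(pa, pb) for a, b in zip(ra, rb))

def mask_loss(xam, mask):
    """Fraction of attention mass that falls outside a binary mask."""
    p = normalize(xam)
    inside = sum(v * m for rp, rm in zip(p, mask) for v, m in zip(rp, rm))
    return 1.0 - inside

# Toy 2x2 maps: one spread evenly, one concentrated in the top-left cell.
diffuse = [[1.0, 1.0], [1.0, 1.0]]
focused = [[9.7, 0.1], [0.1, 0.1]]
mask_top_left = [[1, 0], [0, 0]]

print(entropy_loss(diffuse), entropy_loss(focused))
print(overlap_loss(diffuse, focused))
print(mask_loss(diffuse, mask_top_left), mask_loss(focused, mask_top_left))
```

In this toy setting the focused map has lower entropy and leaves less mass outside its mask than the diffuse map, which is the direction all three terms are meant to push.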

The Hidden Attention of Mamba Models

Mar 03, 2024

Ameen Ali, Itamar Zimerman, Lior Wolf

Abstract: The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains including NLP, long-range sequence processing, and computer vision. Selective SSMs are viewed as dual models, in which one trains in parallel on the entire sequence via IO-aware parallel scan, and deploys in an autoregressive manner. We add a third view and show that such models can be viewed as attention-driven models. This new perspective enables us to compare the underlying mechanisms to those of the self-attention layers in transformers and allows us to peer inside the inner workings of the Mamba model with explainability methods. Our code is publicly available.
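
The attention view can be made concrete with a deliberately simplified sketch. It assumes a single channel with a scalar hidden state and input-dependent coefficients `a`, `b`, `c` (hypothetical toy values, not the Mamba parameterization). Unrolling the recurrence `h_t = a_t * h_{t-1} + b_t * x_t`, `y_t = c_t * h_t` gives `y_t = sum_s alpha[t][s] * x_s` with `alpha[t][s] = c_t * (a_{s+1} * ... * a_t) * b_s`, a causal, attention-like matrix.

```python
def ssm_recurrent(a, b, c, x):
    """Run the scalar selective recurrence step by step."""
    h, ys = 0.0, []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys

def hidden_attention(a, b, c):
    """Build the lower-triangular matrix alpha implied by the recurrence."""
    n = len(a)
    alpha = [[0.0] * n for _ in range(n)]
    for t in range(n):
        decay = 1.0  # empty product when s == t
        for s in range(t, -1, -1):
            alpha[t][s] = c[t] * decay * b[s]
            decay *= a[s]  # extend the product a_s * ... * a_t for the next s
    return alpha

# Hypothetical per-step coefficients and inputs.
a = [0.9, 0.5, 0.8, 0.7]
b = [1.0, 0.5, 2.0, 1.0]
c = [1.0, 2.0, 0.5, 1.5]
x = [1.0, -1.0, 2.0, 0.5]

alpha = hidden_attention(a, b, c)
y_rec = ssm_recurrent(a, b, c, x)
y_attn = [sum(w * x_s for w, x_s in zip(row, x)) for row in alpha]
print(y_rec)
print(y_attn)
```

Both computations produce the same outputs up to floating-point rounding, and the entries of `alpha` above the diagonal stay zero, so each output only attends to current and past inputs.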

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space

Feb 20, 2024

Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf

Abstract: Understanding how Transformer-based Language Models (LMs) learn and recall information is a key goal of the deep learning community. Recent interpretability methods project weights and hidden states obtained from the forward pass to the models' vocabularies, helping to uncover how information flows within LMs. In this work, we extend this methodology to LMs' backward pass and gradients. We first prove that a gradient matrix can be cast as a low-rank linear combination of its forward and backward passes' inputs. We then develop methods to project these gradients into vocabulary items and explore the mechanics of how new information is stored in the LMs' neurons.
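
The low-rank claim can be checked on a toy case. The sketch below assumes a single linear layer `y = W x` with a squared-error loss (an illustrative setup, not the paper's experiments). The gradient with respect to `W` is then a sum of outer products of the backward-pass vector `delta = y - t` and the forward-pass input `x`, one per token, so its rank is at most the number of tokens. All names and values are hypothetical.

```python
def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def loss(W, xs, ts):
    """Sum over tokens of 0.5 * ||W x - t||^2."""
    total = 0.0
    for x, t in zip(xs, ts):
        total += 0.5 * sum((y - ti) ** 2 for y, ti in zip(matvec(W, x), t))
    return total

def grad_outer_products(W, xs, ts):
    """Gradient as a sum of outer products delta_i x_i^T, one per token."""
    G = [[0.0] * len(W[0]) for _ in W]
    for x, t in zip(xs, ts):
        delta = [y - ti for y, ti in zip(matvec(W, x), t)]  # backward-pass vector
        for r in range(len(W)):
            for c in range(len(W[0])):
                G[r][c] += delta[r] * x[c]
    return G

def grad_finite_diff(W, xs, ts, eps=1e-6):
    """Central-difference gradient, used only as an independent check."""
    G = [[0.0] * len(W[0]) for _ in W]
    for r in range(len(W)):
        for c in range(len(W[0])):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[r][c] += eps
            Wm[r][c] -= eps
            G[r][c] = (loss(Wp, xs, ts) - loss(Wm, xs, ts)) / (2 * eps)
    return G

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

W = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [0.0, 0.8, 1.1]]
xs = [[1.0, 2.0, -1.0], [0.5, -0.5, 2.0]]   # two tokens (forward inputs)
ts = [[0.0, 1.0, 0.0], [1.0, 0.0, -1.0]]    # two targets

G = grad_outer_products(W, xs, ts)
G_fd = grad_finite_diff(W, xs, ts)
print(G)
print(det3(G))
```

With two tokens and a 3x3 weight matrix, the gradient has rank at most 2, so its determinant vanishes up to rounding, while the finite-difference check confirms that the outer-product sum is the gradient.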

Training-Free Consistent Text-to-Image Generation

Feb 05, 2024

Yoad Tewel, Omri Kaduri, Rinon Gal, Yoni Kasten, Lior Wolf, Gal Chechik, Yuval Atzmon

Abstract: Text-to-image models offer a new level of creative flexibility by allowing users to guide the image generation process through natural language. However, using these models to consistently portray the same subject across diverse prompts remains challenging. Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects or add image conditioning to the model. These methods require lengthy per-subject optimization or large-scale pre-training. Moreover, they struggle to align generated images with text prompts and face difficulties in portraying multiple subjects. Here, we present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model. We introduce a subject-driven shared attention block and correspondence-based feature injection to promote subject consistency between images. Additionally, we develop strategies to encourage layout diversity while maintaining subject consistency. We compare ConsiStory to a range of baselines, and demonstrate state-of-the-art performance on subject consistency and text alignment, without requiring a single optimization step. Finally, ConsiStory can naturally extend to multi-subject scenarios, and even enable training-free personalization for common objects.

* Project page: https://consistory-paper.github.io

DiffMoog: a Differentiable Modular Synthesizer for Sound Matching

Jan 23, 2024

Noy Uzrad, Oren Barkan, Almog Elharar, Shlomi Shvartzman, Moshe Laufer, Lior Wolf, Noam Koenigstein

Abstract: This paper presents DiffMoog, a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments. Being differentiable, it allows integration into neural networks, enabling automated sound matching, to replicate a given audio input. Notably, DiffMoog facilitates modulation capabilities (FM/AM), low-frequency oscillators (LFOs), filters, envelope shapers, and the ability for users to create custom signal chains. We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework. This framework utilizes a novel signal-chain loss and an encoder network that self-programs its outputs to predict DiffMoog's parameters based on the user-defined modular architecture. Moreover, we provide insights and lessons learned towards sound matching using differentiable synthesis. Combining robust sound capabilities with a holistic platform, DiffMoog stands as a premier asset for expediting research in audio synthesis and machine learning.

* 5 pages, 7 figures, 1 table. Code is released at https://github.com/aisynth/diffmoog

Dynamic Layer Tying for Parameter-Efficient Transformers

Jan 23, 2024

Tamir David Hay, Lior Wolf

Abstract: In the pursuit of reducing the number of trainable parameters in deep transformer networks, we employ Reinforcement Learning to dynamically select layers during training and tie them together. Every few iterations, the RL agent is asked whether to train each layer $i$ independently or to copy the weights of a previous layer $j<i$. This facilitates weight sharing, reduces the number of trainable parameters, and also serves as an effective regularization technique. Experimental evaluations validate that our model modestly outperforms the baseline transformer model with regard to perplexity and drastically reduces the number of trainable parameters. In particular, the memory consumption during training is up to one order of magnitude less than the conventional training method.
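
The bookkeeping behind the tying decision can be sketched as follows. The RL agent itself is out of scope here; the sketch assumes its output is a list `decisions` in which `decisions[i] == i` means layer `i` trains independently and `decisions[i] == j < i` means layer `i` copies layer `j`. The function names and the per-layer parameter count are hypothetical.

```python
def resolve_tying(decisions):
    """Map each layer to the layer whose weights it ultimately shares."""
    owners = []
    for i, j in enumerate(decisions):
        if j < 0 or j > i:
            raise ValueError(f"layer {i} can only tie to itself or an earlier layer")
        # A layer tied to j inherits whatever layer j itself resolves to.
        owners.append(i if j == i else owners[j])
    return owners

def trainable_params(owners, params_per_layer):
    """Count parameters, assuming every layer has the same size."""
    return len(set(owners)) * params_per_layer

# Hypothetical agent output for a 6-layer network.
decisions = [0, 0, 1, 3, 3, 2]
owners = resolve_tying(decisions)
print(owners)
print(trainable_params(owners, params_per_layer=1000))
```

In this example only two distinct weight sets remain trainable, so the parameter count drops to a third of the untied six-layer network.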

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation

Jan 20, 2024

Nadav Benedek, Lior Wolf

Abstract: With the proliferation of large pre-trained language models (PLMs), fine-tuning all model parameters becomes increasingly inefficient, particularly when dealing with numerous downstream tasks that entail substantial training and storage costs. Several approaches aimed at achieving parameter-efficient fine-tuning (PEFT) have been proposed. Among them, Low-Rank Adaptation (LoRA) stands out as an archetypal method, incorporating trainable rank decomposition matrices into each target module. Nevertheless, LoRA does not consider the varying importance of each layer. To address this limitation, we introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process, considering both the temporary magnitude of weights and the accumulated statistics of the input to any given layer. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
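
A linearly increasing rank schedule can be sketched in a few lines. The abstract does not state the endpoints or the rounding rule, so the 12-layer model, the range from rank 4 to rank 12, and the function name `linear_ranks` are assumptions for illustration only.

```python
def linear_ranks(num_layers, first_rank, last_rank):
    """Interpolate ranks linearly from the first layer to the last, then round."""
    if num_layers == 1:
        return [last_rank]
    step = (last_rank - first_rank) / (num_layers - 1)
    return [round(first_rank + step * i) for i in range(num_layers)]

# Hypothetical setting: 12 layers, ranks growing from 4 to 12.
ranks = linear_ranks(12, first_rank=4, last_rank=12)
print(ranks)
print(sum(ranks), "total rank, versus", 8 * 12, "for a uniform rank of 8")
```

With these assumed endpoints the total rank budget matches a uniform LoRA of rank 8, but more of it is spent on the later layers.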

* EACL 2024

Efficient Verification-Based Face Identification

Dec 20, 2023

Amit Rozner, Barak Battash, Ofir Lindenbaum, Lior Wolf

Abstract: We study the problem of performing face verification with an efficient neural model $f$. The efficiency of $f$ stems from simplifying the face verification problem from an embedding nearest neighbor search into a binary problem; each user has their own neural network $f$. To allow information sharing between different individuals in the training set, we do not train $f$ directly but instead generate the model weights using a hypernetwork $h$. This leads to the generation of a compact personalized model for face identification that can be deployed on edge devices. Key to the method's success is a novel way of generating hard negatives and carefully scheduling the training objectives. Our model leads to a very small $f$ requiring only 23k parameters and 5M floating point operations (FLOPs). We use six face verification datasets to demonstrate that our method is on par with or better than state-of-the-art models, with a significantly reduced number of parameters and computational burden. Furthermore, we perform an extensive ablation study to demonstrate the importance of each element in our method.
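
The division of labor between $h$ and $f$ can be sketched in miniature. Everything below is an assumption made for illustration: `hypernetwork` is a single linear map, `verifier` is a logistic unit over precomputed feature vectors rather than images, and the matrix `H` is hand-picked rather than trained. The point is only that $f$ has no trained weights of its own; they are produced by $h$ from a per-user input.

```python
import math

def hypernetwork(user_embedding, H):
    """h: map a user embedding to the flat parameter vector of f."""
    return [sum(h * e for h, e in zip(row, user_embedding)) for row in H]

def verifier(params, face_features):
    """f: per-user binary scorer whose weights and bias come from h."""
    *weights, bias = params
    z = sum(w * v for w, v in zip(weights, face_features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked toy hypernetwork: two weight rows plus one bias row.
H = [[4.0, 0.0],
     [0.0, 4.0],
     [0.0, 0.0]]

user = [1.0, 0.0]                      # hypothetical enrolled user
params = hypernetwork(user, H)         # generated weights of f for this user

print(verifier(params, [1.0, 0.0]))    # features resembling the user
print(verifier(params, [-1.0, 0.0]))   # features of someone else
```

Only `params` needs to be stored on the device for each user, which is what keeps the deployed model compact in this picture.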

* 10 pages, 5 figures

Degree-based stratification of nodes in Graph Neural Networks

Dec 16, 2023

Ameen Ali, Hakan Cevikalp, Lior Wolf

Abstract: Despite much research, Graph Neural Networks (GNNs) still do not display the favorable scaling properties of other deep neural networks such as Convolutional Neural Networks and Transformers. Previous work has identified issues such as oversmoothing of the latent representation and has suggested solutions such as skip connections and sophisticated normalization schemes. Here, we propose a different approach that is based on a stratification of the graph nodes. We provide motivation that the nodes in a graph can be stratified into those with a low degree and those with a high degree and that the two groups are likely to behave differently. Based on this motivation, we modify the Graph Neural Network (GNN) architecture so that the weight matrices are learned, separately, for the nodes in each group. This simple-to-implement modification seems to improve performance across datasets and GNN methods. To verify that this increase in performance is not only due to the added capacity, we also perform the same modification for random splits of the nodes, which does not lead to any improvement.
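
A minimal sketch of the modification, under stated assumptions: the layer uses mean aggregation over a node and its neighbors, the degree threshold is a fixed hypothetical value, and the two weight matrices are hand-picked rather than learned. The function name `stratified_layer` is illustrative and not taken from the paper.

```python
def stratified_layer(adj, feats, W_low, W_high, threshold):
    """One message-passing step with a separate weight matrix per degree group."""
    dim = len(feats[0])
    out = []
    for node, nbrs in enumerate(adj):
        group = [node] + nbrs
        # Mean of the node's own features and those of its neighbors.
        agg = [sum(feats[n][k] for n in group) / len(group) for k in range(dim)]
        # High-degree nodes use W_high, all other nodes use W_low.
        W = W_high if len(nbrs) >= threshold else W_low
        out.append([sum(w * a for w, a in zip(row, agg)) for row in W])
    return out

# Star graph: node 0 is a hub connected to three leaf nodes.
adj = [[1, 2, 3], [0], [0], [0]]
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
W_low = [[1.0, 0.0], [0.0, 1.0]]
W_high = [[2.0, 0.0], [0.0, 2.0]]

out = stratified_layer(adj, feats, W_low, W_high, threshold=2)
print(out)
```

The random-split control described in the abstract would correspond to choosing between `W_low` and `W_high` by a random node assignment instead of by `len(nbrs)`.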

On the Long Range Abilities of Transformers

Nov 28, 2023

Itamar Zimerman, Lior Wolf

Abstract: Despite their dominance in modern DL and, especially, NLP domains, transformer architectures exhibit sub-optimal performance on long-range tasks compared to recent layers that are specifically designed for this purpose. In this work, drawing inspiration from key attributes of long-range layers, such as state-space layers, linear RNN layers, and global convolution layers, we demonstrate that minimal modifications to the transformer architecture can significantly enhance performance on the Long Range Arena (LRA) benchmark, thus narrowing the gap with these specialized layers. We identify that two key principles for long-range tasks are (i) incorporating an inductive bias towards smoothness, and (ii) locality. As we show, integrating these ideas into the attention mechanism improves results with a negligible amount of additional computation and without any additional trainable parameters. Our theory and experiments also shed light on the reasons for the inferior performance of transformers on long-range tasks and identify critical properties that are essential for successfully capturing long-range dependencies.

* 18 pages
